Question

Consider the points: A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).

a) Compute the distance matrix using the Euclidean distance measure.
b) Identify the clusters that could be formed using the DBSCAN algorithm, assuming Eps = 2, MinPts = 2.
c) What is the impact on the number of clusters if Eps = \sqrt{10}?

Solution

==================================================================================================

This question is about finding clusters using the DBSCAN method.

Ans a: Calculate Distance Matrix Using Euclidean Distance

Let’s first find the Euclidean distances of A1 to all other points.

  • d(A1, A2) = [ (2-2)^2 + (10-5)^2 ]^{1/2} = 5
  • d(A1, A3) = [ (2-8)^2 + (10-4)^2 ]^{1/2} ≈ 8.49
  • d(A1, A4) = [ (2-5)^2 + (10-8)^2 ]^{1/2} ≈ 3.61
  • d(A1, A5) = [ (2-7)^2 + (10-5)^2 ]^{1/2} ≈ 7.07
  • d(A1, A6) = [ (2-6)^2 + (10-4)^2 ]^{1/2} ≈ 7.21
  • d(A1, A7) = [ (2-1)^2 + (10-2)^2 ]^{1/2} ≈ 8.06
  • d(A1, A8) = [ (2-4)^2 + (10-9)^2 ]^{1/2} ≈ 2.24

Similarly, compute the distance between every other pair of points. The resulting distance matrix, rounded to two decimal places, is shown below.

    \[ \begin{tabular}{|c|c|c|c|c|c|c|c|c|} \hline
       & A1 & A2 & A3 & A4 & A5 & A6 & A7 & A8 \\ \hline
    A1 & 0 & 5 & 8.49 & 3.61 & 7.07 & 7.21 & 8.06 & 2.24 \\ \hline
    A2 & 5 & 0 & 6.08 & 4.24 & 5 & 4.12 & 3.16 & 4.47 \\ \hline
    A3 & 8.49 & 6.08 & 0 & 5 & 1.41 & 2 & 7.28 & 6.4 \\ \hline
    A4 & 3.61 & 4.24 & 5 & 0 & 3.61 & 4.12 & 7.21 & 1.41 \\ \hline
    A5 & 7.07 & 5 & 1.41 & 3.61 & 0 & 1.41 & 6.71 & 5 \\ \hline
    A6 & 7.21 & 4.12 & 2 & 4.12 & 1.41 & 0 & 5.39 & 5.39 \\ \hline
    A7 & 8.06 & 3.16 & 7.28 & 7.21 & 6.71 & 5.39 & 0 & 7.62 \\ \hline
    A8 & 2.24 & 4.47 & 6.4 & 1.41 & 5 & 5.39 & 7.62 & 0 \\ \hline
    \end{tabular} \]
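
The matrix can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the names `points`, `euclidean`, and `dist` are chosen here for illustration and are not part of the original solution.

```python
import math

# Points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def euclidean(p, q):
    """Euclidean distance between two 2-D points."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

names = list(points)

# dist[a][b] holds d(a, b) rounded to two decimals, as in the table.
dist = {
    a: {b: round(euclidean(points[a], points[b]), 2) for b in names}
    for a in names
}

# Print the matrix with a header row.
print("    " + " ".join(f"{n:>5}" for n in names))
for a in names:
    print(f"{a:<4}" + " ".join(f"{dist[a][b]:>5.2f}" for b in names))
```

The matrix is symmetric with zeros on the diagonal, so only the 28 entries above the diagonal need to be computed by hand.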

Ans b: Identify Clusters Using DBSCAN Algorithm

Eps = 2 means that two points are neighbors when the distance between them is at most 2, so a pair exactly 2 units apart, such as A3 and A6, still counts.
MinPts = 2 means that a point is a core point, and can therefore start or extend a cluster, when at least 2 points (including itself) lie within Eps = 2 of it.
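
Both parameters can be checked directly against the points. The following is a minimal Python sketch, assuming the neighborhood boundary is inclusive (distance <= Eps) and that MinPts counts the point itself; the helper name `neighbors` is illustrative. It compares squared distances, so no rounding is involved.

```python
# Points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

EPS_SQ = 2 ** 2   # Eps = 2, stored as Eps squared to stay in integers
MIN_PTS = 2       # the point itself is counted

def neighbors(name, eps_sq):
    """Names of all points within Eps of `name`, including `name` itself."""
    px, py = points[name]
    return [
        other for other, (qx, qy) in points.items()
        if (px - qx) ** 2 + (py - qy) ** 2 <= eps_sq
    ]

neighborhoods = {name: neighbors(name, EPS_SQ) for name in points}
core = [name for name, nbrs in neighborhoods.items() if len(nbrs) >= MIN_PTS]

for name, nbrs in neighborhoods.items():
    print(name, nbrs, "core" if name in core else "not core")
```

A1, A2, and A7 have no neighbor other than themselves, which is why they cannot be core points at Eps = 2.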

With these parameters, DBSCAN gives the following result:

  • Cluster 1 = {A3, A5, A6}
  • Cluster 2 = {A4, A8}
  • Outliers = {A1, A2, A7}
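
The grouping can be double-checked with a small hand-written DBSCAN. This is a sketch of the textbook procedure, not a library implementation; the function names `region_query` and `dbscan` are illustrative, label 0 is used here to mean noise, and the Eps boundary is assumed to be inclusive.

```python
# Points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def region_query(points, name, eps_sq):
    """All points within Eps of `name` (inclusive), using squared distances."""
    px, py = points[name]
    return [
        other for other, (qx, qy) in points.items()
        if (px - qx) ** 2 + (py - qy) ** 2 <= eps_sq
    ]

def dbscan(points, eps_sq, min_pts):
    """Return {name: cluster id}, where id 0 marks noise."""
    labels = {}
    cluster_id = 0
    for name in points:
        if name in labels:
            continue
        nbrs = region_query(points, name, eps_sq)
        if len(nbrs) < min_pts:
            labels[name] = 0          # provisional noise
            continue
        cluster_id += 1               # `name` is a core point: start a cluster
        labels[name] = cluster_id
        queue = [n for n in nbrs if n != name]
        while queue:
            q = queue.pop()
            if labels.get(q) == 0:
                labels[q] = cluster_id  # noise reachable from a core point becomes a border point
            if q in labels:
                continue
            labels[q] = cluster_id
            q_nbrs = region_query(points, q, eps_sq)
            if len(q_nbrs) >= min_pts:  # q is core too: keep expanding
                queue.extend(q_nbrs)
    return labels

labels = dbscan(points, eps_sq=2 ** 2, min_pts=2)

# Group names by label, in the original point order.
clusters = {}
for name in points:
    clusters.setdefault(labels[name], []).append(name)
print(clusters)
```

With Eps = 2 and MinPts = 2 this yields label 1 for A3, A5, A6, label 2 for A4, A8, and label 0 (noise) for A1, A2, A7, in agreement with the result above.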

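Ans c: Identify Clusters Using DBSCAN Algorithm when Eps = \sqrt{10}

Eps = \sqrt{10} ≈ 3.16, with MinPts = 2 as before. Note that d(A2, A7) = \sqrt{10} is exactly equal to Eps, so A2 and A7 count as neighbors, and d(A1, A8) ≈ 2.24 now falls within Eps as well.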

With these parameters, DBSCAN gives the following result:

  • Cluster 1 = {A1, A4, A8}
  • Cluster 2 = {A3, A5, A6}
  • Cluster 3 = {A2, A7}

As a result of increasing Eps from 2 to \sqrt{10}, the number of clusters rises from 2 to 3 and there are no outliers left.

Thus, increasing Eps can pull former outliers into clusters: A1 joins the cluster containing A4 and A8, while A2 and A7 form a new cluster of their own. This also shows that DBSCAN is sensitive to its parameters.
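
The effect of Eps can be compared side by side with a shortcut that holds only for this special case. When MinPts = 2 (counting the point itself), every point with at least one neighbor is a core point, so the clusters are simply the connected groups of points linked by distances of at most Eps, and isolated points are noise. The sketch below uses a small union-find for this instead of full DBSCAN; the function name `clusters_min_pts_2` is illustrative, and squared distances are used so that the boundary case d(A2, A7) = \sqrt{10} is compared exactly.

```python
# Points from the question.
points = {
    "A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
    "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9),
}

def clusters_min_pts_2(points, eps_sq):
    """Clusters and noise for MinPts = 2 via connected components.

    Valid only for MinPts = 2; this is not a general DBSCAN.
    """
    names = list(points)
    parent = {n: n for n in names}

    def find(n):
        # Follow parent links to the root, shortening the path on the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Link every pair of points that lies within Eps (inclusive).
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (ax, ay), (bx, by) = points[a], points[b]
            if (ax - bx) ** 2 + (ay - by) ** 2 <= eps_sq:
                parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)

    clusters = [g for g in groups.values() if len(g) >= 2]
    noise = [g[0] for g in groups.values() if len(g) == 1]
    return clusters, noise

for eps_sq in (4, 10):  # Eps = 2 and Eps = sqrt(10)
    clusters, noise = clusters_min_pts_2(points, eps_sq)
    print(f"Eps^2 = {eps_sq}: {len(clusters)} clusters {clusters}, noise {noise}")
```

For larger MinPts values this shortcut no longer applies, because border points and non-core points then need the usual DBSCAN treatment.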
