Question

Given the following distance matrix, compute LOF (Local Outlier Factor) of all points, based on 2-distance neighborhood consideration (k=2). Show intermediate steps.
\newline
Local Outlier Factor

Solution

==================================================================================================

k = 2 => Second nearest neighbor

Step-1: Find the dist_{k}(O)

dist_{k}(O) is the distance between point O and its k^{th} nearest neighbor.

  • dist_{2}(P1) = dist(P1, P2) = dist(P1, P5) = 0.7
  • dist_{2}(P2) = dist(P2, P4) = 0.15
  • dist_{2}(P3) = dist(P3, P4) = dist(P3, P5) = 0.12
  • dist_{2}(P4) = dist(P4, P3) = 0.12
  • dist_{2}(P5) = dist(P5, P3) = 0.12
  • dist_{2}(P6) = dist(P6, P2) = 0.65

Step-2: Calculate N_{k}(O)

N_{k}(O) is the set of all points within the k^{th} distance neighborhood of O.

    \begin{align*}  \boxed{N_{k}(O) = \left\{ O' \mid dist(O,O') \leq dist_{k}(O) \right\}} \end{align*}

  • N_{2}(P1) = \left\{ P2, P4, P5 \right\}
  • N_{2}(P2) = \left\{ P3, P4 \right\}
  • N_{2}(P3) = \left\{ P2, P4, P5 \right\}
  • N_{2}(P4) = \left\{ P3, P5 \right\}
  • N_{2}(P5) = \left\{ P3, P4 \right\}
  • N_{2}(P6) = \left\{ P2, P3 \right\}

Step-3: Calculate lrd_{k}(O)

lrd_{k}(O) is the Local Reachability Density of O.

    \begin{align*}  \boxed{lrd_{k}(O) = \frac{\lVert N_{k}(O) \rVert}{ \sum_{O' \epsilon N_{k}(O)} reachdist_{k}(O' \leftarrow O)} } \end{align*}

\lVert N_{k}(O) \rVert means the number of objects in N_{k}(O).

  • \lVert N_{2}(P1) \rVert = 3
  • \lVert N_{2}(P2) \rVert = 2
  • \lVert N_{2}(P3) \rVert = 3
  • \lVert N_{2}(P4) \rVert = 2
  • \lVert N_{2}(P5) \rVert = 2
  • \lVert N_{2}(P6) \rVert = 2

reachdist_{k}(O' \leftarrow O) is the reachability distance from O to O’.

    \begin{align*}  \boxed{reachdist_{k}(O' \leftarrow O) = max \left\{ dist_{k}(O), dist(O,O') \right\}} \end{align*}

  • reachdist_{2}(P2 \leftarrow P1) = max \left\{dist_{2}(P2),dist(P2,P1) \right\} = max \left\{0.12,0.7\right\}
  • reachdist_{2}(P4 \leftarrow P1) = max \left\{dist_{2}(P4), dist(P4,P1) \right\} = max \left\{0.12,0.65\right\}
  • reachdist_{2}(P5 \leftarrow P1) = max \left\{dist_{2}(P5), dist(P5,P1) \right\} = max \left\{0.12,0.7\right\}

    \begin{align*}  \Aboxed{lrd_{2}(P1) = \frac{\lVert N_{2}(P1) \rVert}{ reachdist_{2}(P2 \leftarrow P1) + reachdist_{2}(P4 \leftarrow P1) + reachdist_{2}(P5 \leftarrow P1)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P1) = \frac{3}{0.7 + 0.65 + 0.7} = 1.46} \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P2) = \frac{\lVert N_{2}(P2) \rVert}{reachdist_{2}(P3 \leftarrow P2) + reachdist_{2}(P3 \leftarrow P2)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P1) = \frac{2}{0.12 + 0.15} = 7.41} \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P3) = \frac{\lVert N_{2}(P3) \rVert}{ reachdist_{2}(P2 \leftarrow P3) + reachdist_{2}(P4 \leftarrow P3) + reachdist_{2}(P5 \leftarrow P3)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P3) = \frac{3}{0.15 + 0.12 + 0.12} = 7.69} \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P4) = \frac{\lVert N_{2}(P4) \rVert}{reachdist_{2}(P3 \leftarrow P4) + reachdist_{2}(P5 \leftarrow P4)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P4) = \frac{2}{0.12 + 0.12} = 8.33} \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P5) = \frac{\lVert N_{2}(P5) \rVert}{reachdist_{2}(P3 \leftarrow P5) + reachdist_{2}(P4 \leftarrow P5)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P5) = \frac{2}{0.12 + 0.12} = 8.33} \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P6) = \frac{\lVert N_{2}(P6) \rVert}{reachdist_{2}(P2 \leftarrow P6) + reachdist_{2}(P3 \leftarrow P6)} } \end{align*}

    \begin{align*}  \Aboxed{lrd_{2}(P6) = \frac{2}{0.65 + 0.63} = 1.56} \end{align*}

Step-4: Calculate LOF_{k}(O)

    \begin{align*}  \boxed{LOF_{k}(O) = \sum_{O' \epsilon N_{k}(O)} lrd_{k}(O') . \sum_{O' \epsilon N_{k}(O)} reachdist_{k}(O' \leftarrow O) } \end{align*}

LOF is the Local Outlier Factor.

    \begin{align*}  \Aboxed{LOF_{2}(P1) = [lrd_{2}(P2) + lrd_{2}(P4) + lrd_{2}(P5)] . [reachdist_{2}(P2 \leftarrow P1) + reachdist_{2}(P4 \leftarrow P1) + reachdist_{2}(P5 \leftarrow P1)] } \end{align*}

    \begin{align*}  \Aboxed{ = [7.41 + 8.33 + 8.33] . [0.7 + 0.65 + 0.7] = 49.34 } \end{align*}

    \begin{align*}  \Aboxed{LOF_{2}(P2) = [lrd_{2}(P3) + lrd_{2}(P4)] . [reachdist_{2}(P3 \leftarrow P2) + reachdist_{2}(P4 \leftarrow P2)] } \end{align*}

    \begin{align*}  \Aboxed{ = [7.69 + 8.33] . [0.12 + 0.15] = 4.32 } \end{align*}

    \begin{align*}  \Aboxed{LOF_{2}(P3) = [7.41 + 8.33 + 8.33] . [0.15 + 0.12 + 0.12] = 9.39 } \end{align*}

    \begin{align*}  \Aboxed{LOF_{2}(P4) = [7.69 + 8.33] . [0.12 + 0.12] = 3.84 } \end{align*}

    \begin{align*}  \Aboxed{LOF_{2}(P5) = [7.69 + 8.33] . [0.12 + 0.12] = 3.84 } \end{align*}

    \begin{align*}  \Aboxed{LOF_{2}(P6) = [7.41 + 7.69] . [0.65 + 0.63] = 19.33 } \end{align*}

Step-5: Find the Outlier

Sort the LOF in descending order:

  • LOF_{2}(P1) = 49.34
  • LOF_{2}(P6) = 19.33
  • LOF_{2}(P3) = 9.39
  • LOF_{2}(P2) = 4.32
  • LOF_{2}(P4) = 3.84
  • LOF_{2}(P5) = 3.84

The top 1 outlier is P1.

Subscribe to Ehan Ghalib!

Invalid email address
We promise not to spam you. You can unsubscribe at any time.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes:

<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>