Question

Consider a three-class classification problem. We have a sample set of 15,000 records. When tested, the machine learning model behaves as follows:

  • The model confuses Class 1 with Class 2 and Class 3. The classification error is 20% with regard to C2 and 10% with regard to C3.
  • The model classifies Class 2 correctly.
  • The model confuses Class 3 with Class 2.

Create the confusion matrix for this multi-classifier problem.

Solution

==================================================================================================

The total sample size is N = 15000.

Let’s assume an 80-20 train-test split, since the question does not specify one.

  • Train set size = 0.8 * N = 0.8 * 15000 = 12000
  • Test set size = 0.2 * N = 0.2 * 15000 = 3000

A confusion matrix compares the model's predictions against the actual labels of held-out data, so only the test set needs to be considered.

Let us assume a balanced dataset, i.e. all classes have the same number of samples.

Thus, the number of test records belonging to each of C1, C2 and C3 = 3000 / 3 = 1000.
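The split arithmetic above can be checked with a short Python sketch. The variable names are illustrative, and both the 80-20 split and the balanced classes are the assumptions made in this solution, not facts given in the question.

```python
# Assumptions made in the solution: 80-20 train-test split, balanced classes.
N = 15_000           # total number of records (given in the question)
TEST_PERCENT = 20    # assumed share of records held out for testing
NUM_CLASSES = 3

test_size = N * TEST_PERCENT // 100    # records used to build the confusion matrix
train_size = N - test_size             # remaining records used for training
per_class = test_size // NUM_CLASSES   # test records per class when balanced

print(train_size, test_size, per_class)  # prints: 12000 3000 1000
```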

We know the below:

  • The model confuses C1 with C2 and C3. 20% of C1 records are classified as C2 and 10% of C1 records are classified as C3. These errors add to the FN of C1 and to the FP of C2 and C3 respectively.
  • All records of C2 have been classified correctly, i.e. True Positives (TP) of C2 = Actual Positives of C2.
  • The model confuses C3 with C2. Since no error rate is given, assume C3 records are split equally between the two classes: 50% are classified as C3 and 50% as C2.

Theoretical Confusion Matrix

    \[ \begin{tabular}{|c|c|c|c|c|} \hline
    & \multicolumn{3}{c|}{Actual Class} & \\ \hline
    Predicted Class & C1 & C2 & C3 & Total' \\ \hline
    C1 & TP1 & FN2/FP1 & FN3/FP1 & P1' \\ \hline
    C2 & FN1/FP2 & TP2 & FN3/FP2 & P2' \\ \hline
    C3 & FN1/FP3 & FN2/FP3 & TP3 & P3' \\ \hline
    Total & P1 & P2 & P3 & P1+P2+P3 \\ \hline
    \end{tabular} \]
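To make the layout concrete, here is a minimal Python sketch that reads per-class TP, FP and FN off a matrix laid out as above (rows = predicted class, columns = actual class). The function name `per_class_counts` and the 2x2 toy counts are hypothetical, chosen only for illustration.

```python
import numpy as np

def per_class_counts(cm):
    """Return (TP, FP, FN) per class for a matrix with rows = predicted, columns = actual."""
    cm = np.asarray(cm)
    tp = np.diag(cm)            # diagonal: predicted class equals actual class
    fp = cm.sum(axis=1) - tp    # row total minus diagonal: predicted as the class, actually another
    fn = cm.sum(axis=0) - tp    # column total minus diagonal: actually the class, predicted as another
    return tp, fp, fn

# Arbitrary toy counts for a two-class case, not taken from the problem.
toy = [[5, 1],
       [2, 7]]
tp, fp, fn = per_class_counts(toy)
print(tp.tolist(), fp.tolist(), fn.tolist())  # prints: [5, 7] [1, 2] [2, 1]
```

Note that scikit-learn's `confusion_matrix` uses the transposed layout (rows = actual, columns = predicted), so a matrix from that library would need to be transposed before applying this sketch.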

Ideal Confusion Matrix with No Errors

    \[ \begin{tabular}{|c|c|c|c|c|} \hline
    & \multicolumn{3}{c|}{Actual Class} & \\ \hline
    Predicted Class & C1 & C2 & C3 & Total' \\ \hline
    C1 & 1000 & 0 & 0 & 1000 \\ \hline
    C2 & 0 & 1000 & 0 & 1000 \\ \hline
    C3 & 0 & 0 & 1000 & 1000 \\ \hline
    Total & 1000 & 1000 & 1000 & 3000 \\ \hline
    \end{tabular} \]

Actual Confusion Matrix with Given Errors

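  • TP1 = 70% of C1 (the remainder after the 20% and 10% errors) = 0.7 * 1000 = 700
  • C1 predicted as C2 (counts toward FN1 and FP2) = 20% of C1 = 0.2 * 1000 = 200
  • C1 predicted as C3 (counts toward FN1 and FP3) = 10% of C1 = 0.1 * 1000 = 100
  • TP2 = 100% of C2 = 1000
  • C3 predicted as C2 (counts toward FN3 and FP2) = 50% of C3 = 0.5 * 1000 = 500
  • TP3 = 50% of C3 = 0.5 * 1000 = 500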

    \[ \begin{tabular}{|c|c|c|c|c|} \hline
    & \multicolumn{3}{c|}{Actual Class} & \\ \hline
    Predicted Class & C1 & C2 & C3 & Total' \\ \hline
    C1 & 700 & 0 & 0 & 700 \\ \hline
    C2 & 200 & 1000 & 500 & 1700 \\ \hline
    C3 & 100 & 0 & 500 & 600 \\ \hline
    Total & 1000 & 1000 & 1000 & 3000 \\ \hline
    \end{tabular} \]
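The same matrix can be rebuilt from the error rates with a few lines of Python, which is a handy way to check the row and column totals. This is only a sketch: the array names are illustrative, and the 50% rate for C3 is the assumption made in this solution rather than a figure given in the question.

```python
import numpy as np

per_class = 1000  # test records per class (80-20 split and balanced classes assumed)

# percent[i][j] = percentage of actual class j that is predicted as class i
# (rows = predicted, columns = actual, matching the table above).
# The 50/50 split for C3 is an assumption; the question gives no rate.
percent = np.array([
    [70,   0,  0],   # predicted C1
    [20, 100, 50],   # predicted C2
    [10,   0, 50],   # predicted C3
])

cm = percent * per_class // 100      # integer counts per cell
predicted_totals = cm.sum(axis=1)    # row totals (Total' column)
actual_totals = cm.sum(axis=0)       # column totals (Total row)

print(cm)
print(predicted_totals, actual_totals)
```

Each column of `percent` sums to 100, so every actual class total stays at 1000, while the predicted totals come out as 700, 1700 and 600.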

Thus, we have our desired confusion matrix.
