Question
Consider a three-class classification problem. We have a sample set of 15,000 records. The machine learning model is subject to the below conditions upon testing:
- The model confuses Class 1 with Class 2 and Class 3. The classification error is 20% with regard to C2 and 10% with regard to C3.
- The model classifies Class 2 correctly.
- The model confuses Class 3 with Class 2.
Create the confusion matrix for this multi-classifier problem.
Solution
==================================================================================================
The total sample size is N = 15,000.
The question does not specify a split, so let's assume a train-test split of 80-20.
- Train set size = 0.8 * N = 0.8 * 15000 = 12000
- Test set size = 0.2 * N = 0.2 * 15000 = 3000
The confusion matrix is built from the model's predictions on held-out data, so only the test set needs to be considered.
Let us assume a balanced dataset, i.e., all classes have the same number of samples.
Thus, the number of test records belonging to each of C1, C2, and C3 = 3000 / 3 = 1000.
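The split and per-class arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative, and the 80-20 split and the balanced classes are the assumptions stated above, not facts given in the question.

```python
# Sketch of the split arithmetic; all names here are illustrative.
N = 15_000            # total sample size, from the question
TEST_FRACTION = 0.2   # ASSUMPTION: 80-20 train-test split
NUM_CLASSES = 3

test_size = round(N * TEST_FRACTION)   # records held out for testing
train_size = N - test_size             # remaining records used for training
per_class = test_size // NUM_CLASSES   # ASSUMPTION: balanced classes in the test set

print(train_size, test_size, per_class)  # prints: 12000 3000 1000
```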
We know the below:
- The model confuses C1 with C2 and C3. 20% of C1 records are classified as C2 and 10% of C1 records are classified as C3. These records add to the false negatives (FN) of C1 and to the false positives (FP) of C2 and C3, respectively.
- All records of C2 have been classified correctly, i.e., true positives (TP) = actual positives of C2.
- The model confuses C3 with C2. Since no particular number is given, assume a uniform/equal distribution: 50% of C3 records are classified correctly and 50% are classified as C2.
Theoretical Confusion Matrix
Ideal Confusion Matrix with No Errors
Actual Confusion Matrix with Given Errors
- TP1 = (100% - 20% - 10%) of C1 = 0.7 * 1000 = 700
- C1 classified as C2 (adds to FN1 and FP2) = 20% of C1 = 0.2 * 1000 = 200
- C1 classified as C3 (adds to FN1 and FP3) = 10% of C1 = 0.1 * 1000 = 100
- TP2 = 100% of C2 = 1000
- C3 classified as C2 (adds to FN3 and FP2) = 50% of C3 = 0.5 * 1000 = 500
- TP3 = 50% of C3 = 0.5 * 1000 = 500
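The per-cell counts listed above can be assembled into a matrix and cross-checked programmatically. The sketch below assumes the common convention that rows are actual classes and columns are predicted classes. The 50% rate for C3 is the assumption made earlier, and the names (`rates`, `confusion`, `tp`, `fn`, `fp`) are illustrative.

```python
# Rows = actual class, columns = predicted class (a common convention, assumed here).
per_class = 1000  # test records per class under the balanced-dataset assumption

# rates[i][j] = fraction of actual class i that the model predicts as class j.
rates = [
    [0.7, 0.2, 0.1],  # C1: 20% go to C2, 10% go to C3, the rest are correct
    [0.0, 1.0, 0.0],  # C2: always classified correctly
    [0.0, 0.5, 0.5],  # C3: ASSUMPTION, 50% go to C2 since no rate is given
]

# Convert rates to counts; round() guards against floating-point noise.
confusion = [[round(r * per_class) for r in row] for row in rates]

n = len(confusion)
# TP is the diagonal; FN is the rest of the row; FP is the rest of the column.
tp = [confusion[i][i] for i in range(n)]
fn = [sum(confusion[i]) - confusion[i][i] for i in range(n)]
fp = [sum(confusion[r][i] for r in range(n)) - confusion[i][i] for i in range(n)]

for row in confusion:
    print(row)
print("TP:", tp, "FN:", fn, "FP:", fp)
```

Each row should sum to the number of records in that class, which is a quick sanity check on the rates.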
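Thus, we have our desired confusion matrix (rows are actual classes, columns are predicted classes):

              Predicted C1   Predicted C2   Predicted C3
Actual C1          700            200            100
Actual C2            0           1000              0
Actual C3            0            500            500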