Question
Consider a supermarket that contains 1000 products. In a market-basket analysis, you want to compare baskets of 2 customers C1 and C2 to find similarity in their buying behavior. C1’s basket contains sugar, coffee, tea, rice and eggs. C2’s basket contains sugar, coffee, bread and biscuit. Find the Jaccard and simple matching coefficient for the two customers. Comment on which coefficient is more suitable.
Solution
==================================================================================================
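Let’s first write the given data in table form, marking 1 where a product is in the basket and 0 where it is not:

Product               C1   C2
sugar                 1    1
coffee                1    1
tea                   1    0
rice                  1    0
eggs                  1    0
bread                 0    1
biscuit               0    1
other 993 products    0    0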
Step-1: Create a Contingency table
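Contingency table structure:

                 C2 = 1    C2 = 0
C1 = 1           f11       f10
C1 = 0           f01       f00

Here f11 is the number of products in both baskets, f10 the number only in C1’s basket, f01 the number only in C2’s basket, and f00 the number in neither basket.

Contingency table for the problem:

                 C2 = 1    C2 = 0    Total
C1 = 1           2         3         5
C1 = 0           2         993       995
Total            4         996       1000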
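The four counts can be checked with a short Python sketch. The function and variable names here are illustrative assumptions, not part of the original solution; the baskets and the catalog size of 1000 come from the question.

```python
# Baskets and catalog size as given in the question.
TOTAL_PRODUCTS = 1000
c1 = {"sugar", "coffee", "tea", "rice", "eggs"}
c2 = {"sugar", "coffee", "bread", "biscuit"}


def contingency_counts(a, b, total):
    """Return (f11, f10, f01, f00) for two baskets over a catalog of `total` products."""
    f11 = len(a & b)           # products in both baskets
    f10 = len(a - b)           # products only in the first basket
    f01 = len(b - a)           # products only in the second basket
    f00 = total - len(a | b)   # products in neither basket
    return f11, f10, f01, f00


print(contingency_counts(c1, c2, TOTAL_PRODUCTS))  # (2, 3, 2, 993)
```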
Step-2: Calculate Jaccard Coefficient
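These are asymmetric binary variables, so the products absent from both baskets are left out of the calculation. Thus,
Jaccard Coefficient = f11 / (f11 + f10 + f01) = 2 / (2 + 3 + 2) = 2/7 ≈ 0.286
where f11 = 2 is the number of products in both baskets, f10 = 3 the number only in C1’s basket, and f01 = 2 the number only in C2’s basket.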
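Step-3: Calculate Simple Matching Coefficient
The Simple Matching Coefficient counts both kinds of match, presence in both baskets and absence from both, over all 1000 products:
Simple Matching Coefficient = (f11 + f00) / (f11 + f10 + f01 + f00) = (2 + 993) / 1000 = 0.995
where f00 = 993 is the number of products in neither basket.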
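As a quick check of both steps, here is a minimal Python sketch. It takes the counts from the baskets in the question (2 products in both, 3 only in C1’s, 2 only in C2’s, 993 in neither); the function names are illustrative assumptions.

```python
def jaccard(f11, f10, f01):
    """Matches on presence only; joint absences (f00) are ignored."""
    return f11 / (f11 + f10 + f01)


def simple_matching(f11, f10, f01, f00):
    """Matches on presence and on absence, over all products."""
    return (f11 + f00) / (f11 + f10 + f01 + f00)


# Counts derived from the two baskets and the 1000-product catalog.
f11, f10, f01, f00 = 2, 3, 2, 993

print(round(jaccard(f11, f10, f01), 3))      # 0.286
print(simple_matching(f11, f10, f01, f00))   # 0.995
```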
The absence of a particular product from a basket is not important; only its presence is. Thus, the variables are binary asymmetric in nature. The Simple Matching Coefficient does not account for this: it treats the 993 products that neither customer bought as matches, so it reports near-perfect similarity even though the baskets share only 2 of the 7 products that appear in at least one of them. The Jaccard Coefficient ignores these joint absences and is therefore the more suitable measure of similarity in this scenario.
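To illustrate the point, the sketch below recomputes both coefficients for the same two baskets while varying the size of the catalog. The catalog sizes 10 and 100 are hypothetical values chosen for illustration; only 1000 comes from the question, and the function name is an assumption.

```python
def coefficients(total_products, both=2, only_c1=3, only_c2=2):
    """Return (Jaccard, Simple Matching) for fixed baskets and a given catalog size."""
    bought = both + only_c1 + only_c2      # products in at least one basket
    neither = total_products - bought      # products in neither basket
    jaccard = both / bought
    smc = (both + neither) / total_products
    return jaccard, smc


# 10 and 100 are hypothetical catalog sizes; 1000 is the one in the question.
for n in (10, 100, 1000):
    j, s = coefficients(n)
    print(n, round(j, 3), round(s, 3))
# 10 0.286 0.5
# 100 0.286 0.95
# 1000 0.286 0.995
```

The Simple Matching Coefficient rises with the number of products that neither customer bought, while the Jaccard Coefficient stays the same because it depends only on the baskets themselves.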