Question

An FMCG company's training set has 100 records for its toothpaste T and 400 records for competitor C. P, Q and R denote subsets of attribute values in the records that consumers use for decision-making.

    \begin{align*} R0: \phi \rightarrow T \quad \text{(null rule that covers all records)} \end{align*}

    \begin{align*} R1: P \rightarrow T \quad \text{(covers 4 T and 1 C records)} \end{align*}

    \begin{align*} R2: Q \rightarrow T \quad \text{(covers 30 T and 10 C records)} \end{align*}

    \begin{align*} R3: R \rightarrow T \quad \text{(covers 100 T and 90 C records)} \end{align*}

Using the FOIL Gain metric, find out which rule the company would be interested in for predictive modeling.

Solution

==================================================================================================

We are learning rules for the class T. Thus, tuples of class T are our positive tuples and all remaining tuples (class C) are negative tuples.

    \[ \begin{tabular}{|c|c|c|} \hline Rule & pos & neg\\ \hline R0 & 100 & 400\\ \hline R1 & 4 & 1\\ \hline R2 & 30 & 10\\ \hline R3 & 100 & 90\\ \hline \end{tabular} \]

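The formula for FOIL Gain, where pos and neg are the positive and negative tuples covered by the current rule and pos' and neg' are those covered by the new rule, is: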

    \begin{align*} \boxed{FOIL \ Gain = pos' \times \left[ \log_2\left(\frac{pos'}{pos' + neg'}\right) - \log_2\left(\frac{pos}{pos + neg}\right) \right] } \end{align*}

    \begin{align*} FOIL \ Gain(R0,R1) = pos_1 \times \left[ \log_2\left(\frac{pos_1}{pos_1 + neg_1}\right) - \log_2\left(\frac{pos_0}{pos_0 + neg_0}\right) \right] \end{align*}

    \begin{align*} = 4 \times \left[ \log_2\left(\frac{4}{4 + 1}\right) - \log_2\left(\frac{100}{100 + 400}\right) \right] = 4 \times [-0.322 + 2.322] = 8.00 \end{align*}

    \begin{align*} FOIL \ Gain(R0,R2) = pos_2 \times \left[ \log_2\left(\frac{pos_2}{pos_2 + neg_2}\right) - \log_2\left(\frac{pos_0}{pos_0 + neg_0}\right) \right] \end{align*}

    \begin{align*} = 30 \times \left[ \log_2\left(\frac{30}{40}\right) - \log_2\left(\frac{100}{500}\right) \right] = 30 \times [-0.415 + 2.322] \approx 57.2 \end{align*}

    \begin{align*} FOIL \ Gain(R0,R3) = pos_3 \times \left[ \log_2\left(\frac{pos_3}{pos_3 + neg_3}\right) - \log_2\left(\frac{pos_0}{pos_0 + neg_0}\right) \right] \end{align*}

    \begin{align*} = 100 \times \left[ \log_2\left(\frac{100}{190}\right) - \log_2\left(\frac{100}{500}\right) \right] = 100 \times [-0.926 + 2.322] \approx 139.6 \end{align*}
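The three calculations above can be checked with a few lines of Python. This is only a minimal sketch: the function name `foil_gain` and the dictionary layout are illustrative choices, not part of the original problem. The counts are taken from the pos/neg table.

```python
from math import log2


def foil_gain(pos, neg, pos_new, neg_new):
    """FOIL Gain of a new rule (pos_new, neg_new) relative to a base rule (pos, neg)."""
    before = log2(pos / (pos + neg))              # log-accuracy of the base rule
    after = log2(pos_new / (pos_new + neg_new))   # log-accuracy of the new rule
    return pos_new * (after - before)


# Counts from the table: R0 is the base (null) rule, R1..R3 are the candidates.
base = (100, 400)
candidates = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

gains = {name: foil_gain(*base, p, n) for name, (p, n) in candidates.items()}
for name, gain in gains.items():
    print(f"{name}: {gain:.2f}")

# Rule with the largest FOIL Gain
best = max(gains, key=gains.get)
print("best:", best)
```

Running it should give gains of roughly 8.0, 57.2 and 139.6 for R1, R2 and R3 respectively, matching the hand calculation up to rounding.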

Conclusion

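The company would be most interested in R3, as it has the highest FOIL Gain (about 139, versus about 57 for R2 and 8 for R1). FOIL Gain weights the improvement in accuracy by the number of positive tuples covered, so R3 wins by covering all 100 T records even though R1 (80% accurate) and R2 (75%) are more accurate than R3 (about 53%).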
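To see how coverage and accuracy combine in this example, the sketch below splits each rule's FOIL Gain into its two factors: the number of positive tuples covered and the per-tuple log-accuracy improvement over R0. The variable names are illustrative assumptions, and the counts come from the problem statement.

```python
from math import log2

# Base (null) rule R0 covers everything: 100 positive, 400 negative tuples.
base_pos, base_neg = 100, 400
base_acc = base_pos / (base_pos + base_neg)

rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}

rows = []
for name, (pos, neg) in rules.items():
    acc = pos / (pos + neg)                   # accuracy of the rule
    per_tuple = log2(acc) - log2(base_acc)    # log-accuracy improvement over R0
    rows.append((name, pos, acc, per_tuple, pos * per_tuple))

for name, pos, acc, per_tuple, gain in rows:
    print(f"{name}: pos={pos:3d} accuracy={acc:.3f} "
          f"per-tuple={per_tuple:.3f} gain={gain:.2f}")

most_accurate = max(rows, key=lambda r: r[2])[0]  # highest accuracy
highest_gain = max(rows, key=lambda r: r[4])[0]   # highest FOIL Gain
print("most accurate:", most_accurate, "| highest gain:", highest_gain)
```

In this data set the most accurate rule (R1) and the rule with the highest gain (R3) differ, because the per-tuple improvement is multiplied by the number of positives covered.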
