MDI, MDA, and Shapley#
This page covers three feature importance metrics: MDI, MDA, and Shapley values. They are often discussed alongside tree-based models, particularly Random Forests, although MDA and Shapley values also apply to other model types. Let’s break them down:
Mean Decrease Impurity (MDI):#
Definition: MDI is a feature importance score obtained from tree-based classifiers, including Random Forests.
Calculation: It corresponds to the `feature_importances_` attribute in scikit-learn.
Methodology: MDI uses in-sample (IS) performance to estimate feature importance.
Limitations: MDI can inflate the importance of numerical features, particularly those with many unique values (high cardinality). It is also computed from statistics derived from the training dataset, so even non-predictive features may appear important if the model can use them to overfit¹².
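To make the idea concrete, here is a minimal sketch of how MDI can be accumulated for a single, hand-specified decision tree using Gini impurity. The tree layout, the toy data, and the helper names (`gini`, `accumulate_mdi`) are assumptions made for illustration; this is not scikit-learn's implementation, although scikit-learn documents `feature_importances_` as the normalized total reduction of the split criterion brought by each feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def accumulate_mdi(node, rows, labels, n_total, totals):
    """Add each split's weighted impurity decrease to its feature's total."""
    if node is None or not labels:  # leaf or empty node: nothing to add
        return
    f, t = node["feature"], node["threshold"]
    left = [(x, y) for x, y in zip(rows, labels) if x[f] <= t]
    right = [(x, y) for x, y in zip(rows, labels) if x[f] > t]
    l_labels = [y for _, y in left]
    r_labels = [y for _, y in right]
    n = len(labels)
    children = (len(left) * gini(l_labels) + len(right) * gini(r_labels)) / n
    # Weight the decrease by the share of training samples reaching this node.
    totals[f] = totals.get(f, 0.0) + (n / n_total) * (gini(labels) - children)
    accumulate_mdi(node["left"], [x for x, _ in left], l_labels, n_total, totals)
    accumulate_mdi(node["right"], [x for x, _ in right], r_labels, n_total, totals)


# Hypothetical toy training set with two binary features.
rows = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 0), (1, 1), (1, 1)]
labels = [0, 0, 0, 0, 0, 0, 1, 1]

# Hypothetical fitted tree: the root splits on feature 0, its right child on feature 1.
tree = {
    "feature": 0, "threshold": 0.5,
    "left": None,
    "right": {"feature": 1, "threshold": 0.5, "left": None, "right": None},
}

totals = {}
accumulate_mdi(tree, rows, labels, len(rows), totals)
mdi = {f: v / sum(totals.values()) for f, v in totals.items()}  # normalize to sum to 1
print(mdi)
```

Everything here is computed from the training rows alone, which is why MDI is an in-sample measure: a split that merely memorizes noise still earns impurity decrease.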
Mean Decrease Accuracy (MDA):#
Definition: MDA is a method applicable to any classifier, not just tree-based ones.
Computation: It measures a feature’s importance as the decrease in the model’s accuracy after the values of that feature are randomly permuted.
Intuition: If an important feature is permuted, the accuracy will significantly decrease, whereas permuting an unimportant feature will have a negligible effect⁴.
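The permutation procedure described above can be sketched in plain Python. The function name `permutation_importance_mda`, the toy threshold classifier, and the synthetic data are all assumptions for illustration; in practice the model would be fitted beforehand and the scoring done on held-out data. scikit-learn offers a similar utility as `sklearn.inspection.permutation_importance`.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(rows, labels)) / len(labels)


def permutation_importance_mda(predict, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in rows]
            rng.shuffle(column)  # break the link between feature j and the labels
            permuted = [x[:j] + (v,) + x[j + 1:] for x, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical evaluation data: feature 0 determines the label, feature 1 is noise.
data_rng = random.Random(42)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [int(x[0] > 0.5) for x in rows]


def toy_predict(x):
    """Stand-in for any fitted classifier; it only looks at feature 0."""
    return int(x[0] > 0.5)


mda = permutation_importance_mda(toy_predict, rows, labels)
print(mda)
```

Because the function only needs a `predict` callable, the same loop works for any classifier, which is what makes MDA model-agnostic. In this toy setup, shuffling the noise feature leaves accuracy unchanged, while shuffling the informative feature pushes accuracy toward chance.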
Shapley Value:#
Definition: The Shapley value is a concept from cooperative game theory, adapted for feature importance.
Idea: It assigns a value to each feature based on its contribution to the model’s prediction.
Calculation: The Shapley value averages a feature’s marginal contribution to the prediction over all possible subsets (coalitions) of the other features.
Interpretation: Features that consistently contribute more to predictions receive higher Shapley values.
Advantage: It provides a more nuanced understanding of feature importance, considering interactions between features⁵.
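As a rough illustration of the calculation, the sketch below computes exact Shapley values by enumerating every coalition for a tiny three-feature game. The value function `toy_value` is hypothetical: in a real model it would be replaced by something like the model's expected prediction when only the features in the coalition are known. The names `shapley_values` and `toy_value` are assumptions, not part of any library.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating all coalitions of the other players."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: a and b contribute on their own, a and c interact."""
    v = 0.0
    if "a" in coalition:
        v += 10
    if "b" in coalition:
        v += 20
    if {"a", "c"} <= coalition:
        v += 6  # interaction bonus, only when both a and c are present
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)  # a = 13, b = 20, c = 3, up to floating-point rounding
```

The interaction bonus of 6 is split equally between `a` and `c`, and the three values sum to the payoff of the full coalition (36). Exact enumeration visits every subset, so its cost grows exponentially with the number of features; practical tools rely on approximations or model-specific shortcuts.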
Conclusion#
In summary, MDI relies on impurity-based measures computed in-sample, MDA focuses on the accuracy lost when a feature is permuted, and Shapley values attribute each prediction to individual features while accounting for interactions. Depending on your context and goals, you can choose the most suitable method for assessing feature importance in your specific model.
References#
Permutation Importance vs Random Forest Feature Importance (MDI). https://scikit-learn.org/stable/auto_examples/inspection/plot_permutation_importance.html.
Feature Importance — mlfinlab 0.12.0 documentation - Read the Docs. https://random-docs.readthedocs.io/en/latest/implementations/feature_importance.html.
A Debiased MDI Feature Importance Measure for Random Forests - NeurIPS. https://proceedings.neurips.cc/paper/2019/file/702cafa3bb4c9c86e4a3b6834b45aedd-Paper.pdf.
MDI+: A Flexible Random Forest-Based Feature Importance Framework. https://arxiv.org/pdf/2307.01932.pdf.
MDI, MDA, and SFI — mlfinlab 1.5.0 documentation. https://www.mlfinlab.com/en/latest/feature_importance/afm.html.