Model interpretability: making your model confess with Shapley values

In a previous post, I wrote about why checking model fairness is such a critical task. This post starts a series in which I will share some ways to achieve different levels of interpretability from your model and make it confess. Today I will introduce the topic and our first method: Shapley values.


Miller (2017), in “Explanation in Artificial Intelligence: Insights from the Social Sciences”, defines interpretability as “the degree to which a human can understand the cause of a decision in a model”. In other words, interpretability comes in degrees: a model can be “more interpretable” or “less interpretable”. Keep that in mind.

The reason we want interpretability is that, once achieved, it gives us a way to check the following (Doshi-Velez and Kim 2017):

  • Fairness: See my post “What does it mean for a model to be fair”.
  • Privacy: Ensuring that sensitive information cannot be disclosed by the model, e.g. that nobody can guess sensitive information by submitting specific examples to the model.
  • Robustness: Changes in the input lead to “proportional changes” in the outcome/predictions (yes, I know it’s a vague definition. I promise to clarify that in the future).
  • Causality: Only causal relationships are picked up. This means, for instance, that we don’t predict the sex of a baby based on stock prices at the NYSE.
  • Trust: Doshi-Velez and Kim talk about trust because things are easier to trust when you understand them. This is especially important because, in my opinion, people tend to attribute more intelligence to systems that behave in ways they don’t understand.

How to achieve interpretability?

OOTB: Get it Out-Of-The-Box

The most straightforward way to get interpretable machine learning is, yes, to use an algorithm that creates interpretable models in the first place. Problem solved! Life’s not that hard after all. This includes:

  • Linear models
  • Logistic Regression
  • Decision Trees
  • Naïve Bayes
  • K-nearest neighbors

However, more advanced algorithms don’t get that luxury, including Random Forest, Boosted Decision Trees, Neural Networks, etc. For those we need another approach, and the options are the following:

Most commonly used methods for explainability:

Model-Agnostic: These are methods that do not rely on any particularity of the model we want to interpret, so their great advantage is flexibility. Machine learning developers are free to use any model they like, because the interpretation methods can be applied to all of them.

Gradient-based: These are methods that rely on the gradients computed through backpropagation. Because of that, they can only be applied to models trained with gradient-based methods, such as Neural Networks.

  • Saliency maps (coming soon)
  • TCAV — Concept Activation Vectors (coming soon)

Example-based methods: These are methods that explain a model by selecting instances of the dataset rather than by creating summaries of features as before. This works well for images or text, where examples carry context.

  • Counterfactual explanations (coming soon)
  • Anchors (coming soon)
  • Adversarial examples (coming soon)

1) Shapley values

The idea behind Shapley values (Shapley, Lloyd S 1953) is as follows: given a feature set, find each feature’s marginal contribution to the overall prediction. Ok… what’s the overall prediction? It is the expected value of the model (EV). Think of it as the baseline of the model. The marginal contribution then tells us how much each feature pushes the prediction away from that baseline.

The way Shapley values calculate the marginal contribution is by computing the predicted value with and without the feature value currently being considered and taking the difference. Finally, the Shapley value is calculated by averaging the marginal contribution of the feature value across all the possible feature subsets (called coalitions) of the feature set that the feature can join.
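Written as a formula, the averaging described above takes the standard form below. The symbols are notation I am introducing here for illustration: F is the full feature set, S is a coalition that does not contain feature j, and v(S) is the expected prediction when only the features in S are known.

```latex
\phi_j = \sum_{S \subseteq F \setminus \{j\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!} \left[ v(S \cup \{j\}) - v(S) \right]
```

The bracket is the marginal contribution of feature j to coalition S. The factorial weight is the fraction of feature orderings in which exactly the features in S come before j, which is why the average is a weighted one.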

However, you may start wondering how to run a model “with and without a feature”, since most models can’t handle missing data. The method replaces the values of the features that are not in a coalition with random values from the dataset and then gets a prediction from the machine learning model. An alternative is to use the expected value of such a feature.
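To make the replacement trick concrete, here is a minimal pure-Python sketch of an exact Shapley computation. It is not how the SHAP library works internally. The toy model, the background rows and the helper names (coalition_value, exact_shapley) are all assumptions made up for this illustration, and features outside a coalition are filled in from every background row instead of being sampled at random.

```python
from itertools import combinations
from math import factorial


def coalition_value(predict, x, coalition, background):
    """Average prediction when only the features in `coalition` keep x's values.

    Features outside the coalition are taken from each background row,
    which stands in for "replace them with values from the dataset".
    """
    total = 0.0
    for row in background:
        mixed = [x[i] if i in coalition else row[i] for i in range(len(x))]
        total += predict(mixed)
    return total / len(background)


def exact_shapley(predict, x, background):
    """Exact Shapley values for one instance (cost grows as 2**n_features)."""
    n = len(x)
    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for size in range(n):
            # Fraction of feature orderings in which `size` features precede j.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_j = coalition_value(predict, x, set(subset), background)
                with_j = coalition_value(predict, x, set(subset) | {j}, background)
                phi[j] += weight * (with_j - without_j)
    return phi


# Hypothetical toy model and data, for illustration only.
def toy_model(row):
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 3.0, 0.0], [1.0, 2.0, 3.0]]
x = [3.0, 1.0, 2.0]

phi = exact_shapley(toy_model, x, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print("Shapley values  :", phi)
print("sum of values   :", sum(phi))
print("f(x) - baseline :", toy_model(x) - baseline)
```

The last two printed numbers should agree up to floating-point error, because the contributions of all features add up to the gap between the prediction and the average prediction over the background rows.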


The interpretation of a Shapley value X is: the value of feature A contributed X to the prediction of this particular instance compared to the average prediction for the dataset. More precisely, it is the contribution of a feature value to the difference between the actual prediction and the mean prediction.

If the actual prediction is Y and the average prediction is Ym, then the sum of the Shapley values of all the features in the model equals Y - Ym, which is the deviation of the prediction from its baseline.

Note that a Shapley value can be either positive or negative, as you may infer from the definition.

Let’s see an example:

A friend of mine has been working with a dataset to understand what makes a verb more likely to be used in the past or in the present tense when referring to an event in the past (he is a researcher in linguistics). His dataset has different features describing how the verb is used, plus the target variable he wants to predict: whether the sentence is in the present or the past. He used a Random Forest to do that and, as you now know, a random forest is not that easy to interpret. So, what information can we get from applying Shapley values to this model?

The following image shows the output of applying the method to this dataset using Python and the SHAP library (see the notes below for some disclaimers about this library). You can install the library with pip (pip install shap).

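import shap

# `model` is the trained Random Forest and `X_test` a pandas DataFrame of features.
explainer = shap.TreeExplainer(model)  # (***)
# In the SHAP version used here, this is a list with one array per class.
shap_values = explainer.shap_values(X_test)
shap.initjs()  # load the JavaScript needed for interactive plots in a notebook
shap.summary_plot(shap_values, X_test, plot_type="bar")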

(***) You will see later in the post that the SHAP library implements an approximation of the Shapley values. For this, you need to give the library some details about the model you are using. Since I’m using a Random Forest, the explainer is a TreeExplainer. There are specific explainers for Neural Networks and Kernel methods.

This graph tells us the importance of each feature when classifying something as present or past (class 0 or class 1). Note that the library gives the importance of a feature by class. This is useful since some features may be relevant for one class but not for another. Of course, this model is a binary classification task, so it won’t surprise us to find that if a feature is important for classifying something as Class 0, it will also be important for Class 1. In a multiclass setting this may be different.

If we want to explain a particular prediction we can do as follows:

instance_to_explain = 0
shap.force_plot(explainer.expected_value[1], shap_values[1][instance_to_explain], X_test.iloc[instance_to_explain])

Shapley values for the verb to be used in the past rather than present.

This graph is called a force plot and displays the Shapley values for a single observation. If you pay attention, you will see in the very middle the legend “Base value 0.5645”. This goes back to the definition of Shapley values: we said that they are the contribution of a feature value to the difference between the actual prediction and the mean prediction. This number is the base prediction. You can get the base prediction with:

# The following is the expected probability of something being
# classified as Class 1. explainer.expected_value[0] gives the
# expected probability of something being classified as Class 0.
# Any SHAP value contributes towards or against this base expected
# probability, which is calculated for the dataset, not for the
# model.
explainer.expected_value[1]

Then, the arrows below the line indicate all the feature values that move the actual prediction from the base value to 0.73 (a 0.73 probability of the sentence being in the present). Feature values in red push the prediction above the baseline, and blue arrows push it below. In this case, for instance, the main verb (feature main_v) being “demand” (encoded as 1) makes “present” 0.06 more likely. This diagram makes it really easy to understand why something is classified the way it is.
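As a small sanity check using only the two numbers quoted from this plot, remember that the Shapley values of an instance add up to the gap between its prediction and the base value. The symbol \phi_j for the Shapley value of feature j is notation I am adding here. All the arrows together must therefore account for:

```latex
\sum_j \phi_j = 0.73 - 0.5645 = 0.1655
```

So the 0.06 attributed to main_v is a bit more than a third of the total shift away from the baseline.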

Another interesting thing you can do is understand which values of the feature affect your predictions. You can achieve that as follows:

features = X_test.columns.tolist()
for feat in features:
    # One plot per feature: its values against its SHAP values for class 1.
    shap.dependence_plot(feat, shap_values[1], X_test, dot_size=100)

The output will be a sequence of graphs like the following (I’m showing just one feature to make it easier to read):

Shapley values for the feature type_subv against its range of values

In this case, we see the Shapley values of type_subv for the different values the feature can take. As can be seen, the feature is most relevant when it takes the value 4 (it is a categorical feature).


The Shapley value is really important as it is the only attribution method that satisfies the properties of Efficiency, Symmetry, Dummy, and Additivity, which together can be considered a definition of a fair explanation:

  • Efficiency, meaning that the feature contributions must add up to the difference between the prediction for the instance and the average prediction (expected value). This is highly important because, seen the other way around, it means that the deviation from the average prediction is fully distributed across the features.
  • Symmetry, the contributions of two feature values A and B should be the same if they contribute equally to all possible coalitions.
  • Dummy, a feature that does not change the predicted value, no matter which coalition it is added to, has a Shapley value of 0.
  • Additivity, which guarantees that for a feature A used in a model M that averages the predictions of two other models M1 and M2, you can calculate the Shapley value of A for M1 and M2 individually, average them, and get the Shapley value of A for model M.
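The four properties are easy to check numerically on a tiny example. The sketch below is only an illustration under my own assumptions: the three "players" stand in for features, and the two payoff functions (game_1, game_2) are hypothetical games chosen so that A and B are interchangeable and C never changes the payoff.

```python
from itertools import combinations
from math import factorial


def shapley(players, value):
    """Exact Shapley values of a cooperative game `value(frozenset) -> float`."""
    n = len(players)
    phi = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


players = ["A", "B", "C"]


# Hypothetical games: A and B are interchangeable, C never changes the payoff.
def game_1(s):
    return 10.0 * len(s & {"A", "B"})


def game_2(s):
    return 6.0 if {"A", "B"} <= s else 0.0


def averaged_game(s):
    # Plays the role of a model M that averages two models M1 and M2.
    return 0.5 * (game_1(s) + game_2(s))


phi_1 = shapley(players, game_1)
phi_2 = shapley(players, game_2)
phi_avg = shapley(players, averaged_game)

print("game 1  :", phi_1)
print("game 2  :", phi_2)
print("average :", phi_avg)
```

In each game A and B receive the same value (Symmetry), C receives 0 (Dummy), the values add up to the payoff of the full coalition minus the payoff of the empty one (Efficiency), and the values of the averaged game are the averages of the values of the two games (Additivity).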

In situations where the law requires explainability (as with the GDPR), Shapley values may be the method able to provide a full explanation with limited assumptions about the data. Other methods, like LIME, assume linearity at the local level, which may not hold true.


The main drawback of Shapley values is that they require a lot of computing time. With more than a few features, the exact solution becomes impractical because the number of possible coalitions grows exponentially as features are added. This is why most packages implement an approximation method: it is likely to be the only feasible option.
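To get a feeling for the numbers, and for what an approximation can look like, here is a minimal sketch that estimates Shapley values by sampling random feature orderings. It is one possible approximation, not the one used by any particular package. The function name sampled_shapley, the toy model and the data are assumptions made up for this example.

```python
import random


def sampled_shapley(predict, x, background, n_samples=2000, seed=0):
    """Monte Carlo estimate of Shapley values via random feature orderings."""
    rng = random.Random(seed)
    n = len(x)
    phi = [0.0] * n
    for _ in range(n_samples):
        order = list(range(n))
        rng.shuffle(order)
        # Start from a random background row and switch features to x's
        # values one at a time, in the sampled order.
        current = list(rng.choice(background))
        previous = predict(current)
        for j in order:
            current[j] = x[j]
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [total / n_samples for total in phi]


# The exact computation evaluates 2**(n - 1) coalitions per feature.
for n_features in (5, 10, 20, 30):
    print(n_features, "features ->", 2 ** (n_features - 1), "coalitions per feature")


# Hypothetical toy model and data, for illustration only.
def toy_model(row):
    return 2.0 * row[0] + row[1] * row[2]


background = [[0.0, 1.0, 2.0], [1.0, 0.0, 1.0], [2.0, 3.0, 0.0], [1.0, 2.0, 3.0]]
x = [3.0, 1.0, 2.0]
estimate = sampled_shapley(toy_model, x, background)
print("estimated Shapley values:", estimate)
```

The cost of the estimate is set by n_samples rather than by the number of coalitions, at the price of some sampling noise in the result.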

Another drawback is that the Shapley value method suffers from the inclusion of unrealistic data instances when features are correlated. Remember that to simulate a feature value being missing from a coalition, we marginalize the feature by sampling values from its marginal distribution. When features are dependent, we might sample feature values that do not make sense for the instance.
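A tiny, made-up dataset shows how quickly this happens. In the sketch below (all names and numbers are assumptions for illustration) the second feature is always exactly twice the first, yet sampling it from its marginal distribution mostly produces rows that never occur in the data.

```python
import random

rng = random.Random(0)

# Hypothetical dataset: feature 1 is always exactly twice feature 0,
# so the two features are perfectly dependent.
data = [[float(v), 2.0 * v] for v in range(1, 6)]
observed = {tuple(row) for row in data}

x = data[0]  # the instance being explained: [1.0, 2.0]

# Simulate "feature 1 is missing" by drawing it from its marginal
# distribution, ignoring the value of feature 0.
synthetic = [[x[0], rng.choice(data)[1]] for _ in range(1000)]
unrealistic = [row for row in synthetic if tuple(row) not in observed]

print("share of synthetic rows never seen in the data:",
      len(unrealistic) / len(synthetic))
```

With five equally likely values, about four out of five synthetic rows are expected to break the relationship between the two features, and the model is still asked to predict on them.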

Available packages:

SHAP, an alternative formulation of the Shapley values, is implemented in Python. SHAP turns the Shapley values method into an optimization problem and hence is an approximation. The results of SHAP are sparse (many Shapley values are estimated to be zero), which is the biggest difference from the classic Shapley values.


IML contains an implementation available for R.



In the next post of the series, we will look at a different approach that has a lower computational requirement and hence doesn’t need an approximation to be computed: Permutation feature importance. Stay tuned!
