Interpretability vs. Explainability

Interpretability and explainability are often used interchangeably in the literature, and while in some cases they carry the same meaning, it is still worth unpacking the differences between them for a more in-depth understanding of both concepts.

Interpretability

Interpretability has been defined as “the extent to which a cause and effect can be observed within a system.” In other words, you can understand what is going to happen when you change the input and/or parameters of a system; you can see what is happening, when, and where. When you apply this definition to a model, which, if you recall, is just a representation of reality, interpretability describes the extent to which a human can understand the model intuitively, based only on how the model is designed and without needing additional information. A model’s interpretability is separate from how well the model represents reality; it is about understanding how the model itself works and how it makes its decisions. To put it differently, a model could be a terrible model and still be easy to understand. For example, I could create a model that says that, for every inch of rainfall received, the stock of Company X will increase by 1%. Regardless of its accuracy, the model itself is very easy to understand; I know why the model gives me the answer it does, and I know exactly how the output of the model will change as the amount of rainfall changes.
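
To make this concrete, here is the same toy model written as a few lines of Python. This is a minimal sketch: the function name and the 1% coefficient come straight from the example above and are purely illustrative.

```python
# A hypothetical, fully interpretable model: each inch of rainfall is assumed
# to raise Company X's stock price by 1%. Illustration only.
def predict_stock_change(rainfall_inches: float) -> float:
    """Predicted % increase in Company X's stock for a given rainfall."""
    PERCENT_INCREASE_PER_INCH = 1.0  # the single, human-readable parameter
    return rainfall_inches * PERCENT_INCREASE_PER_INCH

print(predict_stock_change(3.0))  # 3 inches of rain -> predicted 3.0% increase
```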

Another example is the CHA2DS2-VASc score, which is considered an interpretable model. Based on the model, you know that the risk of stroke is calculated as a linear sum of seven different risk factors. The more risk factors a patient has, the higher their risk score. It is easy to see exactly how the CHA2DS2-VASc score is calculated, and it is easy to see how the model uses the input data to represent risk and produce a final risk prediction score.
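
Written out as code, the score is nothing more than a visible sum of points, which is exactly what makes it interpretable. The sketch below uses the commonly published point values; it is an illustration, not a clinical tool.

```python
def cha2ds2_vasc(chf: bool, hypertension: bool, age: int, diabetes: bool,
                 prior_stroke_tia: bool, vascular_disease: bool,
                 female: bool) -> int:
    """Additive stroke-risk score; every contribution is visible in the code.
    Point values are the commonly published ones -- illustration only."""
    score = 0
    score += 1 if chf else 0                # congestive heart failure
    score += 1 if hypertension else 0
    score += 2 if age >= 75 else (1 if age >= 65 else 0)
    score += 1 if diabetes else 0
    score += 2 if prior_stroke_tia else 0   # prior stroke/TIA/thromboembolism
    score += 1 if vascular_disease else 0
    score += 1 if female else 0             # sex category
    return score

# A 70-year-old woman with hypertension: 1 (age 65-74) + 1 (HTN) + 1 (sex) = 3
print(cha2ds2_vasc(False, True, 70, False, False, False, True))
```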

In the world of machine learning and deep learning, there is a trade-off between complexity and interpretability. Deep learning models can represent complex, non-linear relationships and can perform tasks that would be impossible for simple algorithms or logic-based decision trees; however, the very complexity that allows these models to learn non-linear relationships is also the reason they may not be interpretable. So, if a model is not interpretable, the next question we ask ourselves is this: “Is it explainable?”
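
As a rough illustration of that trade-off, the sketch below (assuming scikit-learn is installed; the dataset is synthetic) fits an interpretable linear model and a small neural network to the same data. The linear model’s behavior can be read directly from one coefficient per feature; the network’s behavior is buried in thousands of weights.

```python
# Sketch of the complexity/interpretability trade-off (assumes scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Interpretable: one coefficient per feature shows the direction and size of its effect.
linear = LogisticRegression().fit(X, y)
print("Per-feature coefficients:", linear.coef_.round(2))

# Not interpretable by inspection: the learned relationships live in
# thousands of weights spread across hidden layers.
deep = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=2000,
                     random_state=0).fit(X, y)
print("Number of learned weights:", sum(w.size for w in deep.coefs_))
```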


Explainability

Explainability is “the extent to which the internal mechanics of a machine or deep learning system can be explained in human terms.” Put simply, it refers to how well you can actually explain the inner workings of a model in a way that is understandable to the layperson. Why do we need the term “explainability”? As we just saw, not all models are interpretable (intuitively understandable). Therefore, we need another way to try to make sense of them, and that is where explainability comes into play. Can we break a complex model down into smaller parts that are each simple enough to be interpretable? Can we use this strategy to understand how and why the model made the prediction it did? 
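As a concrete preview of one such strategy (these methods are covered properly in the next sections), the sketch below applies a simple permutation test to a black-box model: shuffle one input feature at a time and measure how much accuracy drops. The dataset and model here are placeholders; scikit-learn also ships a ready-made version of this idea as sklearn.inspection.permutation_importance.

```python
# Sketch of one explainability strategy: permutation importance.
# Shuffling a feature breaks its relationship to the label; the resulting
# accuracy drop hints at how much the black-box model relied on it.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=5, random_state=0)
model = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000,
                      random_state=0).fit(X, y)
baseline = model.score(X, y)

rng = np.random.default_rng(0)
for j in range(X.shape[1]):
    X_shuffled = X.copy()
    X_shuffled[:, j] = rng.permutation(X_shuffled[:, j])  # break feature j
    drop = baseline - model.score(X_shuffled, y)
    print(f"Feature {j}: accuracy drop {drop:.3f}")
```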

In the upcoming sections of the roadmap, we will talk about different types of strategies, or techniques, used to try to explain a model, where they are helpful, and where they sometimes go wrong. Even within explainability methods, there are important clinical caveats to be aware of, and we will go over those in detail.  

Interpretability vs. Explainability

Interpretability and explainability both lie on a continuum, and the edge where interpretability ends and explainability begins is sometimes blurred. To help make the distinction clearer, let’s contrast the two.

If we were to use drug therapies as examples, interpretability would be analogous to understanding a clinical trial result showing that antiviral A reduces mortality by 20% among those with a viral infection. Intuitively, we can grasp how this drug reduces the risk of death: it treats an infection, thereby preventing death caused by the infection. No other information or explanation is needed. Now, to illustrate non-interpretability, let’s consider a placebo-controlled trial showing that a new antipsychotic drug reduces hospital length of stay by 30%. Knowing nothing but the therapeutic class of the drug, we cannot see how it produces this result based on its mechanism of action alone. In other words, we do not intuitively know how to interpret this result. What is the causal relationship between antipsychotic treatment and length of hospital stay? We would have to look at other results or clinical data to see how the drug might cause this effect, and to do so, we might consider further studies to demonstrate additional mechanisms of action or other clinical effects of the drug. For example, perhaps the drug reduces delirium, so we might test whether the drug reduces ICU delirium and ICU length of stay, thereby reducing overall hospital length of stay. In this scenario, an antipsychotic reducing ICU length of stay by decreasing ICU delirium is more interpretable than the study showing a 30% reduction in overall length of stay, because it is broken down into relationships that can be understood without additional knowledge.

To continue with the above example to illustrate explainability, we can use the interpretable result (i.e., reducing ICU delirium) about a drug effect to help explain the non-interpretable result (i.e., reducing the overall length of stay). Keep in mind, however, that we are not directly measuring how the drug reduces the total length of stay; we are measuring something else to help explain how the placebo-controlled trial got the result that it did. We do not know whether this reduction in ICU delirium is the primary factor in how this drug reduces the length of stay, and we do not even know whether it is a causative factor; it is simply a fact about the drug that logically fits with a study result we do not fully know how to explain.

What Model Explainability Is Not: Explainability Is Not Knowledge Discovery

To continue setting the stage before we dive into the types of models used, let us clarify the difference between a prediction and an explanation in the context of knowledge discovery. Deep learning is often used to generate a prediction, as we have seen in earlier sections of the AI Roadmap, and a prediction is fundamentally separate and distinct from an explanation. Explanatory studies are designed to help us understand something about the way the world works: we start with a hypothesis, then generate data to see if the hypothesis is supported. Prediction studies, on the other hand, are designed to help us predict whether something will or will not happen (not why it will or will not happen); here, we start with data and then create a model.

The key issue, and the reason we are careful in outlining these differences to you, is that people often incorrectly think that the model they created explains the underlying mechanism behind the event (i.e., that the model itself can tell you why or how the event they were trying to predict occurs). More specifically, people sometimes try to use explainability methods as a way to describe how an event or phenomenon works in the real world. This is incorrect. An explainability technique only helps to explain how a model works, and the model itself is just a prediction, not a proposed explanation (i.e., a testable hypothesis) of how the world works. If you design a model to create a prediction, that prediction is exactly the one output your model is designed to give. Prediction does not mean explanation.