Interpreting deep learning models

With the rapid development of sophisticated machine learning algorithms, artificial intelligence is moving into a number of new fields at unprecedented speed.

One of the outstanding problems hampering further progress is the interpretability challenge.
This challenge arises when the models built by the machine learning algorithms are to be used by humans in their decision making, particularly when such decisions are subject to legal consequences and/or administrative audits.
For human decision makers operating in those circumstances to accept the professional and legal responsibility ensuing from decisions assisted by machine learning, they must be able to comprehend the models.
In areas such as healthcare, business, and crime prediction, mistakes can be catastrophic. For instance, to develop safe self-driving cars we need to understand their rare but costly mistakes. It is therefore imperative to explain the learned representations, the relationships between the inputs and the dependent variables, and the decisions made by these models. To trust a model, decision makers first need to understand its behavior, and then evaluate and refine it using their domain knowledge.

One critical issue with future automated systems based on machine learning is their possible misalignment with the objectives of their stakeholders, that is, whether these systems really behave reliably in unforeseen situations. They may perform well on test cases but do the wrong thing when deployed in the wild. Ironically, it could also be revealed later that they were doing the right things for the wrong reasons. Hence, interpretability plays a significant role in helping us reduce errors.

Properties of Interpretability

In the machine learning community, interpretability has recently been defined as “the ability to explain or to present in understandable terms to a human”.
Beyond the definition, a much harder task is to quantify and measure interpretability. For this reason, the effort extends beyond typical machine learning research into human-computer interaction. There are also studies on other aspects of interpretability, such as the plausibility of a model: the likelihood that a user accepts it as an explanation for a prediction.

There are no mutually agreed standards for evaluation metrics, and some evaluation methods are only applicable to a specific model.
I present three general evaluation metrics that are frequently used in state-of-the-art work: fidelity, comprehensibility and accuracy.

Fidelity: Fidelity demands that the interpretation model’s predictions match those of the black-box model as closely as possible, since it is not realistic for the interpretation model to be entirely faithful to the black-box model. In other words, the interpretation model tries to mimic the behavior of the black-box model on the instance being predicted.
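As a rough illustration, fidelity over a set of instances can be estimated as the fraction of instances on which the interpretation model and the black-box model predict the same label. The sketch below assumes that agreement-rate reading of fidelity; the function `fidelity` and the two toy models are hypothetical and are not taken from the papers cited in this post.

```python
from typing import Callable, Sequence

Instance = Sequence[float]
Model = Callable[[Instance], int]


def fidelity(black_box: Model, surrogate: Model, instances: Sequence[Instance]) -> float:
    """Fraction of instances where the surrogate's label equals the black-box label."""
    if not instances:
        raise ValueError("need at least one instance")
    agreements = sum(1 for x in instances if black_box(x) == surrogate(x))
    return agreements / len(instances)


# Hypothetical toy models, for illustration only.
def toy_black_box(x: Instance) -> int:
    # Stand-in for an opaque model: a nonlinear decision boundary.
    return int(x[0] * x[0] + x[1] > 1.0)


def toy_surrogate(x: Instance) -> int:
    # Stand-in for an interpretable model: a single threshold rule.
    return int(x[0] > 0.8)


points = [(0.0, 0.0), (0.5, 0.5), (0.9, 0.5), (1.5, 0.0), (0.2, 2.0)]
score = fidelity(toy_black_box, toy_surrogate, points)
```

On these five points the two toy models agree on four, so `score` is 0.8. Note that the true labels never appear: fidelity compares the interpretation model with the black-box model, not with the ground truth.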

Comprehensibility: Comprehensibility requires that the interpretation results are understandable to the users. When building an interpretation method, we should take the limitations of human cognition into consideration. For instance, decision trees with thousands of nodes and decision rules with hundreds of levels of if-then conditions are not interpretable in this sense, although they are commonly regarded as inherently interpretable algorithms for textual representations.
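One crude way to make this concrete is to measure how large an interpretation model is. The sketch below counts the nodes and the depth of a decision tree stored as nested dictionaries. The tree format, the function name, and the use of size as a proxy for comprehensibility are all assumptions made for illustration, not an agreed metric.

```python
from typing import Any, Tuple

# Hypothetical tree format: an internal node is a dict with "left" and "right"
# subtrees; anything else (for example a class label) is treated as a leaf.
Tree = Any


def tree_size_and_depth(tree: Tree) -> Tuple[int, int]:
    """Return (number of nodes, depth), where a lone leaf has depth 0."""
    if not isinstance(tree, dict):
        return 1, 0
    left_nodes, left_depth = tree_size_and_depth(tree["left"])
    right_nodes, right_depth = tree_size_and_depth(tree["right"])
    return 1 + left_nodes + right_nodes, 1 + max(left_depth, right_depth)


small_tree = {
    "rule": "age <= 40",
    "left": "low risk",
    "right": {
        "rule": "blood_pressure <= 130",
        "left": "low risk",
        "right": "high risk",
    },
}
nodes, depth = tree_size_and_depth(small_tree)
```

Here `small_tree` has 5 nodes and depth 2, which a person can read at a glance. The same function applied to a tree with thousands of nodes would flag it as too large to be comprehensible in practice.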

Accuracy: Accuracy measures the performance of the interpretation model on the original training data that was used to train the black-box model. The purpose is to check whether the interpretation model could outperform the black-box model. The measurements could be traditional machine learning evaluation metrics such as accuracy score, AUC score, F1-score, etc.
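A minimal sketch of this comparison, restricted to binary labels, is shown below. It computes the accuracy score and the F1-score of both models against the true labels. The helper names and the label vectors are made up for illustration and do not come from any experiment.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that equal the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_binary(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1-score for the positive class: 2*TP / (2*TP + FP + FN)."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must be of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Made-up labels and predictions, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
black_box_pred = [1, 0, 1, 0, 0, 1, 1, 0]
surrogate_pred = [1, 0, 0, 0, 0, 1, 1, 0]

black_box_acc = accuracy(y_true, black_box_pred)
surrogate_acc = accuracy(y_true, surrogate_pred)
black_box_f1 = f1_binary(y_true, black_box_pred)
surrogate_f1 = f1_binary(y_true, surrogate_pred)
```

In this toy example the black-box model scores higher on both metrics, so the interpretation model does not outperform it. Unlike fidelity, both models are compared against the true labels here.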

Interpretations of models

One interpretation method is a visualization technique, named CNN-INTE, that interprets deep Convolutional Neural Networks (CNN) via meta-learning. Compared to LIME, which provides local interpretations for the entire model in specific regions of the feature space, this method provides global interpretations for any test instance on the hidden layers in the whole feature space.

The second interpretation method applies the knowledge distillation technique to distill deep neural networks into decision trees, in order to attain good performance and interpretability simultaneously.
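In the commonly used formulation of knowledge distillation, the outputs of the large "teacher" network are softened with a temperature before a smaller "student" model is trained to match them. The sketch below shows only that softening step in plain Python. The logits and the temperature value are made up, and how the decision tree is then fitted to such targets is not described in this post, so that part is left out.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Softmax of logits / temperature; a higher temperature gives a flatter distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical teacher logits for one instance and three classes.
teacher_logits = [4.0, 1.0, 0.5]
hard = softmax_with_temperature(teacher_logits, temperature=1.0)
soft = softmax_with_temperature(teacher_logits, temperature=4.0)
```

Both `hard` and `soft` rank the first class highest, but `soft` gives more probability to the other two classes. These softened probabilities carry the teacher's view of how similar the classes are, which is the information a student model is meant to learn from.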

To learn more about these interpretation methods, see the following research papers:

Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “QDV: Refining Deep Neural Networks with Quantified Interpretability.” In 2020 European Conference on Artificial Intelligence (ECAI), submitted.
Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “Improving the Interpretability of Deep Neural Networks with Knowledge Distillation.” In 2018 IEEE International Conference on Data Mining Workshops (ICDMW), pp. 905-912. IEEE, 2018.
Liu, Xuan, Xiaoguang Wang, and Stan Matwin. “Interpretable deep convolutional neural networks via meta-learning.” In 2018 International Joint Conference on Neural Networks (IJCNN), pp. 1-9. IEEE, 2018.
