Knowledge Distillation vs Post-hoc Interpretability
Model extraction aims at distilling the knowledge in a black-box model and capturing it in a white-box, explainable model. The explainable model gives a global view of the original model, and it can easily be simulated to obtain a local explanation for each prediction. To compare the quality of the extracted model w.r.t. the original one, quantitative measures such as fidelity (how well the extracted model approximates the predictions of the original one) are commonly used. The aim of this project is to take a closer look at the qualitative differences between the original and extracted models using post-hoc interpretability methods, and to answer questions such as: do the features/samples that are important in the original model stay important in the extracted one? This qualitative analysis will be used to propose hybrid explainability methods that stay faithful to the original model not only in terms of quantitative measures but also in qualitative aspects, such as the features that the model relies on the most for its predictions.
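As a rough illustration of the two kinds of comparison mentioned above, the sketch below computes fidelity as the fraction of samples on which the extracted model reproduces the original model's prediction, and a simple qualitative check: the overlap between the top-k most important features of the two models. The function names, the toy models, and the choice of Jaccard overlap are assumptions made for illustration only; the importance scores would come from whichever post-hoc method is being studied.

```python
from typing import Callable, Sequence

# A "model" here is any callable mapping a feature vector to a class label.
Model = Callable[[Sequence[float]], int]


def fidelity(original: Model, extracted: Model,
             samples: Sequence[Sequence[float]]) -> float:
    """Fraction of samples on which the extracted model agrees with the original."""
    if not samples:
        raise ValueError("need at least one sample")
    agree = sum(original(x) == extracted(x) for x in samples)
    return agree / len(samples)


def top_k_overlap(importance_a: Sequence[float],
                  importance_b: Sequence[float], k: int) -> float:
    """Jaccard overlap of the k features with the largest absolute importance.

    This is one possible qualitative measure (an assumption of this sketch),
    not a standard definition.
    """
    def top(importance: Sequence[float]) -> set:
        ranked = sorted(range(len(importance)),
                        key=lambda i: abs(importance[i]), reverse=True)
        return set(ranked[:k])

    a, b = top(importance_a), top(importance_b)
    return len(a & b) / len(a | b)


# Toy stand-ins: a "black box" using two features and a one-rule surrogate.
def original_model(x: Sequence[float]) -> int:
    return int(x[0] + 0.5 * x[1] > 1.0)


def extracted_model(x: Sequence[float]) -> int:
    return int(x[0] > 0.75)


samples = [(0, 0), (1, 0), (0, 1), (1, 1), (0.8, 0.2), (0.5, 2.0)]
print("fidelity:", fidelity(original_model, extracted_model, samples))
print("top-2 overlap:", top_k_overlap([0.6, 0.3, 0.1], [0.7, 0.05, 0.25], k=2))
```

A surrogate can score well on fidelity while relying on a different set of features, which is exactly the gap the second measure is meant to expose.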
The ideal candidate for this project has experience with post-hoc explainability methods. Some relevant references are:
What Have You Learnt? Knowledge Discovery Using Deep Neural Networks
The majority of interpretability methods used for explaining the predictions of neural networks, such as feature importance and sample-based explanations, are post-hoc and focus on measuring the impact of perturbations on a network’s prediction. In this project we aim to go one step further than prediction explanation and use cutting-edge interpretability methods for knowledge discovery. In particular, the aim of this project is to extract high-level structures from the hidden space of a DNN and verify whether they capture a known structure (e.g., a signal that distinguishes samples of one class from another) or a new one that can be inspected for discovery purposes. The connection between these structures and downstream tasks (e.g., whether a structure is shared between classes or is class-specific) can nonetheless be used for interpretability purposes.
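One simple way to ask whether an extracted structure is shared between classes or class-specific is to look at the class distribution of the samples assigned to it. The sketch below assumes that each sample has already been assigned to a structure (for example, a cluster of hidden activations); the function name, the purity measure, and the 0.9 threshold are hypothetical choices for illustration, not part of the project description.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def structure_class_profile(structure_ids: Sequence[Hashable],
                            labels: Sequence[Hashable],
                            specific_threshold: float = 0.9) -> Dict:
    """Summarise how each structure relates to the class labels.

    For every structure, report its dominant class, its purity (share of
    samples belonging to the dominant class) and whether it is treated as
    "class-specific" or "shared" under the assumed threshold.
    """
    if len(structure_ids) != len(labels):
        raise ValueError("structure_ids and labels must have the same length")

    # Count class labels per structure.
    counts = defaultdict(Counter)
    for structure, label in zip(structure_ids, labels):
        counts[structure][label] += 1

    profile = {}
    for structure, class_counts in counts.items():
        dominant_class, dominant_count = class_counts.most_common(1)[0]
        purity = dominant_count / sum(class_counts.values())
        kind = "class-specific" if purity >= specific_threshold else "shared"
        profile[structure] = {"dominant_class": dominant_class,
                              "purity": purity,
                              "kind": kind}
    return profile


# Toy example: structure 0 only contains class "a"; structure 1 mixes classes.
structure_ids = [0, 0, 0, 1, 1, 1, 1]
labels = ["a", "a", "a", "a", "b", "b", "b"]
for structure, info in structure_class_profile(structure_ids, labels).items():
    print(structure, info)
```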
The ideal candidate for this project will have a strong background in deep learning. Familiarity with explainable AI is a big plus. Some relevant references are below:
Machine Learning Explainability in Integrative Cancer Medicine (co-supervised with Marwa Mahmoud)
Explainability is becoming an inevitable part of machine-learning enabled systems, in particular in the light of the EU’s General Data Protection Regulation (GDPR). The importance of explanation is even more evident in safety-critical domains such as healthcare. The aim of this project is to add an explainability module to VIIDA, a visual integrative interface for data analysis, recently developed by the CancerAI team in the lab as part of MFICM. The module will bring explainability methods for feature importance [1], example-based explanation [2] and model extraction [3] together in a plug-in for VIIDA. User studies will be conducted to evaluate the usability and extensibility of the module. This is a very exciting opportunity for students who want to work at the intersection of ML and HCI in the health domain.
The ideal candidate for this project has strong Python programming skills as well as experience with API and web-based development. Experience with explainable AI is a great advantage.
[1] from libraries such as lime, SHAP, and iNNvestigate
[2] from the following repo: https://github.com/Timothy-Ye/example-based-explanation
[3] from the following repo: https://github.com/mateoespinosa/rem
Concept Learning for Human-Like Explanation in Machine Learning
The majority of explainability methods focus on providing explanations at the input-feature level (they reveal the most important features for making a prediction). Recently there has been a shift towards concept-based explanation, where concepts refer to high-level intermediate representations of the input that are relevant to the downstream task and also meaningful to humans. As a result, in tasks such as image classification we can explain the behaviour of a classifier in terms of groups of pixels that signify a semantically relevant entity (e.g. beak, feather, leg) rather than individual pixels that are not very informative on their own.
Concept-based explanation methods are, however, mainly focused on preserving the properties of concepts w.r.t. the output (i.e. that the intermediate representation can predict the classification labels well if used instead of the input features). The aim of this project is to develop methods that comply with certain properties w.r.t. the input as well as the output (e.g. that the concepts indeed capture only the part of the input that is necessary for that concept and nothing more). This is a great opportunity for students who want to be involved at the forefront of explainability in ML, which is becoming increasingly important for the full deployment of ML in society, in particular in the light of the EU’s General Data Protection Regulation (GDPR). A few core papers relevant to this project are listed below.
The ideal candidate for this project will have a strong background in deep learning. Familiarity with explainable AI is a big plus.
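To make the input-side property of concepts more concrete, one hypothetical probe is sketched below: perturb only the input features that are supposed to be irrelevant to a concept and measure how much the concept score changes. A concept that captures "nothing more" than its intended part of the input should not react at all. The function name, the notion of a fixed `support` set of feature indices, and the uniform noise are all assumptions of this sketch, not a method proposed by the project.

```python
import random
from typing import Callable, Sequence, Set

# A concept function maps an input vector to a scalar concept score.
ConceptFn = Callable[[Sequence[float]], float]


def outside_sensitivity(concept: ConceptFn, x: Sequence[float],
                        support: Set[int], noise: float = 1.0,
                        trials: int = 100, seed: int = 0) -> float:
    """Largest change in the concept score when only features outside
    `support` are perturbed with uniform noise in [-noise, noise]."""
    rng = random.Random(seed)
    base = concept(x)
    worst = 0.0
    for _ in range(trials):
        perturbed = [v if i in support else v + rng.uniform(-noise, noise)
                     for i, v in enumerate(x)]
        worst = max(worst, abs(concept(perturbed) - base))
    return worst


# Toy concepts over a 4-feature input; both are meant to depend on
# features 0 and 1 only, but the second one also leaks feature 3.
def clean_concept(x: Sequence[float]) -> float:
    return x[0] + x[1]


def leaky_concept(x: Sequence[float]) -> float:
    return x[0] + x[1] + 0.5 * x[3]


x = [1.0, 2.0, 0.0, 0.0]
support = {0, 1}
print("clean:", outside_sensitivity(clean_concept, x, support))
print("leaky:", outside_sensitivity(leaky_concept, x, support))
```

A score of zero for the clean concept and a non-zero score for the leaky one shows how such a probe could separate concepts that respect their intended input region from those that do not.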