Assignment 2. Explaining Model Predictions to Users
Due: 11:59pm on 11/13 (Fri)
20% toward your grade
Explainable AI helps users understand and interpret predictions produced by models. The objective of this assignment is for you to try existing off-the-shelf tools for explanations, think about strengths and weaknesses of the explanations they provide, and design your own user-centered explanations that can address such weaknesses.
You will work with methods for explaining model predictions in image classification tasks. Such explanations help users resolve questions about what is happening inside the model and why. As users explore these explanations, they may come up with additional questions about the model, which may require other kinds of explanations.
What should I do?
In this assignment, you are asked to (1) explore Google’s What-If Tool, a platform that helps users understand the performance of models, (2) build and run an algorithm based on LIME that presents which parts of an image contribute to the prediction, for better interpretation of classification results, and (3) design a UI that further helps users interpret the results, especially when such an explanation is not enough. For each stage, you are asked to discuss what can be explained with these tools/methods and the limitations of such explanations. For (2), we are going to use Google Colab, an environment for executing documented code, similar to Jupyter Notebook.
Stage 1: What-If Tool
- Go to the What-If Tool demo of a smile classification task. You can explore the dataset, browse the model’s performance, and experiment with the model by asking the tool what-if questions.
- Answer the questions in Discussion “Stage 1” below. The discussion contains a specific task that you need to perform with the What-If Tool.

Stage 2: LIME explainer
- Go to the Colab notebook, which contains skeleton code for an algorithm that explains which parts of an image contribute to the prediction result. You need to fill in some blanks in the code with your own implementation to make it work.
- Go to “File → Save a copy in Drive” so that your code is kept in your personal Google Drive. You need a Google account for this.
- Implement the empty parts of the skeleton code.
- Answer the questions in Discussion “Stage 2” below.

Stage 3: Figma prototype
- Create your own prototype of an interactive explainable UI [examples] to help users better understand the prediction results from Stage 2. Please follow the steps below:
  a. Prepare an example where the prediction result and the LIME explanation from Stage 2 are hard to understand, e.g., “I cannot judge whether referring to the shape of the ears would lead to a correct prediction, so I need to see more examples for comparison.”
  b. Come up with dimensions and functionalities that you think would provide a better explanation for the case you identified in Step a, e.g., a data-point feature editor, feature-statistics visualizations, an ROC curve visualization, a chatbot for explanations, etc.
  c. Use Figma to build a prototype of a web-based UI, like Google’s What-If Tool, that interactively shows all the information you listed in Step b. Make it interactive so that users can proactively seek satisfying explanations; for example, users may want to query additional information, request clarifications, edit data points to test a hypothesis, or give feedback to improve model performance. Your prototype does not need to be fully interactive for every feature you add; e.g., a “login” button or an “upload new image” button can remain static. Focus on the core interactive, explainable concept you are introducing.
- Answer the questions in Discussion “Stage 3” below. You should include the URL of your prototype in the report.
Stage 1. (What-If Tool)
The What-If Tool consists of three tabs: Datapoint editor, Performance & Fairness, and Features. Each tab presents a different aspect of the model and its results.
Datapoint editor tab
- 1-1. Choose an image in the right pane, and modify a feature value on the left. What happens? How can the Datapoint editor be used as an explanation? Do you have any interesting observations?
- 1-2. You can set X-Axis, Y-Axis, Color By, and other dimensions in the top bar. Fix X-Axis to “Inference correct” and explore how the visualization updates as you change the other dimensions. What happens? How can this be used as an explanation? Do you have any interesting observations?
Performance & Fairness tab
- 1-3. This tab shows the overall performance of the model. How can this be used as an explanation? Do you have any interesting observations?
- 1-4. In the Configure panel on the left, fix “Ground Truth Feature” to “Smiling” and select a feature in “Slice By”. What happens? How can this be used as an explanation? Do you have any interesting observations?
Strengths and Limitations of the tool
- 1-5. What are the strengths of this tool as an explanation?
- 1-6. What are the weaknesses of this tool as an explanation?
Stage 2. (LIME explainer)
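As background for the questions below: LIME-style image explanations operate on superpixels, contiguous regions of the image that serve as interpretable features. Here is a minimal toy sketch that partitions an image into square cells; note that the assignment notebook presumably uses a boundary-aware segmentation algorithm such as quickshift or SLIC instead, and `grid_segments` is a hypothetical helper, not part of the skeleton code:

```python
import numpy as np

def grid_segments(height, width, cell=16):
    """Toy superpixel map: label each pixel by the square cell it falls in.

    Real LIME implementations use boundary-aware algorithms such as SLIC or
    quickshift (e.g., skimage.segmentation.slic), which group pixels along
    actual image edges rather than a fixed grid.
    """
    rows = np.arange(height) // cell      # cell row index per pixel row
    cols = np.arange(width) // cell       # cell column index per pixel column
    n_cols = (width + cell - 1) // cell   # number of cells per row
    # Combine row/column cell indices into a single integer label per pixel.
    return rows[:, None] * n_cols + cols[None, :]
```

The resulting label map has the same height and width as the image, with one integer id per superpixel; this is the format the perturbation step below operates on.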
- 2-1. Report the result of Activity 2, which asks you to compare explanations of multiple images. Attach code and images if necessary.
- 2-2. How does the generated explanation, which presents a set of important superpixels, help users understand the model predictions? Does it complement performance measures such as accuracy and F1-score? If so, how?
- 2-3. In Activity 2, did you find any image for which the algorithm does not give an explanation that is easy to understand for users? Why do you think the algorithm gives such an explanation?
- 2-4. Is our explanation sufficient to trust the model predictions? When is it sufficient? When is it not sufficient? What kinds of additional information would you include in your explanation?
- 2-5. Are explanations always necessary in order to understand the model performance? When do we need them? When do we not?
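To make the discussion concrete, the three tasks of Activity 1 roughly correspond to the core steps of LIME for images: perturb the image by toggling superpixels on and off, fit a weighted linear surrogate model on the on/off masks, and select the superpixels with the largest coefficients. Below is a minimal sketch under those assumptions; `model_predict` is a hypothetical function returning the target-class probability for a batch of images, and this is not the skeleton code from the notebook itself:

```python
import numpy as np
from sklearn.linear_model import Ridge

def lime_explain(image, segments, model_predict, num_samples=1000, num_features=5):
    """Sketch of LIME for images: perturb superpixels, fit a weighted
    linear model, and select the most influential superpixels."""
    n_segments = segments.max() + 1

    # Task 1: create perturbed data by randomly switching superpixels on/off.
    masks = np.random.randint(0, 2, size=(num_samples, n_segments))
    perturbed = []
    for mask in masks:
        img = image.copy()
        on_ids = np.where(mask == 1)[0]
        img[~np.isin(segments, on_ids)] = 0   # black out "off" superpixels
        perturbed.append(img)
    preds = model_predict(np.stack(perturbed))  # target-class probability per sample

    # Weight samples by similarity to the original image (all segments on).
    distances = np.linalg.norm(masks - 1, axis=1) / np.sqrt(n_segments)
    weights = np.exp(-(distances ** 2) / 0.25)

    # Task 2: fit a weighted linear model on the binary on/off masks.
    linear = Ridge(alpha=1.0)
    linear.fit(masks, preds, sample_weight=weights)

    # Task 3: feature selection -- keep superpixels with the largest |coefficients|.
    top = np.argsort(np.abs(linear.coef_))[::-1][:num_features]
    return top, linear.coef_
```

The surrogate model's coefficients estimate how much each superpixel pushes the prediction toward the target class, which is what the highlighted regions in the notebook's output visualize.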
Stage 3. (Figma prototype)
- 3-1. List a few representative questions that your UI can possibly answer. Show us a walkthrough of how to resolve the questions. You may want to add screenshots of your UI.
Any useful resources?
The following resources should help you get a sense of the background, methodology, and technical approaches we will use.
- What-If Tool: https://pair-code.github.io/what-if-tool/
- Ribeiro et al., “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier”, Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
What do I submit?
You need to submit two things: (1) your .ipynb file exported from the Colab notebook and (2) a .pdf file that answers the discussion questions. Note that you need to explicitly submit your code this time. We highly encourage you to use resources such as code, figures, and statistical results to support your arguments in the discussion.
How is it graded?
Stage 2 Implementation (50%): further broken down into the following components
- Activity 1 / Task 1: Code for creating perturbed data (20%)
- Activity 1 / Task 2: Code for fitting a linear model (15%)
- Activity 1 / Task 3: Code for feature selection (15%)
Discussion (50%): further broken down into the following components
- Completeness (10%): Are all the questions answered with enough substance?
- Depth (10%): Does the answer include thoughtful analysis beyond the surface level?
- Clarity (10%): A reader who sees your example for the first time should not struggle to understand your points.
- Visual Communication (10%): Use resources such as numbers, figures, charts, external data, and statistics from the Google What-If Tool and the Colab code to communicate your ideas effectively.
- Conciseness (10%): Avoid verbose descriptions.
Please make sure to read the academic integrity and collaboration policy of this course carefully.
How do I submit?
Submit your code (.ipynb file) and report (a PDF file) via KLMS. Make sure that your report includes the URL of your prototype.