Artificial intelligence (AI) is a branch of computer science that deals with the synthesis of human intelligence. To this end, the field draws from various disciplines, including psychology, neurobiology, behavioral science, and engineering. While the objective of constructing an intelligent agent that exhibits broad, creative problem-solving capabilities comparable to humans appears to be out of reach for the foreseeable future, AI applications are already part of our everyday lives. Popular examples include, but are not limited to, intelligent assistants (e.g., Apple’s Siri or Amazon’s Alexa), object recognition (e.g., Instagram’s automated photo descriptions), and intelligent recommendations (e.g., Netflix’s movie recommender).
At their core, the most powerful AI applications, such as deep convolutional neural networks or recurrent neural networks, use large amounts of complex training data to recognize hidden patterns and ultimately make highly accurate predictions about uncertain (future) states. The high predictive performance of state-of-the-art machine learning models frequently comes at the expense of the transparency and interpretability of predictions, as machines fail to convey human-interpretable information about why they arrive at specific outcomes. This is why machine learning applications are often labeled black boxes, whose workings are entirely understood neither by expert designers nor by human users. The lack of interpretability can be concerning for several reasons.
First, the opacity of machine-generated outputs broadly creates accountability, responsibility, and auditing problems. This impedes the ability to detect biased or discriminatory outcomes and renders questions about liability difficult to navigate. Second, when human developers and users do not receive explanations about the inner reasoning of AI applications, they are deprived of the opportunity to improve the system’s design, but also to learn new insights from the machine that could improve human problem-solving capabilities. The latter aspect in particular is a substantial obstacle to AI’s ability to enhance economic efficiency and human welfare by revealing new domain knowledge hidden in complex Big Data. Third, the black-box nature of machine learning applications can have a negative impact on people’s trust in their performance, eventually hampering their acceptance.
The objective of eXplainable Artificial Intelligence (XAI) is to mitigate the outlined problems associated with the black-box nature by explaining the processing steps of the AI between input and output in a way that is comprehensible to humans. There are several approaches to cracking open the black box. At a high level, one can generally distinguish between (i) intrinsic explanatory methods and (ii) post-hoc explanatory methods. Intrinsic methods are, in effect, models that are inherently self-explanatory and provide an immediate, human-readable interpretation of how they transform certain data inputs into outputs. In other words, these are relatively simple models whose inner structure humans can comprehend without additional transformations. Post-hoc methods, on the other hand, achieve the interpretability of a given complex machine learning model via the construction of a second, simpler model (called a surrogate model) that approximates the behavior of the more complex model but is interpretable for humans.
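The post-hoc idea can be made concrete with a minimal sketch: below, a complex "black-box" classifier (a random forest) is approximated by a shallow decision tree trained on the black box's own predictions, and the surrogate's fidelity to the black box is measured. The specific models, synthetic data, and tree depth are illustrative assumptions, not part of the source.

```python
# Illustrative sketch of a global post-hoc surrogate model.
# Assumptions: random forest as the "black box", a depth-3 decision
# tree as the interpretable surrogate, synthetic data via sklearn.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.metrics import accuracy_score

# Synthetic data standing in for complex training data
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# 1. Train the complex, hard-to-interpret model
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# 2. Fit a simple surrogate on the black box's *predictions*, not the true labels
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))

# 3. Fidelity: how closely the surrogate mimics the black box's behavior
fidelity = accuracy_score(black_box.predict(X), surrogate.predict(X))
print(f"surrogate fidelity: {fidelity:.2f}")

# 4. The surrogate's if-then rules are directly readable by humans
print(export_text(surrogate, feature_names=[f"x{i}" for i in range(5)]))
```

An intrinsic method, by contrast, would simply use the shallow tree (or a linear model) as the predictive model itself, trading some accuracy for built-in interpretability.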
Considering that AI applications are typically characterized by high scalability, it is encouraging that researchers, policy makers, and practitioners alike are increasingly calling for standards regarding explanations of how and why an AI application produces a specific output. A growing number of regulatory efforts, such as Europe’s General Data Protection Regulation or the Free Flow of Non-Personal Data Regulation, advocate that people interacting with AI applications, especially those affected by them, have a right to explanation.
While the move toward fostering the interpretability of AI applications is arguably desirable from various points of view, there are also potential downsides. For one thing, rendering systems human-interpretable may not always be possible without suffering considerable performance losses. In situations where the high accuracy of AI predictions is more important than high transparency (e.g., the correct detection of cancer), making AI systems interpretable may be undesirable. Another issue is privacy protection: making systems more interpretable may sometimes reveal sensitive (personal) data, which certain stakeholders may strictly refuse or which may even be legally prohibited. Generally, it is important to consider that explanations are not necessarily correct and can be (intentionally) misleading. This may cause users and targets to be more willing to rely on and follow AI outputs, even when they are incorrect. Relatedly, observers of explanations may infer insights about the relationship between the AI system’s inputs and outputs that allow them to game the system (e.g., understanding how not to be detected when committing tax fraud) or adapt their perceptions in an undesirable way (e.g., perceiving gender as a determinant of a person’s propensity to work hard).
Finally, as with many methodologies and requirements, one size will most likely not fit all with regard to explainability techniques. Different stakeholders require different types of explanations. Developers and individuals responsible for the maintenance of AI applications, for instance, require more detailed explanations of the specific inner mathematical computations than end users, who instead require high-level explanations of the most relevant determinants of outputs.
For more information, see: Bauer, K., Hinz, O., van der Aalst, W., & Weinhardt, C. (2021). Expl(AI)n It to Me – Explainable AI and Information Systems Research.