Explaining AI through mechanistic interpretability
Lena Kästner, Barnaby Crook
European Journal for Philosophy of Science, published 2024-10-11
DOI: 10.1007/s13194-024-00614-4 (https://doi.org/10.1007/s13194-024-00614-4)
Citation count: 0
Abstract
Recent work in explainable artificial intelligence (XAI) attempts to render opaque AI systems understandable through a divide-and-conquer strategy. However, this fails to illuminate how trained AI systems work as a whole. Precisely this kind of functional understanding is needed, though, to satisfy important societal desiderata such as safety. To remedy this situation, we argue, AI researchers should seek mechanistic interpretability, viz. apply coordinated discovery strategies familiar from the life sciences to uncover the functional organisation of complex AI systems. Additionally, theorists should accommodate the unique costs and benefits of such strategies in their portrayals of XAI research.
About the journal:
The European Journal for Philosophy of Science publishes groundbreaking works that can deepen understanding of the concepts and methods of the sciences, as they explore increasingly many facets of the world we live in. It is of direct interest to philosophers of science coming from different perspectives, as well as scientists, citizens and policymakers. The journal is interested in articles from all traditions and all backgrounds, as long as they engage with the sciences in a constructive, and critical, way. The journal represents the various longstanding European philosophical traditions engaging with the sciences, but welcomes articles from every part of the world.