An interpretable machine learning methodology to generate interaction effect hypotheses from complex datasets
Murtaza Nasir, Nichalin S. Summerfield, Serhat Simsek, Asil Oztekin
Decision Sciences. Published 2024-08-14. DOI: https://doi.org/10.1111/deci.12642
Abstract
Machine learning (ML) models are increasingly being used in decision‐making, but they can be difficult to understand because most ML models are black boxes, meaning that their inner workings are not transparent. This can make interpreting the results of ML models and understanding the underlying data‐generation process (DGP) challenging. In this article, we propose a novel methodology called Simple Interaction Finding Technique (SIFT) that can help make ML models more interpretable. SIFT is a data‐ and model‐agnostic approach that can be used to identify interaction effects between variables in a dataset. This can help improve our understanding of the DGP and make ML models more transparent and explainable to a wider audience. We test the proposed methodology against various factors (such as ML model complexity, dataset noise, spurious variables, and variable distributions) to assess its effectiveness and weaknesses. We show that the methodology is robust against many potential problems in the underlying dataset as well as ML algorithms.
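To make the idea of a model-agnostic interaction check concrete, here is a minimal Python sketch. It is not the SIFT procedure, which the abstract does not specify. It swaps in a simple finite-difference contrast: for a pair of features, it evaluates a prediction function at the four low/high corner combinations and averages the double difference over background rows. The function name `interaction_contrast`, the toy data-generation process `dgp`, and all coefficients are assumptions made for illustration; in practice `predict` would be any fitted model's prediction function.

```python
import random


def interaction_contrast(predict, rows, i, j, lo=0.0, hi=1.0):
    """Average double difference of `predict` over features i and j.

    If `predict` is additive in features i and j (no interaction),
    the result is zero. Only calls to `predict` are needed, so the
    check does not depend on the model type.
    """
    total = 0.0
    for row in rows:
        def f(a, b):
            # Copy the background row and overwrite the two features.
            x = list(row)
            x[i], x[j] = a, b
            return predict(x)

        total += f(hi, hi) - f(hi, lo) - f(lo, hi) + f(lo, lo)
    return total / len(rows)


# Hypothetical data-generation process (an assumption, not from the paper):
# x[0] and x[1] interact, x[2] enters additively only.
def dgp(x):
    return 1.0 * x[0] + 0.5 * x[1] + 2.0 * x[0] * x[1] + 3.0 * x[2]


random.seed(0)
rows = [[random.random() for _ in range(3)] for _ in range(200)]

# Score every feature pair; `dgp` stands in for a fitted model here.
scores = {
    (i, j): interaction_contrast(dgp, rows, i, j)
    for i in range(3)
    for j in range(i + 1, 3)
}

for pair, score in sorted(scores.items()):
    print(pair, round(score, 6))
```

Only the pair (0, 1) should receive a score clearly different from zero, which is the kind of signal that could then be stated as an interaction effect hypothesis about the DGP. With a real fitted model in place of `dgp`, noise and model error would blur this contrast, which is why the robustness factors listed in the abstract matter.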
About the journal:
Decision Sciences, a premier journal of the Decision Sciences Institute, publishes scholarly research about decision making within the boundaries of an organization, as well as decisions involving inter-firm coordination. The journal promotes research advancing decision making at the interfaces of business functions and organizational boundaries. The journal also seeks articles extending established lines of work assuming the results of the research have the potential to substantially impact either decision making theory or industry practice. Ground-breaking research articles that enhance managerial understanding of decision making processes and stimulate further research in multi-disciplinary domains are particularly encouraged.