
International Journal of Data Science and Analytics: latest articles

Theoretical and practical data science and analytics: challenges and solutions
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-10-01 DOI: 10.1007/s41060-023-00465-x
Carson K. Leung, Gabriella Pasi, Li Wang
Big data have become a core technology for providing innovative solutions in numerical applications and services in many fields. Embedded in these big data is valuable information and knowledge. This calls for data science and analytics, which has emerged as an important paradigm for driving the new economy and domains (e.g., Internet of Things, social and mobile networks, cloud computing), reforming classic disciplines (e.g., telecommunications, biology, health and social science), as well as upgrading core business and economic activity. In this article, we focus on both theoretical and practical data science and analytics. We summarize and highlight some of its challenges and solutions, which are covered in the eight articles in the current Special Issue on "theoretical and practical data science and analytics."
Citations: 0
A deep learning-based approach for identifying unresolved questions on Stack Exchange Q&A communities through graph-based communication modelling
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-30 DOI: 10.1007/s41060-023-00454-0
Hassan Abedi Firouzjaei
In recent years, online question–answer (Q&A) platforms, such as Stack Exchange (SE), have become increasingly popular for information and knowledge sharing. Despite the vast amount of information available on these platforms, many questions remain unresolved. In this work, we aim to address this issue by proposing a novel approach to identify unresolved questions in SE Q&A communities. Our approach utilises the graph structure of communication formed around a question by users to model the communication network surrounding it. We employ a property graph model and graph neural networks (GNNs), which can effectively capture both the structure of communication and the content of messages exchanged among users. By leveraging the power of graph representation and GNNs, our approach can effectively identify unresolved questions in SE communities. Experimental results on the complete historical data from three distinct Q&A communities demonstrate the superiority of our proposed approach over baseline methods that only consider the content of questions. Finally, our work represents a first but important step towards better understanding the factors that can affect questions becoming and remaining unresolved in SE communities.
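The idea of combining communication structure with message content can be illustrated with a single round of neighbourhood aggregation, the basic operation behind many GNNs. The sketch below is not the paper's model: the thread layout, the node names and the two-number feature vectors are invented for illustration, and a real GNN would add learned weights and non-linearities.

```python
def mean_aggregate(features, edges):
    """One unweighted message-passing round.

    Each node's new vector is the mean of its own vector and the
    vectors of its direct neighbours in the (undirected) graph.
    """
    neighbours = {node: [node] for node in features}
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = {}
    for node, group in neighbours.items():
        dim = len(features[node])
        updated[node] = [
            sum(features[m][k] for m in group) / len(group) for k in range(dim)
        ]
    return updated


# Hypothetical thread: a question, two answers, one comment on the first answer.
# Invented features: [text length in hundreds of characters, author reputation in thousands].
features = {"q": [2.0, 1.0], "a1": [4.0, 3.0], "a2": [1.0, 0.0], "c1": [1.0, 2.0]}
edges = [("q", "a1"), ("q", "a2"), ("a1", "c1")]

updated = mean_aggregate(features, edges)
question_vector = updated["q"]  # now mixes in information from both answers
```

After one round the question node already reflects its answers; a second round would also pull in the comment, which is how structure beyond the question text can reach a downstream classifier.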
Citations: 0
Duo satellite-based remotely sensed land surface temperature prediction by various methods of machine learning
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-30 DOI: 10.1007/s41060-023-00459-9
Shivam Chauhan, Ajay Singh Jethoo, Ajay Mishra, Vaibhav Varshney
Citations: 0
A new regression model for count data with applications to health care data
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-25 DOI: 10.1007/s41060-023-00453-1
Muneeb Ahmad Wani, Peer Bilal Ahmad, Bilal Ahmad Para, Na Elah
Citations: 0
Graph construction on complex spatiotemporal data for enhancing graph neural network-based approaches
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-25 DOI: 10.1007/s41060-023-00452-2
Stefan Bloemheuvel, Jurgen van den Hoogen, Martin Atzmueller
Graph neural networks (GNNs) have proven to be an indispensable approach in modeling complex data, in particular spatial temporal data, e.g., sensor data given as time series with corresponding spatial information. Although GNNs provide powerful modeling capabilities on this kind of data, they require adequate input data in terms of both the signal and the underlying graph structure. However, the corresponding graphs are typically not automatically available or even predefined, so an ad hoc graph representation needs to be constructed. Often, the construction of the underlying graph structure is given insufficient attention. Therefore, this paper performs an in-depth analysis of several methods for constructing graphs from a set of sensors attributed with spatial information, i.e., geographical coordinates, or using their respective attached signal data. We apply a diverse set of standard methods for estimating groups and similarities between graph nodes, as location-based as well as signal-driven approaches, on multiple benchmark datasets for evaluation and assessment. Here, for both areas, we specifically include distance-based, clustering-based, as well as correlation-based approaches for estimating the relationships between nodes for subsequent graph construction. In addition, we consider two different GNN approaches, i.e., regression and forecasting, in order to enable a broader experimental assessment. Typically, no predefined graph is given, such that (ad hoc) graph creation is necessary. Here, our results indicate the criticality of factoring the crucial step of graph construction into GNN-based research on spatial temporal data. Overall, in our experimentation no single approach for graph construction emerged as a clear winner. However, in our analysis we are able to provide specific indications based on the obtained results, for a specific class of methods. Collectively, the findings highlight the need for researchers to carefully consider graph construction when employing GNNs in the analysis of spatial temporal data.
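To make the distinction between location-based and signal-driven construction concrete, the following minimal sketch builds two graphs over the same sensors: one from a distance radius on coordinates, one from a threshold on the absolute Pearson correlation of the signals. The sensor names, coordinates, signals and thresholds are made up, plane Euclidean distance stands in for a proper geographic distance, and the signals are assumed to be non-constant; the paper's own procedures may differ.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def distance_graph(coords, radius):
    """Location-based: link sensors at most `radius` apart."""
    names = sorted(coords)
    return {
        (u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
        if math.dist(coords[u], coords[v]) <= radius
    }


def correlation_graph(signals, threshold):
    """Signal-driven: link sensors whose signals have |r| >= threshold."""
    names = sorted(signals)
    return {
        (u, v)
        for i, u in enumerate(names)
        for v in names[i + 1:]
        if abs(pearson(signals[u], signals[v])) >= threshold
    }


# Invented toy data: three sensors with planar coordinates and short time series.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (5.0, 5.0)}
signals = {"a": [1, 2, 3, 4], "b": [4, 3, 2, 1], "c": [2, 4, 6, 8]}

by_distance = distance_graph(coords, radius=1.5)
by_correlation = correlation_graph(signals, threshold=0.9)
```

On this toy data the distance graph links only the two nearby sensors, while the correlation graph links all three, which shows how strongly the chosen construction method shapes the input a GNN receives.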
Citations: 0
A machine learning approach to predict geomechanical properties of rocks from well logs
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-21 DOI: 10.1007/s41060-023-00451-3
Rohit, Shri Ram Manda, Aditya Raj, Nagababu Andraju
Citations: 0
A new generalization of the zero-truncated negative binomial distribution by a Lagrange expansion with associated regression model and applications
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-16 DOI: 10.1007/s41060-023-00449-x
Mohanan Monisha, Radhakumari Maya, Muhammed Rasheed Irshad, Christophe Chesneau, Damodaran Santhamani Shibu
Citations: 0
Automatic identification of rank correlation between image sequences
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-15 DOI: 10.1007/s41060-023-00450-4
Lior Shamir
Citations: 0
Study of violence against women and its characteristics through the application of text mining techniques
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-14 DOI: 10.1007/s41060-023-00448-y
E. M. A. Stephanie, L. G. B. Ruiz, M. A. Vila, M. C. Pegalajar
The Internet provides a wide variety of information that can be collected and studied, creating a massive data repository. Among the data available on the Internet, we can find articles about Violence against Women (VAW) published in the digital press, which are of great societal interest. In this work, we utilized Web scraping techniques to gather VAW-related news from the internet. Applying Text Mining techniques, we conducted a study on VAW and its characteristics. Our work comprises an exploratory analysis and the application of Topic Modelling to VAW events to identify latent topics and their semantic structures. We employed classification algorithms on a set of VAW press articles to determine the type of violence they refer to, namely physical, psychological, sexual, or a combination of them. We proposed two methodologies to target the data: the first one is based on dictionaries of VAW types, while the second approach extends the former by using the predominant violence to identify other associated types. Furthermore, we implemented two feature selection techniques: TF-IDF and $\chi^{2}$. Then, we applied Support Vector Machine, Decision Tree, Bayesian Networks, XGBoost Classifier, Random Forest, and Artificial Neural Networks. The results obtained showed that the classifiers achieved better performance when using $\chi^{2}$. The Boost Classifier demonstrated the best performance, followed by Random Forest.
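As a rough illustration of the two feature-scoring ideas named in the abstract, the sketch below computes a plain TF-IDF weight and the classic 2x2 chi-squared statistic for term presence versus class. The tiny tokenised corpus and its labels are invented, the formulas are the textbook variants (library implementations often smooth or normalise differently), and nothing here reproduces the paper's pipeline.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Term frequency in one document times log inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens, _ in corpus if term in tokens)
    return tf * math.log(len(corpus) / df) if df else 0.0


def chi2_score(term, label, corpus):
    """Chi-squared statistic of the 2x2 table (term present?) x (class == label?)."""
    a = b = c = d = 0
    for tokens, doc_label in corpus:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Invented, pre-tokenised toy corpus: (tokens, violence type).
corpus = [
    (["hit", "victim"], "physical"),
    (["hit", "injury", "victim"], "physical"),
    (["threat", "victim"], "psychological"),
    (["threat", "insult"], "psychological"),
]

vocabulary = sorted({t for tokens, _ in corpus for t in tokens})
ranked = sorted(vocabulary, key=lambda t: (-chi2_score(t, "physical", corpus), t))
selected = ranked[:2]  # keep the two terms most associated with the class split
```

Here "victim" appears in three of four documents and scores low on both measures, whereas "hit" and "threat" separate the classes perfectly and rank first under chi-squared.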
Citations: 0
Through the looking glass: evaluating post hoc explanations using transparent models
Q2 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date : 2023-09-12 DOI: 10.1007/s41060-023-00445-1
Mythreyi Velmurugan, Chun Ouyang, Renuka Sindhgatta, Catarina Moreira
Modern machine learning methods allow for complex and in-depth analytics, but the predictive models generated by these methods are often highly complex and lack transparency. Explainable Artificial Intelligence (XAI) methods are used to improve the interpretability of these complex “black box” models, thereby increasing transparency and enabling informed decision-making. However, the inherent fitness of these explainable methods, particularly the faithfulness of explanations to the decision-making processes of the model, can be hard to evaluate. In this work, we examine and evaluate the explanations provided by four XAI methods, using fully transparent “glass box” models trained on tabular data. Our results suggest that the fidelity of explanations is determined by the types of variables used, as well as the linearity of the relationship between variables and model prediction. We find that each XAI method evaluated has its own strengths and weaknesses, determined by the assumptions inherent in the explanation mechanism. Thus, though such methods are model-agnostic, we find significant differences in explanation quality across different technical setups. Given the numerous factors that determine the quality of explanations, including the specific explanation-generation procedures implemented by XAI methods, we suggest that model-agnostic XAI methods may still require expert guidance for implementation.
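The evaluation idea, comparing an explainer's output with what a transparent model is known to compute, can be sketched with a toy example. The code below uses a simple occlusion-style explainer as a stand-in (the abstract does not name its four XAI methods), a hand-written linear model as the "glass box", and an invented interaction model to show how non-linearity can break an additive explanation. All weights and inputs are hypothetical.

```python
def linear_model(x, weights, bias):
    """Transparent model: the true contribution of feature i is weights[i] * x[i]."""
    return bias + sum(w * v for w, v in zip(weights, x))


def interaction_model(x):
    """Transparent but non-linear model: prediction is the product of two features."""
    return x[0] * x[1]


def occlusion_attributions(predict, x, baseline):
    """Attribution of feature i = prediction drop when x[i] is reset to baseline[i]."""
    full = predict(x)
    attributions = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline[i]
        attributions.append(full - predict(perturbed))
    return attributions


def additivity_gap(predict, x, baseline):
    """How far the attributions are from summing to predict(x) - predict(baseline)."""
    attributions = occlusion_attributions(predict, x, baseline)
    return abs(sum(attributions) - (predict(x) - predict(baseline)))


weights, bias = [2.0, -1.0, 0.5], 1.0  # hypothetical glass-box parameters
x = [3.0, 4.0, 2.0]
baseline = [0.0, 0.0, 0.0]


def predict_linear(z):
    return linear_model(z, weights, bias)


truth = [w * (v - b) for w, v, b in zip(weights, x, baseline)]
explained = occlusion_attributions(predict_linear, x, baseline)

gap_linear = additivity_gap(predict_linear, x, baseline)
gap_interaction = additivity_gap(interaction_model, x, baseline)
```

For the linear model the attributions coincide with the known contributions and the gap is zero; for the interaction model both interacting features are credited with the full prediction, so the attributions overshoot. This mirrors the abstract's point that fidelity depends on the linearity of the relationship between variables and prediction.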
Citations: 0