Software refactoring focuses on improving software quality by applying changes to the internal structure that do not alter the observable behavior. Determining which refactorings should be applied, and presenting developers with the most relevant and optimal refactorings, is often challenging. Existing literature suggests that one potential source for identifying and recommending required refactorings is past software development and evolution history, which is often archived in software repositories. In this article, we review a selection of existing literature that proposes approaches to facilitate refactoring by exploiting information mined from software repositories. Based on the reviewed papers, existing works leverage software history mining to support the analysis of code smells, refactoring, and the guidance of software changes. First, historical information is used to detect design flaws in source code, commonly referred to as code smells. Moreover, other studies analyze the evolution of code smells to establish how and when they are introduced into the code base and how they get resolved. Second, mining software repositories provides useful insights that can be used to predict the need for refactoring and which specific refactoring operations are required. In addition, past history can be used to detect and analyze previously applied refactorings to establish software change facts, for instance, how developers refactor code and the motivation behind it. Finally, change patterns are used to predict further changes that might be required and to recommend a set of files to change during a given modification task. The paper further suggests other exciting possibilities that can be pursued in the future in this research direction.
{"title":"Research on mining software repositories to facilitate refactoring","authors":"Ally S. Nyamawe","doi":"10.1002/widm.1508","DOIUrl":"https://doi.org/10.1002/widm.1508","url":null,"abstract":"Software refactoring focuses on improving software quality by applying changes to the internal structure that do not alter the observable behavior. Determining which refactorings should be applied and presented to developers the most relevant and optimal refactorings is often challenging. Existing literature suggests that one of the potential sources to identify and recommend required refactorings is the past software development and evolution histories which are often archived in software repositories. In this article, we review a selection of existing literature that has attempted to propose approaches that facilitate refactoring by exploiting information mined from software repositories. Based on the reviewed papers, existing works leverage software history mining to support analysis of code smells, refactoring, and guiding software changes. First, past history information is used to detect design flaws in source code commonly referred to as code smells. Moreover, other studies analyze the evolution of code smells to establish how and when they are introduced into the code base and get resolved. Second, software repositories mining provides useful insights that can be used in predicting the need for refactoring and what specific refactoring operations are required. In addition, past history can be used in detecting and analyzing previously applied refactorings to establish software change facts, for instance, how developers refactor code and the motivation behind it. Finally, change patterns are used to predict further changes that might be required and recommend a set of files for change during a given modification task. The paper further suggests other exciting possibilities that can be pursued in the future in this research direction.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"130 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90643900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Khan, Janani Surya, Maitreyee Roy, M. S. Swathi Priya, Sashwanthi Mohan, S. Raman, Akshay Raman, Abhishek Vyas, R. Raman
The rise of non‐invasive, rapid, and widely accessible quantitative high‐resolution imaging methods, such as modern retinal photography and optical coherence tomography (OCT), has significantly impacted ophthalmology. These techniques offer remarkable accuracy and resolution in assessing ocular diseases and are increasingly recognized for their potential in identifying ocular biomarkers of systemic diseases. The application of artificial intelligence (AI) has shown promising results in identifying age, gender, systolic blood pressure, and smoking status, and in assessing cardiovascular disorders from fundus and OCT images. Although our understanding of eye–body relationships has advanced through decades of conventional statistical modeling in large population‐based studies incorporating ophthalmic assessments, the application of AI to this field is still in its early stages. In this review article, we concentrate on the areas where AI‐based investigations could expand on existing conventional analyses to produce fresh findings using retinal biomarkers of systemic diseases. Five databases (Medline, Scopus, PubMed, Google Scholar, and Web of Science) were searched using terms related to ocular imaging, systemic diseases, and artificial intelligence. Our review found that AI has been employed in a wide range of clinical and research applications, primarily for disease prediction, biomarker discovery, and risk factor identification. We envisage that AI‐based models will have significant clinical and research impact in the future through screening of high‐risk individuals, particularly in less developed areas, and identification of new retinal biomarkers, even though technical and socioeconomic challenges remain. Further research is needed to validate these models in real‐world settings.
{"title":"Use of artificial intelligence algorithms to predict systemic diseases from retinal images","authors":"R. Khan, Janani Surya, Maitreyee Roy, M. S. Swathi Priya, Sashwanthi Mohan, S. Raman, Akshay Raman, Abhishek Vyas, R. Raman","doi":"10.1002/widm.1506","DOIUrl":"https://doi.org/10.1002/widm.1506","url":null,"abstract":"The rise of non‐invasive, rapid, and widely accessible quantitative high‐resolution imaging methods, such as modern retinal photography and optical coherence tomography (OCT), has significantly impacted ophthalmology. These techniques offer remarkable accuracy and resolution in assessing ocular diseases and are increasingly recognized for their potential in identifying ocular biomarkers of systemic diseases. The application of artificial intelligence (AI) has been demonstrated to have promising results in identifying age, gender, systolic blood pressure, smoking status, and assessing cardiovascular disorders from the fundus and OCT images. Although our understanding of eye–body relationships has advanced from decades of conventional statistical modeling in large population‐based studies incorporating ophthalmic assessments, the application of AI to this field is still in its early stages. In this review article, we concentrate on the areas where AI‐based investigations could expand on existing conventional analyses to produce fresh findings using retinal biomarkers of systemic diseases. Five databases—Medline, Scopus, PubMed, Google Scholar, and Web of Science were searched using terms related to ocular imaging, systemic diseases, and artificial intelligence characteristics. Our review found that AI has been employed in a wide range of clinical tests and research applications, primarily for disease prediction, finding biomarkers and risk factor identification. We envisage artificial intelligence‐based models to have significant clinical and research impacts in the future through screening for high‐risk individuals, particularly in less developed areas, and identifying new retinal biomarkers, even though technical and socioeconomic challenges remain. Further research is needed to validate these models in real‐world setting.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"30 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88036615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Rule‐based systems have been used in the legal domain since the 1970s. Save for rare exceptions, machine learning has only recently been used. But why this delay? We investigate the appropriate use of machine learning to support and make legal predictions. To do so, we need to examine the appropriate use of data in global legal domains, including common law, civil law, and hybrid jurisdictions. The use of various forms of Artificial Intelligence in law, including rule‐based reasoning, case‐based reasoning, and machine learning, requires an understanding of jurisprudential theories. We will see that the use of machine learning is particularly appropriate for non‐professionals, such as self‐represented litigants or those relying upon legal aid services. The primary use of machine learning to support decision‐making in legal domains has been in criminal detection, financial domains, and sentencing. Its use in these areas has led to concerns that the inappropriate use of Artificial Intelligence leads to biased decision‐making. This requires us to examine concerns about governance and ethics. Ethical concerns can be minimized by providing enhanced explanations, choosing appropriate data, cleaning that data appropriately, and having humans review any decisions.
{"title":"The benefits and dangers of using machine learning to support making legal predictions","authors":"John Zeleznikow","doi":"10.1002/widm.1505","DOIUrl":"https://doi.org/10.1002/widm.1505","url":null,"abstract":"Rule‐based systems have been used in the legal domain since the 1970s. Save for rare exceptions, machine learning has only recently been used. But why this delay? We investigate the appropriate use of machine learning to support and make legal predictions. To do so, we need to examine the appropriate use of data in global legal domains—including in common law, civil law, and hybrid jurisdictions. The use of various forms of Artificial Intelligence, including rule‐based reasoning, case‐based reasoning and machine learning in law requires an understanding of jurisprudential theories. We will see that the use of machine learning is particularly appropriate for non‐professionals: in particular self‐represented litigants or those relying upon legal aid services. The primary use of machine learning to support decision‐making in legal domains has been in criminal detection, financial domains, and sentencing. The use in these areas has led to concerns that the inappropriate use of Artificial Intelligence leads to biased decision making. This requires us to examine concerns about governance and ethics. Ethical concerns can be minimized by providing enhanced explanation, choosing appropriate data to be used, appropriately cleaning that data, and having human reviews of any decisions.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"5 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-05-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80828066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Indrajeet Ghosh, Sreenivasan Ramasamy Ramamurthy, Avijoy Chakma, Nirmalya Roy
The rapidly growing interest in coupling machine learning (ML) algorithms with wearable and contactless sensors to tackle real‐world problems warrants a pedagogical study covering all aspects of this research direction. With this in mind, this survey reviews the state‐of‐the‐art literature on ML algorithms, methodologies, and hypotheses adopted to solve research problems and challenges in the domain of sports. First, we categorize the study into three main research fields: sensors, computer vision, and wireless and mobile‐based applications. Then, for each of these fields, we thoroughly analyze the systems that are deployable for real‐time sports analytics. Next, we discuss the learning algorithms (e.g., statistical learning, deep learning, reinforcement learning) that power those deployable systems, comparing and contrasting the benefits of those learning methodologies. Finally, we highlight possible future open research opportunities and emerging technologies that could contribute to the domain of sports analytics.
{"title":"Sports analytics review: Artificial intelligence applications, emerging technologies, and algorithmic perspective","authors":"Indrajeet Ghosh, Sreenivasan Ramasamy Ramamurthy, Avijoy Chakma, Nirmalya Roy","doi":"10.1002/widm.1496","DOIUrl":"https://doi.org/10.1002/widm.1496","url":null,"abstract":"The rapid and impromptu interest in the coupling of machine learning (ML) algorithms with wearable and contactless sensors aimed at tackling real‐world problems warrants a pedagogical study to understand all the aspects of this research direction. Considering this aspect, this survey aims to review the state‐of‐the‐art literature on ML algorithms, methodologies, and hypotheses adopted to solve the research problems and challenges in the domain of sports. First, we categorize this study into three main research fields: sensors, computer vision, and wireless and mobile‐based applications. Then, for each of these fields, we thoroughly analyze the systems that are deployable for real‐time sports analytics. Next, we meticulously discuss the learning algorithms (e.g., statistical learning, deep learning, reinforcement learning) that power those deployable systems while also comparing and contrasting the benefits of those learning methodologies. Finally, we highlight the possible future open‐research opportunities and emerging technologies that could contribute to the domain of sports analytics.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"1 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79200830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex Gaudio, C. Faloutsos, A. Smailagic, P. Costa, A. Campilho
Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the “fixed filters” principle, that all spatial filter weights of convolutional neural networks can be fixed at initialization and never learned, and the “nimbleness” principle, that only a few network parameters suffice. We contribute (a) visual model‐based explanations, (b) speed and accuracy gains, and (c) novel tools for deep convolutional neural networks. ExplainFix offers key insights: spatially fixed networks should have a steered initialization, spatial convolution layers tend to prioritize low frequencies, and most network parameters are not necessary in spatially fixed models. ExplainFix models have up to 100× fewer spatial filter kernels than fully learned models with matching or improved accuracy. Our extensive empirical analysis confirms that ExplainFix guarantees nimbler models (training up to 17% faster with channel pruning), matching or improved predictive performance (spanning 13 distinct baseline models, four architectures, and two medical image datasets), improved robustness to larger learning rates, and robustness to varying model size. We are the first to demonstrate that all spatial filters in state‐of‐the‐art convolutional deep networks can be fixed at initialization rather than learned.
{"title":"ExplainFix: Explainable spatially fixed deep networks","authors":"Alex Gaudio, C. Faloutsos, A. Smailagic, P. Costa, A. Campilho","doi":"10.1002/widm.1483","DOIUrl":"https://doi.org/10.1002/widm.1483","url":null,"abstract":"Is there an initialization for deep networks that requires no learning? ExplainFix adopts two design principles: the “fixed filters” principle that all spatial filter weights of convolutional neural networks can be fixed at initialization and never learned, and the “nimbleness” principle that only few network parameters suffice. We contribute (a) visual model‐based explanations, (b) speed and accuracy gains, and (c) novel tools for deep convolutional neural networks. ExplainFix gives key insights that spatially fixed networks should have a steered initialization, that spatial convolution layers tend to prioritize low frequencies, and that most network parameters are not necessary in spatially fixed models. ExplainFix models have up to ×100 fewer spatial filter kernels than fully learned models and matching or improved accuracy. Our extensive empirical analysis confirms that ExplainFix guarantees nimbler models (train up to 17% faster with channel pruning), matching or improved predictive performance (spanning 13 distinct baseline models, four architectures and two medical image datasets), improved robustness to larger learning rate, and robustness to varying model size. We are first to demonstrate that all spatial filters in state‐of‐the‐art convolutional deep networks can be fixed at initialization, not learned.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"14 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73146059","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex Gaudio, A. Smailagic, C. Faloutsos, Shreshta Mohan, Elvin Johnson, Yuhao Liu, P. Costa, A. Campilho
Explanations of a model's biases or predictions are essential to medical image analysis. Yet explainable machine learning approaches for medical image analysis are challenged by the need to preserve the privacy of patient data and by current trends in deep learning toward unsustainably large models and datasets. We propose DeepFixCX for explainable and privacy‐preserving medical image compression that is nimble and performant. We contribute a review of the field and a conceptual framework for simultaneous privacy and explainability via tools of compression. DeepFixCX compresses images without learning by removing or obscuring spatial and edge information. DeepFixCX is ante‐hoc explainable and gives privatized post hoc explanations of spatial and edge bias without accessing the original image. DeepFixCX privatizes images to prevent image reconstruction and mitigate patient re‐identification. DeepFixCX is nimble: compression can occur on a laptop CPU or GPU at 1700 images per second for images of size 320 × 320. DeepFixCX enables the use of low‐memory MLP classifiers for vision data; permitting a small performance loss yields end‐to‐end MLP training over 70× faster with batch sizes over 100× larger. DeepFixCX consistently improves the predictive classification performance of a Deep Neural Network (DNN) by 0.02 AUC ROC on Glaucoma and Cervix Type detection datasets, and can improve multi‐label chest x‐ray classification performance in seven of 10 tested settings. In all three datasets, compression to less than 5% of the original number of pixels gives matching or improved performance. Our main novelty is to define an explainability‐versus‐privacy problem and address it with lossy compression.
{"title":"DeepFixCX: Explainable privacy‐preserving image compression for medical image analysis","authors":"Alex Gaudio, A. Smailagic, C. Faloutsos, Shreshta Mohan, Elvin Johnson, Yuhao Liu, P. Costa, A. Campilho","doi":"10.1002/widm.1495","DOIUrl":"https://doi.org/10.1002/widm.1495","url":null,"abstract":"Explanations of a model's biases or predictions are essential to medical image analysis. Yet, explainable machine learning approaches for medical image analysis are challenged by needs to preserve privacy of patient data, and by current trends in deep learning to use unsustainably large models and large datasets. We propose DeepFixCX for explainable and privacy‐preserving medical image compression that is nimble and performant. We contribute a review of the field and a conceptual framework for simultaneous privacy and explainability via tools of compression. DeepFixCX compresses images without learning by removing or obscuring spatial and edge information. DeepFixCX is ante‐hoc explainable and gives privatized post hoc explanations of spatial and edge bias without accessing the original image. DeepFixCX privatizes images to prevent image reconstruction and mitigate patient re‐identification. DeepFixCX is nimble. Compression can occur on a laptop CPU or GPU to compress and privatize 1700 images per second of size 320 × 320. DeepFixCX enables use of low memory MLP classifiers for vision data; permitting small performance loss gives end‐to‐end MLP performance over 70× faster and batch size over 100× larger. DeepFixCX consistently improves predictive classification performance of a Deep Neural Network (DNN) by 0.02 AUC ROC on Glaucoma and Cervix Type detection datasets, and can improve multi‐label chest x‐ray classification performance in seven of 10 tested settings. In all three datasets, compression to less than 5% of original number of pixels gives matching or improved performance. Our main novelty is to define an explainability versus privacy problem and address it with lossy compression.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"94 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76416264","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Interpretability and explainability are crucial for machine learning (ML) and statistical applications in medicine, economics, law, and the natural sciences, and form an essential principle for ML model design and development. Although interpretability and explainability have escaped a precise and universal definition, many models and techniques motivated by these properties have been developed over the last 30 years, with the focus currently shifting toward deep learning. We consider concrete examples of the state of the art, including specially tailored rule‐based, sparse, and additive classification models, interpretable representation learning, and methods for explaining black‐box models post hoc. The discussion emphasizes the need for and relevance of interpretability and explainability, the divide between them, and the inductive biases behind the presented “zoo” of interpretable models and explanation methods.
{"title":"Interpretable and explainable machine learning: A methods‐centric overview with concrete examples","authors":"Ricards Marcinkevics, Julia E. Vogt","doi":"10.1002/widm.1493","DOIUrl":"https://doi.org/10.1002/widm.1493","url":null,"abstract":"Interpretability and explainability are crucial for machine learning (ML) and statistical applications in medicine, economics, law, and natural sciences and form an essential principle for ML model design and development. Although interpretability and explainability have escaped a precise and universal definition, many models and techniques motivated by these properties have been developed over the last 30 years, with the focus currently shifting toward deep learning. We will consider concrete examples of state‐of‐the‐art, including specially tailored rule‐based, sparse, and additive classification models, interpretable representation learning, and methods for explaining black‐box models post hoc. The discussion will emphasize the need for and relevance of interpretability and explainability, the divide between them, and the inductive biases behind the presented “zoo” of interpretable models and explanation methods.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"106 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"75749279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
With the ever‐growing adoption of artificial intelligence (AI)‐based systems, the carbon footprint of AI is no longer negligible. AI researchers and practitioners are therefore urged to hold themselves accountable for the carbon emissions of the AI models they design and use. This has led in recent years to the emergence of research tackling AI environmental sustainability, a field referred to as Green AI. Despite the rapid growth of interest in the topic, a comprehensive overview of Green AI research is to date still missing. To address this gap, in this article we present a systematic review of the Green AI literature. From the analysis of 98 primary studies, different patterns emerge. The topic has experienced considerable growth from 2020 onward. Most studies consider monitoring AI model footprint, tuning hyperparameters to improve model sustainability, or benchmarking models. A mix of position papers, observational studies, and solution papers is present. Most papers focus on the training phase, are algorithm‐agnostic or study neural networks, and use image data. Laboratory experiments are the most common research strategy. Reported Green AI energy savings go up to 115%, with savings over 50% being rather common. Industrial parties are involved in Green AI studies, albeit most target academic readers. Green AI tool provisioning is scarce. In conclusion, the Green AI research field appears to have reached a considerable level of maturity. The time is therefore ripe to adopt other Green AI research strategies and to port the numerous promising academic results to industrial practice.
{"title":"A systematic review of Green AI","authors":"R. Verdecchia, June Sallou, Luís Cruz","doi":"10.1002/widm.1507","DOIUrl":"https://doi.org/10.1002/widm.1507","url":null,"abstract":"With the ever‐growing adoption of artificial intelligence (AI)‐based systems, the carbon footprint of AI is no longer negligible. AI researchers and practitioners are therefore urged to hold themselves accountable for the carbon emissions of the AI models they design and use. This led in recent years to the appearance of researches tackling AI environmental sustainability, a field referred to as Green AI. Despite the rapid growth of interest in the topic, a comprehensive overview of Green AI research is to date still missing. To address this gap, in this article, we present a systematic review of the Green AI literature. From the analysis of 98 primary studies, different patterns emerge. The topic experienced a considerable growth from 2020 onward. Most studies consider monitoring AI model footprint, tuning hyperparameters to improve model sustainability, or benchmarking models. A mix of position papers, observational studies, and solution papers are present. Most papers focus on the training phase, are algorithm‐agnostic or study neural networks, and use image data. Laboratory experiments are the most common research strategy. Reported Green AI energy savings go up to 115%, with savings over 50% being rather common. Industrial parties are involved in Green AI studies, albeit most target academic readers. Green AI tool provisioning is scarce. As a conclusion, the Green AI research field results to have reached a considerable level of maturity. Therefore, from this review emerges that the time is suitable to adopt other Green AI research strategies, and port the numerous promising academic results to industrial practice.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"232 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-01-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82554190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data mining (DM) and machine learning (ML) applications in medical diagnostic systems are burgeoning. Data privacy is essential in these systems, as healthcare data are highly sensitive. This work first discusses various privacy and security challenges in such systems. To address them, we next discuss different privacy‐preserving (PP) computation techniques in the context of DM and ML for secure data evaluation and processing. State‐of‐the‐art applications of these systems in healthcare are analyzed at various stages, namely the data collection, data publication, data distribution, and output phases for PPDM, and the input, model, training, and output phases for PPML. Furthermore, PP federated learning is also discussed. Finally, we present open challenges in these systems and future research directions.
{"title":"Privacy‐preserving data mining and machine learning in healthcare: Applications, challenges, and solutions","authors":"V. Naresh, Muthusamy Thamarai","doi":"10.1002/widm.1490","DOIUrl":"https://doi.org/10.1002/widm.1490","url":null,"abstract":"Data mining (DM) and machine learning (ML) applications in medical diagnostic systems are budding. Data privacy is essential in these systems as healthcare data are highly sensitive. The proposed work first discusses various privacy and security challenges in these systems. To address these next, we discuss different privacy‐preserving (PP) computation techniques in the context of DM and ML for secure data evaluation and processing. The state‐of‐the‐art applications of these systems in healthcare are analyzed at various stages such as data collection, data publication, data distribution, and output phases regarding PPDM and input, model, training, and output phases in the context of PPML. Furthermore, PP federated learning is also discussed. Finally, we present open challenges in these systems and future research directions.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"87 15 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84036376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Weipeng Cao, Yuhao Wu, Yixuan Sun, Haigang Zhang, Jin Ren, Dujuan Gu, Xingkai Wang
Multimodal learning provides a path to fully utilizing all types of information related to the modeling target, giving the model a global view. Zero‐shot learning (ZSL) is a general solution for incorporating prior knowledge into data‐driven models and achieving accurate class identification. The combination of the two, known as multimodal ZSL (MZSL), can fully exploit the advantages of both technologies and is expected to produce models with greater generalization ability. However, MZSL algorithms and applications have not yet been thoroughly investigated and summarized. This study fills this gap by providing an objective overview of MZSL's definition, typical algorithms, representative applications, and critical issues. This article not only provides researchers in this field with a comprehensive perspective, but also highlights several promising research directions.
{"title":"A review on multimodal zero‐shot learning","authors":"Weipeng Cao, Yuhao Wu, Yixuan Sun, Haigang Zhang, Jin Ren, Dujuan Gu, Xingkai Wang","doi":"10.1002/widm.1488","DOIUrl":"https://doi.org/10.1002/widm.1488","url":null,"abstract":"Multimodal learning provides a path to fully utilize all types of information related to the modeling target to provide the model with a global vision. Zero‐shot learning (ZSL) is a general solution for incorporating prior knowledge into data‐driven models and achieving accurate class identification. The combination of the two, known as multimodal ZSL (MZSL), can fully exploit the advantages of both technologies and is expected to produce models with greater generalization ability. However, the MZSL algorithms and applications have not yet been thoroughly investigated and summarized. This study fills this gap by providing an objective overview of MZSL's definition, typical algorithms, representative applications, and critical issues. This article will not only provide researchers in this field with a comprehensive perspective, but it will also highlight several promising research directions.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"49 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2023-01-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87226002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}