
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

Themes in data mining, big data, and crime analytics
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-10-18, DOI: 10.1002/widm.1432
G. Oatley
This article examines the impact of new AI‐related technologies in data mining and big data on important research questions in crime analytics. Because the field is so broad, the review focuses on a selection of the most important topics. Challenges for information management, and in turn law and society, include: AI‐powered predictive policing; big data for legal and adversarial decisions; bias in the use of big data and analytics for profiling and predicting criminality; forecasting crime risk and crime rates; and regulating AI systems.
Citations: 17
Predicting home sale prices: A review of existing methods and illustration of data stream methods for improved performance
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-10-18, DOI: 10.1002/widm.1435
Donghui Shi, J. Guan, J. Zurada, Alan Levitan
The need for accurate and unbiased assessment of residential real property has always been important not only to financial institutions lending on or holding such assets but also to municipalities that rely on property taxes as their critical source of revenue. The common methodology for predicting residential property sale prices is based on traditional multiple regression, in spite of known issues. Machine learning methods have been proposed as an alternative approach, but the results are far from satisfactory. A review of existing studies and relevant issues can help researchers better assess the pros and cons of the approaches in this important stream of research and move the field forward. This article provides such a review. In our review, we have noticed that common to both the regression‐based methods and machine learning methods is the use of batch‐mode learning. Thus, in addition to providing a review of recent research on batch‐based residential property prediction models, this article also explores a new approach to constructing residential property price prediction models by treating past sale records as an evolving data stream. The results of our study show that the data stream approach outperforms the traditional regression method and demonstrate the potential of data stream methods in improving prediction models for residential property prices.
Citations: 1
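The contrast drawn in the abstract above between batch‐mode learning and treating sale records as an evolving data stream can be illustrated with a small sketch. Everything here is an assumption made for illustration: the synthetic drifting "market", the one‐feature linear model, the learning rate, and the function names are hypothetical and are not the authors' method or data. The batch model is fitted once on an initial window, while the stream model is evaluated prequentially (predict first, then update on the same record).

```python
import random

def make_stream(n, rng):
    """Yield (area, price) pairs whose price-per-area slope drifts upward over time."""
    for t in range(n):
        slope = 1.0 + t / n  # hypothetical market drift from 1.0 toward 2.0
        area = rng.uniform(0.5, 2.0)
        yield area, slope * area + rng.gauss(0.0, 0.05)

def fit_batch(records):
    """Least-squares slope through the origin, fitted once on a fixed batch."""
    return sum(x * y for x, y in records) / sum(x * x for x, _ in records)

def compare(n=1000, warmup=100, lr=0.05, seed=0):
    """Return (batch MAE, stream MAE) over the records after the warm-up window."""
    rng = random.Random(seed)
    stream = list(make_stream(n, rng))
    w_batch = fit_batch(stream[:warmup])  # frozen after the warm-up window
    w_stream = 0.0                        # updated on every record
    err_batch = err_stream = 0.0
    for i, (x, y) in enumerate(stream):
        if i >= warmup:
            # Prequential evaluation: score the prediction before learning from it.
            err_batch += abs(w_batch * x - y)
            err_stream += abs(w_stream * x - y)
        # One stochastic-gradient step on the squared error of this record.
        w_stream += lr * (y - w_stream * x) * x
    scored = n - warmup
    return err_batch / scored, err_stream / scored

batch_mae, stream_mae = compare()
print(f"batch MAE={batch_mae:.3f}  stream MAE={stream_mae:.3f}")
```

On this toy stream the frozen batch model falls behind as the slope drifts, while the incremental model keeps tracking it. That gap comes from the drift built into the synthetic data and says nothing about real housing markets.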
A survey on datasets for fairness‐aware machine learning
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-10-01, DOI: 10.1002/widm.1452
Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Eirini Ntoutsi
As decision‐making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data‐driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness‐aware ML solutions have been proposed which involve fairness‐related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real‐world datasets used for fairness‐aware ML. We focus on tabular data as the most common data representation for fairness‐aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.
Citations: 116
Detecting communities using social network analysis in online learning environments: Systematic literature review
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-09-25, DOI: 10.1002/widm.1431
Sahar Yassine, S. Kadry, M. Sicilia
Uncovering community structure has significantly advanced the explanation, analysis, and forecasting of the behaviors and dynamics of networks in fields such as sociology, criminology, biology, medicine, communication, economics, and academia. Detecting and clustering communities is a powerful step toward identifying the structural properties and the behavioral patterns in social networks. Recently, online learning has been progressively adopted in many educational practices, which raises many questions about assessing learners' engagement, collaboration, and behaviors in the newly emerging learning communities. This systematic literature review aims to assess the use of community detection techniques in analyzing network structure in online learning environments. It provides a comprehensive overview of the existing research that adopted those techniques, identifies the educational objectives behind their application, and suggests possible future research directions. Our analysis covered 65 studies found in the literature that applied different community discovery techniques to various types of online learning environments to analyze their users' interaction patterns. Our review revealed the potential of this field for improving educational practices and decisions and for utilizing the massive amount of data generated from interacting with those environments. Finally, we highlighted the need to include automated community discovery techniques in online learning environments to facilitate and enhance their use, and we stressed the need for further advanced research to uncover many hidden opportunities.
Citations: 11
Mining text from natural scene and video images: A survey
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-08-24, DOI: 10.1002/widm.1428
P. Shivakumara, Alireza Alaei, U. Pal
In computer terminology, mining means extracting meaningful information or knowledge from a large amount of data/information using computers. Meaningful information can be extracted from normal text and from images obtained from different sources, such as natural scene images, video, and documents, by deriving semantics from the text and content of the images. Although there is much work on text/data mining and several survey/review papers have been published in the literature, to the best of our knowledge there is no survey paper on mining textual information from natural scene, video, and document images that considers word spotting techniques. In this article, we therefore provide a comprehensive review of both non‐spotting and spotting‐based mining techniques. The mining approaches are categorized as feature‐based, learning‐based, and hybrid methods in order to analyze the strengths and limitations of the models in each category. The article also discusses the usefulness of the methods in different situations and applications. Furthermore, based on the review of different mining approaches, it identifies the limitations of the existing methods and suggests new applications and future directions for continuing the research. We believe such a review article will help researchers quickly become familiar with the state of the art and the progress made toward mining textual information from natural scene and video images.
Citations: 3
Critical insights into modern hyperspectral image applications through deep learning
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-21, DOI: 10.1002/widm.1426
Garima Jaiswal, Aruna Sharma, S. Yadav
Hyperspectral imaging has shown tremendous growth over the past three decades. It evolved from remote sensing and, with technological advances, has expanded into various other application areas. In addition, data cubes rich in spectral and spatial information are an advantage for capturing, analyzing, reviewing, and interpreting results from data. This review concentrates on emerging application areas of hyperspectral imaging. These areas are selected because they offer vast scope for future enhancement by exploiting cutting‐edge technology, that is, deep learning. The review focuses on applications of hyperspectral imaging techniques in selected areas (remote sensing, document forgery, history and archaeology conservation, surveillance and security, machine vision for fruit quality inspection, and medical imaging). It is organized around the publicly available datasets and the features used in each domain. This review can act as a baseline for deep learning and machine vision experts, historical geographers, and scholars by providing them a view of how hyperspectral imaging is implemented in multiple domains, along with future research prospects.
Citations: 19
Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-13, DOI: 10.1002/widm.1484
B. Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, M. Becker, A. Boulesteix, Difan Deng, M. Lindauer
Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time‐consuming and irreproducible manual process of trial‐and‐error to find well‐performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
Citations: 113
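The hyperparameter optimization abstract above names random search and resampling‐based error estimation as building blocks. The following is a minimal sketch of that combination, not the paper's own code: the synthetic data, the k‐nearest‐neighbour learner, the search range `k_max`, and all function names are assumptions chosen to keep the example self‐contained.

```python
import random

def make_data(n, rng):
    """Synthetic 1-D regression data: y = x^2 plus Gaussian noise (illustrative only)."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, x * x + rng.gauss(0.0, 0.1)) for x in xs]

def knn_predict(train, x, k):
    """Predict by averaging the targets of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def cv_error(data, k, folds=5):
    """Mean squared error of k-NN, estimated by k-fold cross-validation."""
    total = 0.0
    for f in range(folds):
        test = data[f::folds]  # every `folds`-th record, starting at index f
        train = [p for i, p in enumerate(data) if i % folds != f]
        total += sum((knn_predict(train, x, k) - y) ** 2 for x, y in test)
    return total / len(data)

def random_search(data, budget, rng, k_max=40):
    """Sample `budget` values of the hyperparameter k uniformly and keep the best."""
    best_k, best_err = None, float("inf")
    for _ in range(budget):
        k = rng.randint(1, k_max)
        err = cv_error(data, k)
        if err < best_err:
            best_k, best_err = k, err
    return best_k, best_err

rng = random.Random(0)
data = make_data(100, rng)
best_k, best_err = random_search(data, budget=15, rng=rng)
print(best_k, round(best_err, 4))
```

The cross‐validated error is itself a noisy estimate, so the selected configuration should ideally be re‐evaluated on data not used during the search (for example with nested resampling) before its performance is reported.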
Explainable artificial intelligence: an analytical review
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-07-12, DOI: 10.1002/widm.1424
P. Angelov, E. Soares, Richard Jiang, Nicholas I. Arnold, Peter M. Atkinson
This paper provides a brief analytical review of the current state of the art in the explainability of artificial intelligence, in the context of recent advances in machine learning and deep learning. The paper starts with a brief historical introduction and a taxonomy, and formulates the main challenges in terms of explainability, building on the four principles of explainability recently formulated by the National Institute of Standards and Technology. Recently published methods related to the topic are then critically reviewed and analyzed. Finally, future directions for research are suggested.
Citations: 208
Trending machine learning models in cyber‐physical building environment: A survey
IF 7.8, Zone 2 (Computer Science), Q1 Computer Science, Pub Date: 2021-06-29, DOI: 10.1002/widm.1422
Zahid Hasan, Nirmalya Roy
Electricity usage of buildings (including offices, malls, and residential apartments) represents a significant portion of a nation's energy expenditure and carbon footprint. In the United States, buildings' appliances consume approximately 72% of the total electricity produced. In this regard, cyber‐physical system (CPS) researchers have put forth research questions on reducing the energy consumption of cyber‐physical building environments by minimizing energy dissipation while securing occupants' comfort. Some of the questions in CPS building research include finding the optimal HVAC control, monitoring appliances' energy usage, detecting insulation problems, estimating the number and activities of occupants, managing thermal comfort, and intelligently interacting with the smart grid. Various machine learning (ML) applications have been studied in recent CPS research to improve building energy efficiency by addressing these questions. In this paper, we comprehensively review and report on the contemporary applications of ML algorithms such as deep learning, transfer learning, active learning, reinforcement learning, and other emerging techniques proposed to address the above challenges in the CPS building environment. Finally, we conclude this article by discussing diverse open questions and prospective future directions in CPS building environment research.
Citations: 9
Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results
IF 7.8 Zone 2 (Computer Science) Q1 Computer Science Pub Date: 2021-06-04 DOI: 10.1002/widm.1441
Chris Niessl, M. Herrmann, Chiara Wiedemann, Giuseppe Casalicchio, Anne-Laure Boulesteix (Institute for Medical Information Processing, Biometry, and Epidemiology, LMU Munich, Germany; Department of Statistics)
In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness of this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options combined with questionable research practices leads to biased interpretations of benchmark results and to over‐optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.
Citations: 13
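The multiplicity described in the abstract of "Over‐optimism in benchmark studies" can be illustrated with a small toy sketch. It is not the authors' benchmark or their analysis: the error rates, the method names A, B, and C, the penalty value of 1.0, and all function names are hypothetical, and multidimensional unfolding is not attempted. The sketch only enumerates every combination of two ways of handling missing performance values and three ways of aggregating over data sets, and reports which method comes out best under each combination.

```python
from itertools import product
from statistics import mean, median

# Hypothetical error rates (lower is better) of three methods on four data sets.
# None marks a run that failed to produce a performance value.
ERRORS = {
    "A": [0.10, 0.20, 0.30, None],
    "B": [0.14, 0.22, 0.31, 0.29],
    "C": [0.05, 0.21, 0.45, 0.30],
}


def drop_incomplete(errors):
    """Keep only data sets on which every method produced a value."""
    n = len(next(iter(errors.values())))
    keep = [j for j in range(n) if all(v[j] is not None for v in errors.values())]
    return {m: [v[j] for j in keep] for m, v in errors.items()}


def impute_worst(errors, penalty=1.0):
    """Replace missing values with an assumed worst-case error of 1.0."""
    return {m: [penalty if x is None else x for x in v] for m, v in errors.items()}


def to_ranks(errors):
    """Rank methods within each data set (1 = lowest error; toy data has no ties)."""
    methods = list(errors)
    ranks = {m: [] for m in methods}
    for j in range(len(errors[methods[0]])):
        ordered = sorted(methods, key=lambda m: errors[m][j])
        for r, m in enumerate(ordered, start=1):
            ranks[m].append(r)
    return ranks


# Analysis options: how to treat missing values and how to aggregate over data sets.
MISSING = {"drop": drop_incomplete, "impute": impute_worst}
AGGREGATE = {
    "mean": lambda e: {m: mean(v) for m, v in e.items()},
    "median": lambda e: {m: median(v) for m, v in e.items()},
    "mean_rank": lambda e: {m: mean(v) for m, v in to_ranks(e).items()},
}


def winners(errors):
    """Return the best method under every (missing handling, aggregation) pair."""
    result = {}
    for (mk, handle), (ak, aggregate) in product(MISSING.items(), AGGREGATE.items()):
        scores = aggregate(handle(errors))
        result[(mk, ak)] = min(scores, key=scores.get)
    return result


if __name__ == "__main__":
    for combo, best in winners(ERRORS).items():
        print(combo, "->", best)
```

Even in this tiny example the apparent best method is not the same under all six combinations, which is the kind of variability that makes selective reporting of a single combination misleading.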