Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

A survey on datasets for fairness‐aware machine learning

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-10-01 DOI: 10.1002/widm.1452

Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Eirini Ntoutsi

As decision‐making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data‐driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness‐aware ML solutions have been proposed which involve fairness‐related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real‐world datasets used for fairness‐aware ML. We focus on tabular data as the most common data representation for fairness‐aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to protected attributes and class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.
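As a minimal sketch of the kind of exploratory analysis the abstract describes, the snippet below computes the positive‐class rate per protected group and the gap between groups (often called the statistical parity difference). The toy records, the column names `sex` and `income_high`, and the helper `positive_rate_by_group` are illustrative assumptions, not data or code from the survey.

```python
from collections import defaultdict

# Toy tabular records (hypothetical values, not taken from any real dataset).
records = [
    {"sex": "female", "income_high": 1},
    {"sex": "female", "income_high": 0},
    {"sex": "female", "income_high": 0},
    {"sex": "female", "income_high": 0},
    {"sex": "male", "income_high": 1},
    {"sex": "male", "income_high": 1},
    {"sex": "male", "income_high": 0},
    {"sex": "male", "income_high": 0},
]

def positive_rate_by_group(rows, protected, label):
    """Return {group value: share of rows whose label equals 1}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for row in rows:
        counts[row[protected]][0] += row[label]
        counts[row[protected]][1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}

rates = positive_rate_by_group(records, "sex", "income_high")
# Statistical parity difference: gap between the two groups' positive rates.
parity_gap = rates["male"] - rates["female"]
print(rates, parity_gap)
```

A gap far from zero suggests that the class attribute is unevenly distributed across the protected groups, which is one simple signal of dataset bias worth inspecting further.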

Citations: 116

Detecting communities using social network analysis in online learning environments: Systematic literature review

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-09-25 DOI: 10.1002/widm.1431

Sahar Yassine, S. Kadry, M. Sicilia

Uncovering community structure has significantly advanced the explanation, analysis, and forecasting of the behaviors and dynamics of networks in fields such as sociology, criminology, biology, medicine, communication, economics, and academia. Detecting and clustering communities is a powerful step toward identifying the structural properties and behavioral patterns in social networks. Recently, online learning has been progressively adopted in many educational practices, which raises many questions about assessing learners' engagement, collaboration, and behaviors in the newly emerging learning communities. This systematic literature review aims to assess the use of community detection techniques in analyzing network structure in online learning environments. It provides a comprehensive overview of the existing research that adopted those techniques, identifies the educational objectives behind their application, and suggests possible future research directions. Our analysis covered 65 studies found in the literature that applied different community discovery techniques to various types of online learning environments to analyze their users' interaction patterns. Our review revealed the potential of this field for improving educational practices and decisions and for utilizing the massive amount of data generated from interacting with those environments. Finally, we highlight the need to include automated community discovery techniques in online learning environments to facilitate and enhance their use, and we stress the need for further research to uncover many hidden opportunities.
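Community detection methods commonly score a candidate partition with Newman's modularity. The sketch below computes that score for a small interaction graph; the edge list, the node labels, and the function name `modularity` are hypothetical and chosen only for illustration, and the abstract does not state which quality measures the reviewed studies used.

```python
from collections import defaultdict

# Hypothetical learner-interaction graph: two tight groups joined by one edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
communities = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

def modularity(edge_list, membership):
    """Newman modularity Q = sum_c [ e_c / m - (d_c / (2m))^2 ], undirected graph."""
    m = len(edge_list)
    internal = defaultdict(int)  # edges with both endpoints in community c
    degree = defaultdict(int)    # total degree of the nodes in community c
    for u, v in edge_list:
        degree[membership[u]] += 1
        degree[membership[v]] += 1
        if membership[u] == membership[v]:
            internal[membership[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

q = modularity(edges, communities)
print(round(q, 4))
```

Putting every node into a single community yields a modularity of zero, so a clearly positive score indicates more within-group interaction than a degree-preserving random graph would show.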

Citations: 11

Overview of accurate coresets

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-09-16 DOI: 10.1002/widm.1429

Ibrahim Jubran, Alaa Maalouf, D. Feldman

A coreset of an input set is a small summary of it, such that solving a problem on the coreset provably yields the same result as solving the same problem on the original (full) set, for a given family of problems (models/classifiers/loss functions). Coresets have been suggested for many fundamental problems, for example, in machine/deep learning, computer vision, databases, and theoretical computer science. This introductory paper was written following requests regarding the many inconsistent coreset definitions, the lack of source code, the deep theoretical background required from different fields, and the dense papers that make it hard for beginners to apply and develop coresets. The article provides folklore, classic, and simple results, including step‐by‐step proofs and figures, for the simplest (accurate) coresets. Nevertheless, we did not find most of their constructions in the literature. Moreover, we expect that putting them together in a retrospective context will help the reader grasp current results, which usually generalize these fundamental observations. Experts might appreciate the unified notation and comparison table for existing results. Open source code is provided for all presented algorithms, to demonstrate their usage and to support readers who are more familiar with programming than mathematics.
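One well-known instance of an accurate coreset is the summary for the sum of squared distances: the count, the coordinate-wise sum, and the sum of squared norms reproduce the cost for every query point exactly. The sketch below illustrates this identity; the function names are hypothetical and the code is not taken from the article's released source.

```python
def build_coreset(points):
    """Compress points into (n, coordinate-wise sum, sum of squared norms)."""
    n = len(points)
    dim = len(points[0])
    vec_sum = [sum(p[i] for p in points) for i in range(dim)]
    sq_norms = sum(sum(x * x for x in p) for p in points)
    return n, vec_sum, sq_norms

def cost_from_coreset(coreset, query):
    """Uses sum_i ||p_i - q||^2 = sum_i ||p_i||^2 - 2 q . sum_i p_i + n ||q||^2."""
    n, vec_sum, sq_norms = coreset
    dot = sum(q * s for q, s in zip(query, vec_sum))
    return sq_norms - 2 * dot + n * sum(q * q for q in query)

def cost_full(points, query):
    """Reference: the same cost computed on the full input set."""
    return sum(sum((x - q) ** 2 for x, q in zip(p, query)) for p in points)

points = [(1, 2), (3, 4), (5, 0), (-2, 1)]
query = (2, 1)
coreset = build_coreset(points)
print(cost_from_coreset(coreset, query), cost_full(points, query))
```

The summary has a fixed size regardless of how many points are compressed, and two summaries can be merged by adding their components, which is what makes this kind of coreset convenient for streaming or distributed data.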

Citations: 3

Mining text from natural scene and video images: A survey

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-08-24 DOI: 10.1002/widm.1428

P. Shivakumara, Alireza Alaei, U. Pal

In computer terminology, mining refers to extracting meaningful information or knowledge from a large amount of data/information using computers. Meaningful information can be extracted from normal text and from images obtained from different sources, such as natural scene images, video, and documents, by deriving semantics from the text and content of the images. Although there is much work on text/data mining and several survey/review papers have been published, to the best of our knowledge there is no survey paper on mining textual information from natural scene, video, and document images that considers word spotting techniques. In this article, we therefore provide a comprehensive review of both non‐spotting and spotting‐based mining techniques. The mining approaches are categorized as feature‐based, learning‐based, and hybrid methods to analyze the strengths and limitations of the models in each category. In addition, the article discusses the usefulness of the methods in different situations and applications. Furthermore, based on the review of different mining approaches, it identifies the limitations of the existing methods and suggests new applications and future research directions. We believe such a review article will help researchers quickly become familiar with the state of the art and the progress made toward mining textual information from natural scene and video images.

Citations: 3

Critical insights into modern hyperspectral image applications through deep learning

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-07-21 DOI: 10.1002/widm.1426

Garima Jaiswal, Aruna Sharma, S. Yadav

Hyperspectral imaging has shown tremendous growth over the past three decades. It evolved from remote sensing and, with technological enhancements, has outgrown that origin and spread to various other application areas. In addition, data cubes rich in spectral and spatial information are an advantage for capturing, analyzing, reviewing, and interpreting results from data. This review concentrates on emerging application areas of hyperspectral imaging, selected where there is vast scope for future enhancement by exploiting cutting‐edge technology, that is, deep learning. It focuses on applications of hyperspectral imaging techniques in selected areas (remote sensing, document forgery, history and archaeology conservation, surveillance and security, machine vision for fruit quality inspection, medical imaging). The review centers on the publicly available datasets and the features used in each domain. It can act as a baseline for deep learning and machine vision experts, historical geographers, and scholars by giving them a view of how hyperspectral imaging is implemented in multiple domains, along with future research prospects.

Citations: 19

Hyperparameter optimization: Foundations, algorithms, best practices, and open challenges

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-07-13 DOI: 10.1002/widm.1484

B. Bischl, Martin Binder, Michel Lang, Tobias Pielok, Jakob Richter, Stefan Coors, Janek Thomas, Theresa Ullmann, M. Becker, A. Boulesteix, Difan Deng, M. Lindauer

Most machine learning algorithms are configured by a set of hyperparameters whose values must be carefully chosen and which often considerably impact performance. To avoid a time‐consuming and irreproducible manual process of trial‐and‐error to find well‐performing hyperparameter configurations, various automatic hyperparameter optimization (HPO) methods—for example, based on resampling error estimation for supervised machine learning—can be employed. After introducing HPO from a general perspective, this paper reviews important HPO methods, from simple techniques such as grid or random search to more advanced methods like evolution strategies, Bayesian optimization, Hyperband, and racing. This work gives practical recommendations regarding important choices to be made when conducting HPO, including the HPO algorithms themselves, performance evaluation, how to combine HPO with machine learning pipelines, runtime improvements, and parallelization.
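Random search, the simplest of the HPO methods named above, can be sketched in a few lines. The example below tunes the neighbor count `k` of a one-dimensional nearest-neighbor regressor using a holdout estimate of the error. The synthetic data, the search range, and all function names are assumptions made for illustration; they are not taken from the paper.

```python
import random

def knn_predict(train, x, k):
    """Predict y at x as the mean target of the k nearest training points (1-D inputs)."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def validation_error(train, valid, k):
    """Holdout estimate of the mean squared error for a given k."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in valid) / len(valid)

def random_search(train, valid, k_range, n_trials, seed=0):
    """Sample k uniformly at random; return the best (error, k) pair and all trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        k = rng.randint(*k_range)  # both bounds are inclusive
        trials.append((validation_error(train, valid, k), k))
    return min(trials), trials

# Synthetic data (hypothetical): y = x^2 plus Gaussian noise.
data_rng = random.Random(42)
data = [(x / 10, (x / 10) ** 2 + data_rng.gauss(0, 0.05)) for x in range(-30, 31)]
data_rng.shuffle(data)
train, valid = data[:40], data[40:]

(best_error, best_k), trials = random_search(train, valid, k_range=(1, 20), n_trials=10)
print(best_k, best_error)
```

A single holdout split keeps the sketch short; in practice a resampling scheme such as cross-validation would give a less noisy estimate, and the final model should be assessed on data that played no part in the search.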

Citations: 113

Explainable artificial intelligence: an analytical review

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-07-12 DOI: 10.1002/widm.1424

P. Angelov, E. Soares, Richard Jiang, Nicholas I. Arnold, Peter M. Atkinson

This paper provides a brief analytical review of the current state of the art in relation to the explainability of artificial intelligence in the context of recent advances in machine learning and deep learning. The paper starts with a brief historical introduction and a taxonomy, and formulates the main challenges in terms of explainability, building on the four principles of explainability recently formulated by the National Institute of Standards and Technology (NIST). Recently published methods related to the topic are then critically reviewed and analyzed. Finally, future directions for research are suggested.

Citations: 208

A survey on machine learning based light curve analysis for variable astronomical sources

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-07-04 DOI: 10.1002/widm.1425

Ce Yu, Kun Li, Yanxia Zhang, Jian Xiao, Chenzhou Cui, Yihan Tao, Shanjian Tang, Chao Sun, Chongke Bi

The improvement of observation capabilities has expanded the scale of new data available for time domain astronomy research, and the accumulation of observational data continues to accelerate. However, it is difficult for traditional data analysis methods to fully tap the potential scientific value of all the data. Therefore, in current and future research on light curve analysis, it is inevitable that artificial intelligence (AI) technology will be used to assist data analysis in order to obtain as many candidates as possible that meet scientific research goals. This survey reviews important developments in light curve analysis over the past years, summarizes the basic concepts in machine learning and their applications in light curve analysis, and concludes with perspectives and challenges for light curve analysis in the near future. The full exploration of light curves of variable celestial objects relies heavily on new techniques derived from the advancement of machine learning and deep learning in the era of astronomical big data.

Citations: 3

Trending machine learning models in cyber‐physical building environment: A survey

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-06-29 DOI: 10.1002/widm.1422

Zahid Hasan, Nirmalya Roy

Electricity usage of buildings (including offices, malls, and residential apartments) represents a significant portion of a nation's energy expenditure and carbon footprint. In the United States, building appliances consume approximately 72% of the total electricity produced. In this regard, cyber‐physical system (CPS) researchers have put forth associated research questions on reducing the energy consumption of cyber‐physical building environments by minimizing energy dissipation while securing occupants' comfort. Some of the questions in CPS building research include finding the optimal HVAC control, monitoring appliances' energy usage, detecting insulation problems, estimating the number and activities of occupants, managing thermal comfort, and intelligently interacting with the smart grid. Various machine learning (ML) applications have been studied in recent CPS research to improve building energy efficiency by addressing these questions. In this paper, we comprehensively review and report on the contemporary applications of ML algorithms such as deep learning, transfer learning, active learning, reinforcement learning, and other emerging techniques proposed or envisioned to address the above challenges in the CPS building environment. Finally, we conclude this article by discussing diverse existing open questions and prospective future directions in CPS building environment research.

Citations: 9

Over‐optimism in benchmark studies and the multiplicity of design and analysis options when interpreting their results

IF 7.8; Zone 2 (Computer Science); Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Pub Date : 2021-06-04 DOI: 10.1002/widm.1441

Chris Niessl, M. Herrmann, Chiara Wiedemann, Giuseppe Casalicchio, Anne-Laure Boulesteix (Institute for Medical Information Processing, Biometry, Epidemiology, LMU Munich, Germany; Department of Statistics)

In recent years, the need for neutral benchmark studies that focus on the comparison of methods coming from computational sciences has been increasingly recognized by the scientific community. While general advice on the design and analysis of neutral benchmark studies can be found in recent literature, a certain flexibility always exists. This includes the choice of data sets and performance measures, the handling of missing performance values, and the way the performance values are aggregated over the data sets. As a consequence of this flexibility, researchers may be concerned about how their choices affect the results or, in the worst case, may be tempted to engage in questionable research practices (e.g., the selective reporting of results or the post hoc modification of design or analysis components) to fit their expectations. To raise awareness of this issue, we use an example benchmark study to illustrate how variable benchmark results can be when all possible combinations of a range of design and analysis options are considered. We then demonstrate how the impact of each choice on the results can be assessed using multidimensional unfolding. In conclusion, based on previous literature and on our illustrative example, we claim that the multiplicity of design and analysis options, combined with questionable research practices, leads to biased interpretations of benchmark results and to over‐optimistic conclusions. This issue should be considered by computational researchers when designing and analyzing their benchmark studies and by the scientific community in general in an effort towards more reliable benchmark results.
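The effect of this flexibility can be illustrated with a toy example that enumerates every combination of two analysis choices named in the abstract: how missing performance values are handled and how values are aggregated over data sets. The accuracies, the method names `A` and `B`, and the two option sets are invented for illustration; this is not the benchmark study or the multidimensional unfolding analysis from the article.

```python
from itertools import product
from statistics import mean, median

# Hypothetical accuracies of two methods on five data sets; None marks a failed run.
scores = {
    "A": [0.90, 0.60, 0.62, 0.61, None],
    "B": [0.70, 0.65, 0.66, 0.64, 0.50],
}

def handle_missing(results, strategy):
    """Either drop data sets where any method failed, or impute failures with 0.0."""
    n = len(next(iter(results.values())))
    if strategy == "drop":
        keep = [i for i in range(n) if all(r[i] is not None for r in results.values())]
        return {m: [r[i] for i in keep] for m, r in results.items()}
    return {m: [0.0 if v is None else v for v in r] for m, r in results.items()}

aggregators = {"mean": mean, "median": median}

winners = {}
for strategy, agg_name in product(["drop", "impute"], aggregators):
    cleaned = handle_missing(scores, strategy)
    summary = {m: aggregators[agg_name](vals) for m, vals in cleaned.items()}
    winners[(strategy, agg_name)] = max(summary, key=summary.get)

for choice, winner in winners.items():
    print(choice, "->", winner)
```

With these invented numbers the apparent winner depends on the combination chosen, so reporting only one combination after seeing the results could make either method look superior.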

Citations: 13
