
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

Data and text mining from online reviews: An automatic literature analysis
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-20 | DOI: 10.1002/widm.1448
Sérgio Moro, P. Rita
This paper reports on a thorough analysis of the scientific literature using data and text mining to uncover knowledge from online reviews due to their importance as user‐generated content. In this context, more than 12,000 papers were extracted from publications indexed in the Scopus database within the last 15 years. Regarding the type of data, most previous studies focused on qualitative textual data to perform their analysis, with fewer looking for quantitative scores and/or characterizing reviewer profiles. In terms of application domains, information management and technology, e‐commerce, and tourism stand out. It is also clear that other areas of potentially valuable applications should be addressed in future research, such as arts and education, as well as more interdisciplinary approaches, namely in the spectrum of the social sciences.
Citations: 3
Methods and tools for causal discovery and causal inference
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-19 | DOI: 10.1002/widm.1449
Ana Rita Nogueira, Andrea Pugnana, S. Ruggieri, D. Pedreschi, João Gama
Causality is a complex concept with roots in several fields, such as statistics, economics, epidemiology, computer science, and philosophy. In recent years, the study of causal relationships has become a crucial topic in the Artificial Intelligence community, as causality can be a key tool for overcoming some limitations of correlation-based Machine Learning systems. Causality research can generally be divided into two main branches: causal discovery and causal inference. The former focuses on obtaining causal knowledge directly from observational data. The latter aims to estimate the impact that a change in a certain variable has on an outcome of interest. This article covers several methodologies that have been developed for both tasks. The survey does not focus only on theoretical aspects; it also provides a practical toolkit for interested researchers and practitioners, including software, datasets, and running examples.
Citations: 54
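To make the causal inference branch described in the abstract above more concrete, here is a minimal sketch, not taken from the survey, of estimating the effect of a binary treatment on a binary outcome while adjusting for one observed confounder (backdoor adjustment by stratification). The counts in `COUNTS`, the function names, and the assumption that `z` is the only confounder are hypothetical and chosen purely for illustration.

```python
# Hypothetical counts: (confounder z, treatment t) -> (group size, number with outcome y = 1).
COUNTS = {
    (0, 1): (100, 90),
    (0, 0): (400, 320),
    (1, 1): (400, 160),
    (1, 0): (100, 30),
}


def naive_effect(counts):
    """Difference in outcome rates between treated and untreated, ignoring z."""
    rate = {}
    for t in (0, 1):
        n = sum(size for (_, tt), (size, _) in counts.items() if tt == t)
        y = sum(pos for (_, tt), (_, pos) in counts.items() if tt == t)
        rate[t] = y / n
    return rate[1] - rate[0]


def adjusted_effect(counts):
    """Backdoor adjustment: per-stratum effect, averaged with weights P(z).

    Assumes (hypothetically) that z is the only confounder of t and y.
    """
    total = sum(size for size, _ in counts.values())
    effect = 0.0
    for z in sorted({z for z, _ in counts}):
        n_z = counts[(z, 0)][0] + counts[(z, 1)][0]
        rate_treated = counts[(z, 1)][1] / counts[(z, 1)][0]
        rate_control = counts[(z, 0)][1] / counts[(z, 0)][0]
        effect += (n_z / total) * (rate_treated - rate_control)
    return effect


if __name__ == "__main__":
    print("naive:", round(naive_effect(COUNTS), 3))
    print("adjusted:", round(adjusted_effect(COUNTS), 3))
```

With these made-up counts the naive comparison comes out at about -0.2, while the adjusted estimate is about +0.1 in both strata. The sign flip is the kind of limitation of purely correlation-based analysis that the abstract alludes to.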
Facial feature discovery for ethnicity recognition
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-18 | DOI: 10.1002/widm.1446
Citations: 0
A comprehensive review on Arabic word sense disambiguation for natural language processing applications
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-18 | DOI: 10.1002/widm.1447
S. Kaddoura, R. D. Ahmed, D. Jude Hemanth
In communication, textual data are a vital attribute. In all languages, the meaning of ambiguous or polysemous words changes depending on the context in which they are used. Determining the correct meaning of an ambiguous word is a challenging task in natural language processing (NLP). Word sense disambiguation (WSD) is an NLP process that analyzes and determines the correct meaning of polysemous words in a text. WSD is a computational linguistics task that automatically identifies a polysemous word's set of senses. Based on the context in which a word appears, WSD recognizes the word and tags it with its correct, a priori known meaning. Semitic languages like Arabic pose even greater challenges than other languages, since Arabic text lacks diacritics and standardization and suffers from a massive shortage of available resources. Recently, many approaches and techniques have been suggested to solve word ambiguity dilemmas in many different ways and in several languages. This review paper presents an extensive survey of research works that seek to solve Arabic word sense disambiguation (AWSD) with the existing AWSD datasets.
Citations: 7
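The idea in the abstract above of tagging a word with one of its a priori known senses based on context can be illustrated with a simplified Lesk-style gloss-overlap heuristic. This is a sketch only: it is a classical knowledge-based technique chosen for illustration and is not claimed to be one of the methods covered in the review. The sense inventory `SENSES`, the stopword list, and the English example word are hypothetical stand-ins for a real Arabic lexicon and preprocessing pipeline.

```python
# Hypothetical sense inventory: word -> {sense label: dictionary-style gloss}.
SENSES = {
    "bank": {
        "financial_institution": "an institution that accepts deposits and lends money",
        "river_side": "the sloping land beside a river or lake",
    }
}

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "of", "or", "and", "that", "to", "on", "in", "by"}


def tokens(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {word for word in text.lower().split() if word not in STOPWORDS}


def simplified_lesk(word, sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence)
    best_sense, best_overlap = None, -1
    for sense, gloss in senses[word].items():
        overlap = len(context & tokens(gloss))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


if __name__ == "__main__":
    print(simplified_lesk("bank", "she sat on the bank of the river"))
    print(simplified_lesk("bank", "he went to the bank to deposit money"))
```

A gloss-overlap heuristic like this depends heavily on the quality of the lexicon, which is one reason the resource shortage mentioned in the abstract matters for Arabic.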
Machine learning methods for generating high dimensional discrete datasets
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-18 | DOI: 10.1002/widm.1450
G. Manco, E. Ritacco, Antonino Rullo, D. Saccá, Edoardo Serra
The development of platforms and techniques for emerging Big Data and Machine Learning applications requires the availability of real-life datasets. A possible solution is to synthesize datasets that reflect patterns of real ones using a two-step approach: first, a real dataset X is analyzed to derive relevant patterns Z; then, these patterns are used to reconstruct a new dataset X′ that preserves the main characteristics of X. This survey explores two possible approaches: (1) constraint-based generation and (2) probabilistic generative modeling. The former is devised using inverse frequent itemset mining (IFM) techniques and consists of generating a dataset that satisfies given support constraints on the itemsets of an input set, which are typically the frequent ones. By contrast, for the latter approach, the survey explores recent developments in probabilistic generative modeling (PGM) that model the generation as a sampling process from a parametric distribution, typically encoded as a neural network. The two approaches are compared by providing an overview of their instantiations for the case of discrete data and discussing their pros and cons.
Citations: 2
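The constraint-based formulation in the abstract above (a generated dataset must satisfy support constraints on given itemsets) can be sketched as a simple check. The code below only verifies whether a candidate dataset meets the constraints; it does not solve the much harder inverse problem of constructing such a dataset. The function names, the `[low, high]` form of the constraints, and the toy transactions are assumptions made for illustration.

```python
def support(dataset, itemset):
    """Number of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    return sum(1 for transaction in dataset if items <= set(transaction))


def satisfies(dataset, constraints):
    """True if every itemset's support lies within its [low, high] bounds."""
    return all(
        low <= support(dataset, itemset) <= high
        for itemset, (low, high) in constraints.items()
    )


# Hypothetical "real" dataset X, as a list of transactions.
REAL = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]

# Patterns Z: exact support constraints derived from X for two itemsets.
CONSTRAINTS = {
    frozenset({"a"}): (support(REAL, {"a"}), support(REAL, {"a"})),
    frozenset({"a", "b"}): (support(REAL, {"a", "b"}), support(REAL, {"a", "b"})),
}

# Hypothetical candidate X': different transactions, same supports for Z.
SYNTHETIC = [{"a", "b"}, {"a", "b"}, {"a", "d"}, {"d"}]

if __name__ == "__main__":
    print(satisfies(SYNTHETIC, CONSTRAINTS))
    print(satisfies([{"a"}, {"b"}], CONSTRAINTS))
```

The candidate differs from the real dataset transaction by transaction yet preserves the constrained supports, which is the sense in which X′ "preserves the main characteristics of X" under this approach.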
The use of machine learning in sport outcome prediction: A review
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-11 | DOI: 10.1002/widm.1445
Ines Horvat
Citations: 0
Blockchain networks: Data structures of Bitcoin, Monero, Zcash, Ethereum, Ripple, and Iota.
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2022-01-01 | Epub Date: 2021-11-17 | DOI: 10.1002/widm.1436
Cuneyt Gurcan Akcora, Yulia R Gel, Murat Kantarcioglu

Blockchain is an emerging technology that has enabled many applications, from cryptocurrencies to digital asset management and supply chains. Due to this surge of popularity, analyzing the data stored on blockchains poses a new critical challenge in data science. To assist data scientists in various analytic tasks for a blockchain, in this tutorial, we provide a systematic and comprehensive overview of the fundamental elements of blockchain network models. We discuss how we can abstract blockchain data as various types of networks and further use such associated network abstractions to reap important insights on blockchains' structure, organization, and functionality. This article is categorized under: Technologies > Data Preprocessing; Application Areas > Business and Industry; Fundamental Concepts of Data and Knowledge > Data Concepts; Fundamental Concepts of Data and Knowledge > Knowledge Representation.

Citations: 19
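As one possible illustration of abstracting blockchain data as a network, as the abstract above describes, the sketch below builds a directed address graph from simplified Bitcoin-style transactions. The record layout (`inputs` and `outputs` as lists of addresses), the sample transactions, and the function names are hypothetical; real blockchain data carries amounts, scripts, and block metadata that are omitted here.

```python
from collections import defaultdict

# Hypothetical, simplified transactions: each spends from input addresses
# and pays to output addresses.
TRANSACTIONS = [
    {"id": "tx1", "inputs": ["A"], "outputs": ["B", "C"]},
    {"id": "tx2", "inputs": ["B", "C"], "outputs": ["D"]},
    {"id": "tx3", "inputs": ["D"], "outputs": ["A"]},
]


def address_graph(transactions):
    """Directed graph with an edge src -> dst for every input/output pair.

    Linking every input to every output is an approximation: a transaction
    with several inputs and outputs does not record which input funds
    which output.
    """
    edges = defaultdict(set)
    for tx in transactions:
        for src in tx["inputs"]:
            for dst in tx["outputs"]:
                edges[src].add(dst)
    return dict(edges)


def out_degree(graph):
    """Number of distinct addresses each address has paid to."""
    return {address: len(neighbors) for address, neighbors in graph.items()}


if __name__ == "__main__":
    graph = address_graph(TRANSACTIONS)
    print(graph)
    print(out_degree(graph))
```

Other abstractions, such as a graph whose nodes are transactions rather than addresses, can be derived from the same records by changing what counts as a node and an edge.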
Process mining applications in the healthcare domain: A comprehensive review
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2021-12-28 | DOI: 10.1002/widm.1442
A. Guzzo, Antonino Rullo, E. Vocaturo
Process mining (PM) is a well-known research area that includes techniques, methodologies, and tools for analyzing processes in a variety of application domains. In healthcare, processes are characterized by high variability in terms of activities, duration, and involved resources (e.g., physicians, nurses, administrators, and machinery). Besides, the multitude of diseases affecting the patients housed in healthcare facilities makes medical contexts highly heterogeneous. As a result, understanding and analyzing healthcare processes are not trivial tasks, and administrators and doctors look for tools and methods that can concretely support them in improving the healthcare services they are involved in. In this context, PM has been increasingly used for a wide range of applications, as reported in some recent reviews. However, these reviews mainly discuss applications related to clinical pathways, while a systematic review of all possible applications is absent. In this article, we selected 172 papers published in the last 10 years that present applications of PM in the healthcare domain. The objective of this study is to help and guide researchers interested in the medical field to understand the main PM applications in healthcare, and also to suggest new ways to develop promising and not yet fully investigated applications. Moreover, our study could be of interest to practitioners who are considering applications of PM and who can use it to identify and choose PM algorithms, techniques, tools, methodologies, and approaches based on past successful experiences.
Citations: 17
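A common first step when analyzing a process from an event log is to count how often one activity directly follows another. The sketch below computes such a directly-follows graph for a toy patient-flow log; it is an illustration of the general idea of process analysis mentioned in the abstract above, not a method attributed to the review. The activity names and the log itself are hypothetical.

```python
from collections import Counter

# Hypothetical event log: one list of activities (a trace) per patient case.
LOG = [
    ["admit", "triage", "treat", "discharge"],
    ["admit", "triage", "discharge"],
    ["admit", "treat", "discharge"],
]


def directly_follows(log):
    """Count how often activity a is immediately followed by activity b."""
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for (a, b), n in sorted(directly_follows(LOG).items()):
        print(f"{a} -> {b}: {n}")
```

Even in this tiny log, three cases produce five distinct transitions, which hints at the variability of healthcare processes that the abstract highlights.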
A survey on federated learning in data mining
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2021-12-09 | DOI: 10.1002/widm.1443
Bin Yu, Wenjie Mao, Yihan Lv, Chen Zhang, Yu Xie
Data mining is a process to extract unknown, hidden, and potentially useful information from data. However, the problem of data islands makes it arduous to collect and analyze scattered data, and mining data also raises privacy and security issues. A collaborative, decentralized approach called federated learning unites multiple participants to generate a shareable global optimal model while keeping privacy-sensitive data on local devices, which offers a promising way to address the problems of decentralized data and privacy protection. Though federated learning has been widely used, few systematic studies have been conducted on federated learning in data mining. Hence, unlike prior reviews in this field, we make a comprehensive summary and provide a novel taxonomy of the application of federated learning in data mining. This article starts by providing a thorough description of the relevant definitions and concepts, followed by an in-depth investigation of the challenges faced by federated learning. In this context, we elaborate on four categories of major applications of federated learning in data mining, namely education, healthcare, IoT, and intelligent transportation, and discuss them comprehensively. Finally, we discuss four promising directions for further research: privacy enhancement, improvement of communication efficiency, heterogeneous system processing, and reduction of economic costs.
Citations: 19
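The core mechanism in the abstract above (participants train locally, share only model parameters, and a server combines them into a global model) can be sketched with federated-averaging-style aggregation on a one-parameter linear model y = w * x. This is an illustrative sketch, not a method attributed to the survey: the client data, learning rate, number of rounds, and function names are all hypothetical, and real systems add client sampling, secure aggregation, and communication handling.

```python
# Hypothetical clients: each holds private (xs, ys) generated from y = 2 * x.
CLIENTS = [
    ([1.0, 2.0], [2.0, 4.0]),
    ([3.0], [6.0]),
]


def local_update(w, xs, ys, lr=0.05, steps=1):
    """Run gradient descent on the client's own data; raw data never leaves."""
    for _ in range(steps):
        # Gradient of the mean squared error of the model y = w * x.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Average the locally updated parameters, weighted by client data size."""
    total = sum(len(xs) for xs, _ in clients)
    return sum(
        len(xs) / total * local_update(w_global, xs, ys) for xs, ys in clients
    )


def train(clients, rounds=20, w0=0.0):
    """Repeat local training and server-side averaging for several rounds."""
    w = w0
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


if __name__ == "__main__":
    print(train(CLIENTS))
```

Only the scalar `w` travels between clients and server in this sketch. The communication cost and the uneven client data sizes are small-scale versions of two of the challenges the abstract lists.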
A novel methodology for Arabic news classification
IF 7.8 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE | Pub Date: 2021-12-06 | DOI: 10.1002/widm.1440
Marco Alfonse, M. Gawich
Automated news classification concerns the assignment of news items to one or more predefined categories. Automatically classified news helps search engines mine and categorize the type of news that the user asks for. Most researchers have focused on the classification of English news and have ignored Arabic news because of the complexity of Arabic morphology. This article presents a novel methodology for classifying Arabic news. It relies on feature extraction and the application of the following machine learning classifiers: Naive Bayes (NB), Logistic Regression (LR), Random Forest (RF), eXtreme Gradient Boosting (XGB), K-Nearest Neighbors (KNN), Stochastic Gradient Descent (SGD), Decision Tree (DT), and Multi-Layer Perceptron (MLP). The methodology is applied to the Arabic news dataset provided by Mendeley. The classification accuracy is more than 95%.
Citations: 4
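Of the classifiers listed in the abstract above, Naive Bayes is the simplest to show end to end. The sketch below is a small multinomial Naive Bayes with add-one smoothing over bag-of-words counts, written with the standard library only. It is not the authors' pipeline: the training sentences are hypothetical English stand-ins for Arabic news, and tokenization is a plain whitespace split rather than Arabic-aware feature extraction.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled documents (English stand-ins for Arabic news items).
TRAIN = [
    ("sports", "team wins match"),
    ("sports", "player scores goal"),
    ("economy", "bank raises rates"),
    ("economy", "market prices rise"),
]


def train_nb(docs):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for label, text in docs:
        class_docs[label] += 1
        for word in text.split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_docs, word_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log prior plus smoothed log likelihood."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for word in text.split():
            if word in vocab:  # words never seen in training are skipped
                count = word_counts[label][word]
                score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train_nb(TRAIN)
    print(predict_nb(model, "team scores goal"))
    print(predict_nb(model, "bank prices"))
```

For real Arabic text, the feature extraction step mentioned in the abstract would replace the whitespace split, since Arabic morphology makes surface word forms a sparse signal.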