
Latest publications in Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery

Unsupervised EHR‐based phenotyping via matrix and tensor decompositions
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-09-01 | DOI: 10.1002/widm.1494
Florian Becker, A. Smilde, E. Acar
Computational phenotyping allows for unsupervised discovery of subgroups of patients as well as corresponding co‐occurring medical conditions from electronic health records (EHR). Typically, EHR data contain demographic information, diagnoses and laboratory results. Discovering (novel) phenotypes has the potential to be of prognostic and therapeutic value. Providing medical practitioners with transparent and interpretable results is an important requirement and an essential part of advancing precision medicine. Low‐rank data approximation methods such as matrix (e.g., nonnegative matrix factorization) and tensor decompositions (e.g., CANDECOMP/PARAFAC) have demonstrated that they can provide such transparent and interpretable insights. Recent developments have adapted low‐rank data approximation methods by incorporating different constraints and regularizations that further facilitate interpretability. In addition, they offer solutions for common challenges within EHR data such as high dimensionality, data sparsity and incompleteness. Extracting temporal phenotypes from longitudinal EHR data, in particular, has received much attention in recent years. In this paper, we provide a comprehensive review of low‐rank approximation‐based approaches for computational phenotyping. The existing literature is categorized into temporal versus static phenotyping approaches based on matrix versus tensor decompositions. Furthermore, we outline different approaches for the validation of phenotypes, that is, the assessment of clinical significance.
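As a rough illustration of the matrix-decomposition flavor of phenotyping described above (not code from the paper), the sketch below runs nonnegative matrix factorization on a synthetic patient-by-diagnosis count matrix with scikit-learn; the toy data, rank, and variable names are all assumptions.

```python
# Minimal sketch: NMF-based phenotyping on a synthetic patient-by-diagnosis matrix.
# The data, rank, and interpretation below are illustrative assumptions only.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
n_patients, n_codes, n_phenotypes = 200, 50, 5

# Toy nonnegative count matrix: rows are patients, columns are diagnosis codes.
X = rng.poisson(lam=1.0, size=(n_patients, n_codes)).astype(float)

model = NMF(n_components=n_phenotypes, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X)   # patient-to-phenotype membership weights
H = model.components_        # phenotype-to-diagnosis loadings

# Each row of H is a candidate phenotype: its largest entries indicate the
# diagnosis codes that tend to co-occur within that patient subgroup.
top_codes_per_phenotype = np.argsort(-H, axis=1)[:, :5]
print(top_codes_per_phenotype)
```

A tensor variant (e.g., a CANDECOMP/PARAFAC decomposition of a patient × diagnosis × time tensor) follows the same pattern and additionally yields a temporal factor per phenotype.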
Citations: 2
Short‐term photovoltaic power forecasting with adaptive stochastic configuration network ensemble
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-08-17 | DOI: 10.1002/widm.1477
Xifeng Guo, Xinlu Wang, Yanshuang Ao, Wei Dai, Ye Gao
The volatility and intermittency of solar energy seriously restrict the development of the photovoltaic (PV) industry. Accurate forecasting of short‐term PV power generation is essential for the optimal balance and dispatch of power plants in the smart grid. This article presents a machine learning approach for analyzing the volt‐ampere characteristics and influential factors in PV data. A correlation analysis is employed to discover some hidden characteristic variables. Then, an adaptive ensemble method with stochastic configuration networks as base models (AE‐SCN), which integrates bagging and adaptive weighted data fusion algorithms, is proposed to construct the PV prediction model. Compared with the original SCN, the SCN ensemble (SCNE), the random vector functional‐link network (RVFLN), a linear regression model, a random forest model and an autoregressive integrated moving average (ARMA) model, AE‐SCN performs favorably in terms of prediction accuracy.
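To make the ensemble idea concrete, here is a minimal sketch (not the paper's AE‐SCN algorithm) that bags simple randomized-feature regressors, in the spirit of SCN/RVFL-style base models, on toy PV-like data; the synthetic data, hidden-layer size, and all names are assumptions.

```python
# Minimal sketch of bagging randomized-feature regressors (an RVFL-style stand-in;
# NOT the paper's AE-SCN). Toy photovoltaic-like data for illustration only.
import numpy as np

rng = np.random.default_rng(1)

def fit_random_feature_model(X, y, n_hidden=50):
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                          # random hidden features
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)    # closed-form output weights
    return W, b, beta

def predict(model, X):
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy data: two inputs (say, irradiance and temperature) -> PV power.
X = rng.uniform(size=(500, 2))
y = 3.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=500)

# Bagging: train each base model on a bootstrap resample, then average predictions.
models = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))
    models.append(fit_random_feature_model(X[idx], y[idx]))

X_test = rng.uniform(size=(5, 2))
y_hat = np.mean([predict(m, X_test) for m in models], axis=0)
print(y_hat)
```

The paper's adaptive weighting would replace the plain mean in the last step with learned, data-dependent combination weights.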
Citations: 1
On the application of machine learning in astronomy and astrophysics: A text‐mining‐based scientometric analysis
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-08-12 | DOI: 10.1002/widm.1476
J. Rodríguez, I. Rodríguez-Rodríguez, Wai Lok Woo
Since the beginning of the 21st century, the fields of astronomy and astrophysics have experienced significant growth at observational and computational levels, leading to the acquisition of increasingly huge volumes of data. In order to process this vast quantity of information, artificial intelligence (AI) techniques are being combined with data mining to detect patterns with the aim of modeling, classifying or predicting the behavior of certain astronomical phenomena or objects. Parallel to the exponential development of the aforementioned techniques, the scientific output related to the application of AI and machine learning (ML) in astronomy and astrophysics has also experienced considerable growth in recent years. Therefore, the increasingly abundant articles make it difficult to monitor this field in terms of which research topics are the most prolific or novel, or which countries or authors are leading them. In this article, a text‐mining‐based scientometric analysis of scientific documents published over the last three decades on the application of AI and ML in the fields of astronomy and astrophysics is presented. The VOSviewer software and data from the Web of Science (WoS) are used to elucidate the evolution of publications in this research field, their distribution by country (including co‐authorship), the most relevant topics addressed, and the most cited elements and most significant co‐citations according to publication source and authorship. The obtained results demonstrate how application of AI/ML to the fields of astronomy/astrophysics represents an established and rapidly growing field of research that is crucial to obtaining scientific understanding of the universe.
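The keyword co-occurrence maps produced by tools such as VOSviewer boil down to counting how often keyword pairs appear together on the same bibliographic record. Below is a minimal sketch of that counting step on made-up WoS-style keyword lists; the records and all names are assumptions, not data from the article.

```python
# Minimal sketch: keyword co-occurrence counting, the basic statistic behind
# co-occurrence maps such as those drawn by VOSviewer. Records are made up.
from itertools import combinations
from collections import Counter

records = [
    ["machine learning", "galaxy classification", "deep learning"],
    ["machine learning", "exoplanets"],
    ["deep learning", "galaxy classification", "convolutional neural networks"],
]

pair_counts = Counter()
for keywords in records:
    for a, b in combinations(sorted(set(keywords)), 2):
        pair_counts[(a, b)] += 1

# The most frequent pairs become the strongest links in the co-occurrence network.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```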
Citations: 0
A survey on artificial intelligence in histopathology image analysis
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-07-27 | DOI: 10.1002/widm.1474
M. Abdelsamea, Usama Zidan, Zakaria Senousy, M. Gaber, E. Rakha, Mohammad Ilyas
The increasing adoption of whole slide image (WSI) technology in histopathology has dramatically transformed pathologists' workflow and allowed the use of computer systems in histopathology analysis. Extensive research in artificial intelligence (AI) has been conducted and has made huge progress, resulting in efficient, effective, and robust algorithms for several applications including cancer diagnosis, prognosis, and treatment. These algorithms offer highly accurate predictions but lack transparency, understandability, and actionability. Thus, explainable artificial intelligence (XAI) techniques are needed not only to understand the mechanism behind the decisions made by AI methods and increase user trust but also to broaden the use of AI algorithms in the clinical setting. From a survey of over 150 papers, we explore the different AI algorithms that have been applied to, and contributed to, the histopathology image analysis workflow. We first address the workflow of the histopathological process. We then present an overview of various learning‐based, XAI, and actionable techniques relevant to deep learning methods in histopathological imaging. We also address the evaluation of XAI methods and the need to ensure their reliability in the field.
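As one concrete example of the family of XAI techniques such surveys cover, a vanilla gradient saliency map highlights which input pixels most influence a prediction. The sketch below is generic PyTorch on a toy CNN and a random tile, not a method from any specific surveyed paper; the architecture, tile size, and class labels are assumptions.

```python
# Minimal sketch: vanilla gradient saliency for a toy tile classifier.
# The tiny CNN and random "WSI tile" are illustrative assumptions only.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(8, 2),                # e.g., benign vs. malignant tile
)
model.eval()

patch = torch.rand(1, 3, 64, 64, requires_grad=True)   # toy tile
score = model(patch)[0, 1]                              # score of class 1
score.backward()

# Pixels with large absolute gradient influence the prediction most.
saliency = patch.grad.abs().max(dim=1)[0]               # (1, 64, 64) heatmap
print(saliency.shape)
```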
Citations: 12
Open source intelligence extraction for terrorism‐related information: A review
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-07-07 | DOI: 10.1002/widm.1473
Megha Chaudhary, D. Bansal
In the contemporary era, where a large part of the world's population makes extensive use of the internet and social media, terrorists have found in these platforms an opportunity to execute their vicious plans: a fitting medium to reach their targets, spread propaganda, disseminate training content, operate virtually, and further their goals. To restrain such activities, internet content related to terrorism needs to be analyzed so that it can be channeled into appropriate counter‐terrorism measures. Open Source Intelligence (OSINT), an emerging discipline that leverages publicly accessible sources of information on the internet to extract intelligence, offers a fitting solution to this problem. The OSINT extraction process is broadly observed to comprise three phases: (i) data acquisition, (ii) data enrichment, and (iii) knowledge inference. In the context of terrorism, researchers have made noticeable contributions across these three phases. However, a comprehensive review that organizes these contributions into an integrated intelligence‐extraction workflow has been lacking. This paper presents the most current review of OSINT, reflecting how various state‐of‐the‐art tools and techniques can be applied to extract terrorism‐related textual information from publicly accessible sources. Data mining and text‐analysis techniques, namely natural language processing, machine learning, and deep learning, are reviewed for extracting and evaluating textual data. Finally, we discuss the challenges and gaps observed in the different phases of OSINT extraction.
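As a small illustration of the kind of supervised text-mining step that appears in the knowledge-inference phase of such pipelines, the sketch below trains a TF-IDF plus logistic-regression classifier to flag topic-relevant posts; the toy documents and labels are invented placeholders, not data or methods from the review.

```python
# Minimal sketch: TF-IDF + logistic regression as a baseline relevance classifier
# for OSINT-style text filtering. The documents and labels are toy placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "call to join the cause and spread the message",    # placeholder "relevant"
    "recruitment material shared through the channel",  # placeholder "relevant"
    "weather was great at the beach this weekend",      # placeholder "irrelevant"
    "new recipe for homemade pasta with tomatoes",       # placeholder "irrelevant"
]
labels = [1, 1, 0, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(docs, labels)

# Probability that an unseen post belongs to the topic of interest.
print(clf.predict_proba(["message shared through the channel"])[0, 1])
```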
Citations: 3
Corporate investment prediction using a weighted temporal graph neural network
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-07-05 | DOI: 10.1002/widm.1472
Jianing Li, X. Yao
Corporate investment is an important part of corporate financial decision‐making and affects the future profit and value of the corporation. Predicting corporate investment is of great significance for capital market investors seeking to understand the future operation and development of a corporation. Many researchers have studied independent prediction methods. However, individual firms imitate each other's investment in the actual decision‐making process. This phenomenon of investment convergence indicates investment correlation among individual firms, which is ignored in existing methods. In this article, we first identify key variables in multivariate sequences with our designed two‐way fixed effects model for precise corporate network construction. Then, we propose a weighted temporal graph neural network (WTGNN) for graph learning and investment prediction over the corporate network. WTGNN improves the graph convolution capability by weighted sampling with attention and multivariate time series aggregation. We conducted extensive experiments using real‐world financial reporting data. The results show that WTGNN achieves excellent graph learning performance and outperforms existing methods in the investment prediction task.
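To show the core operation behind a weighted graph convolution (the generic aggregation mechanism, not the paper's WTGNN with attention and temporal aggregation), the sketch below performs one degree-normalized propagation step over a toy weighted corporate network; the adjacency weights, features, and projection matrix are assumptions.

```python
# Minimal sketch: one weighted graph-convolution step H' = ReLU(D^-1 A H W),
# the generic aggregation underlying weighted GNNs. Toy corporate network only.
import numpy as np

rng = np.random.default_rng(2)

# Weighted adjacency: entry (i, j) is the strength of the relation between firms i and j.
A = np.array([
    [1.0, 0.5, 0.0, 0.2],
    [0.5, 1.0, 0.3, 0.0],
    [0.0, 0.3, 1.0, 0.7],
    [0.2, 0.0, 0.7, 1.0],
])
H = rng.normal(size=(4, 3))        # firm features (e.g., financial indicators)
W = rng.normal(size=(3, 2))        # projection weights (learnable in a real GNN)

D_inv = np.diag(1.0 / A.sum(axis=1))
H_next = np.maximum(D_inv @ A @ H @ W, 0.0)   # normalized aggregation + ReLU
print(H_next)
```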
Citations: 1
A geometric framework for outlier detection in high‐dimensional data
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-07-01 | DOI: 10.1002/widm.1491
M. Herrmann, Florian Pfisterer, F. Scheipl
Outlier or anomaly detection is an important task in data analysis. We discuss the problem from a geometrical perspective and provide a framework which exploits the metric structure of a data set. Our approach rests on the manifold assumption, that is, that the observed, nominally high‐dimensional data lie on a much lower dimensional manifold and that this intrinsic structure can be inferred with manifold learning methods. We show that exploiting this structure significantly improves the detection of outlying observations in high dimensional data. We also suggest a novel, mathematically precise and widely applicable distinction between distributional and structural outliers based on the geometry and topology of the data manifold that clarifies conceptual ambiguities prevalent throughout the literature. Our experiments focus on functional data as one class of structured high‐dimensional data, but the framework we propose is completely general and we include image and graph data applications. Our results show that the outlier structure of high‐dimensional and non‐tabular data can be detected and visualized using manifold learning methods and quantified using standard outlier scoring methods applied to the manifold embedding vectors.
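A minimal sketch of the two-step recipe described above: embed the data with a manifold learning method, then apply a standard outlier score to the embedding vectors. It uses scikit-learn's Isomap and LocalOutlierFactor as generic stand-ins, and the synthetic data and parameters are assumptions, not the authors' exact pipeline.

```python
# Minimal sketch: manifold embedding followed by outlier scoring, using Isomap
# and LOF as generic stand-ins. Synthetic data; not the authors' exact setup.
import numpy as np
from sklearn.manifold import Isomap
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(3)

# Nominally 20-dimensional data that actually lives near a 2D structure,
# plus a few points placed far off that manifold.
latent = rng.normal(size=(300, 2))
lift = rng.normal(size=(2, 20))
X = latent @ lift + 0.01 * rng.normal(size=(300, 20))
X[:5] += 5.0                                    # injected structural outliers

embedding = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(embedding)
scores = -lof.negative_outlier_factor_          # larger = more outlying
print(np.argsort(scores)[-5:])                  # indices of the top candidates
```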
Citations: 0
Epidemiological challenges in pandemic coronavirus disease (COVID-19): Role of artificial intelligence.
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-07-01 | DOI: 10.1002/widm.1462
Abhijit Dasgupta, Abhisek Bakshi, Srijani Mukherjee, Kuntal Das, Soumyajeet Talukdar, Pratyayee Chatterjee, Sagnik Mondal, Puspita Das, Subhrojit Ghosh, Archisman Som, Pritha Roy, Rima Kundu, Akash Sarkar, Arnab Biswas, Karnelia Paul, Sujit Basak, Krishnendu Manna, Chinmay Saha, Satinath Mukhopadhyay, Nitai P Bhattacharyya, Rajat K De

The world is now experiencing a major health calamity due to the coronavirus disease (COVID-19) pandemic, caused by the severe acute respiratory syndrome coronavirus clade 2. The foremost challenge facing the scientific community is to explore the growth and transmission capability of the virus. The use of artificial intelligence (AI), such as deep learning, in (i) rapid disease detection from x-ray, computed tomography (CT) or high-resolution CT (HRCT) images, (ii) accurate prediction of epidemic patterns and their saturation throughout the globe, (iii) forecasting the disease and its psychological impact on the population from social networking data, and (iv) prediction of drug-protein interactions for repurposing drugs, has attracted much attention. In the present study, we describe the role of various AI-based technologies for rapid and efficient detection from CT images, complementing quantitative real-time polymerase chain reaction and immunodiagnostic assays. AI-based technologies to anticipate the current pandemic pattern, prevent the spread of disease, and detect face masks are also discussed. We inspect how the virus transmits depending on different factors. We investigate deep learning techniques to assess the affinity of the most probable drugs to treat COVID-19. This article is categorized under: Application Areas > Health Care; Algorithmic Development > Biological Data Mining; Technologies > Machine Learning.

Citations: 5
Privacy protection in smart meters using homomorphic encryption: An overview
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-06-23 | DOI: 10.1002/widm.1469
Zita Abreu, Lucas Pereira
This article presents an overview of the literature on privacy protection in smart meters, with a particular focus on homomorphic encryption (HE). First, we introduce the concept of smart meters, the context in which they are deployed, and the main concerns and objections inherent to their use. We then give an overview of privacy protection, emphasizing the need to safeguard the privacy of smart‐meter users by identifying, describing, and comparing the main approaches that seek to address this problem. Two HE‐based privacy protection approaches are then presented in more detail, together with two possible application scenarios. Finally, the article concludes with a brief overview of the unsolved challenges in HE and the most promising future research directions.
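To make the additively homomorphic idea concrete, here is a minimal aggregation sketch assuming the third-party python-paillier (phe) package is available: meters encrypt their readings, an untrusted aggregator sums the ciphertexts without decrypting anything, and only the key holder decrypts the total. The key size, variable names, and trust model shown are illustrative assumptions, not a scheme from the article.

```python
# Minimal sketch: additively homomorphic aggregation of smart-meter readings,
# assuming the third-party python-paillier (phe) package is installed.
from phe import paillier

# In a real deployment the key pair would be held by the utility / key authority.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

readings = [12, 7, 30, 5]                        # toy per-household readings (kWh)
ciphertexts = [public_key.encrypt(r) for r in readings]

# The aggregator adds ciphertexts without ever seeing individual readings.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = encrypted_total + c

print(private_key.decrypt(encrypted_total))      # 54, the neighborhood total
```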
Citations: 4
Data mining in predictive maintenance systems: A taxonomy and systematic review
IF 7.8 | CAS Tier 2 (Computer Science) | Q1 Computer Science | Pub Date: 2022-06-22 | DOI: 10.1002/widm.1471
Aurora Esteban, A. Zafra, Sebastián Ventura
Predictive maintenance is a field of study whose main objective is to optimize the timing and type of maintenance performed on various industrial systems. This aim involves maximizing the availability time of the monitored system and minimizing the number of resources used in maintenance. Predictive maintenance is currently undergoing a revolution thanks to advances in industrial systems monitoring within the Industry 4.0 paradigm. Likewise, advances in artificial intelligence and data mining allow the processing of a great amount of data to provide more accurate and advanced predictive models. In this context, many actors have become interested in predictive maintenance, which has become one of the most active research areas in computing and a point where academia and industry converge. The objective of this paper is to conduct a systematic literature review that provides an overview of the current state of research on predictive maintenance from a data mining perspective. The review presents a first taxonomy that covers the different phases of any data mining process applied to a predictive maintenance problem, relating the predictive maintenance tasks to the main data mining tasks used to solve them. Finally, the paper presents significant challenges and future research directions regarding the potential of data mining applied to predictive maintenance.
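As a small, generic illustration of one data mining task the review categorizes (failure prediction framed as classification), the sketch below fits a random forest on synthetic sensor summaries; the features, labeling rule, and horizon are invented for illustration and do not come from the paper.

```python
# Minimal sketch: predictive maintenance framed as "will this machine fail within
# the next horizon?" classification. Synthetic data and labels for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
n = 1000

vibration = rng.gamma(shape=2.0, scale=1.0, size=n)      # toy sensor summaries
temperature = rng.normal(loc=60, scale=5, size=n)
hours_since_service = rng.uniform(0, 5000, size=n)

X = np.column_stack([vibration, temperature, hours_since_service])
# Toy labeling rule: high vibration and long time since service raise failure risk.
risk = 0.4 * vibration + 0.001 * hours_since_service + rng.normal(scale=0.5, size=n)
y = (risk > np.quantile(risk, 0.8)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```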
Citations: 14