
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

Multimodal sentimental analysis for social media applications: A comprehensive review
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-05-31 DOI: 10.1002/widm.1415
Ganesh Chandrasekaran, Tu N. Nguyen, Jude Hemanth D.
Sentiment analysis is essential for identifying and classifying opinions about a source material, that is, a product or service. It has a wide range of applications, including product reviews, opinion polls, movie reviews on YouTube, news video analysis, and healthcare applications such as stress and depression analysis. The traditional, text‐based approach to sentiment analysis involves collecting large amounts of textual data and applying various algorithms to extract sentiment information from it. Multimodal sentiment analysis, by contrast, performs opinion analysis on a combination of video, audio, and text, which goes well beyond conventional text‐based sentiment analysis in understanding human behavior. The remarkable growth in social media use provides a large collection of multimodal data that reflects users' sentiment on particular topics. The multimodal approach helps classify the polarity (positive, negative, or neutral) of individual sentiments. Our work surveys recent developments in analyzing multimodal sentiment (involving text, audio, and video/image) in human–machine interaction, together with the challenges involved. It presents a detailed survey of sentiment datasets, feature extraction algorithms, data fusion methods, and the efficiency of different classification techniques.
Citations: 46
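One common family of data fusion methods is decision-level (late) fusion. As a minimal sketch only, assuming each modality model (text, audio, video) already outputs probabilities over the three polarities and assuming hand-picked modality weights (the function `late_fusion` and the weights are hypothetical, not taken from the review), fusion can be a weighted average followed by an argmax:

```python
from typing import Dict, List, Tuple

# Polarity classes named in the abstract; the index order is an assumption of this sketch.
POLARITIES = ["negative", "neutral", "positive"]


def late_fusion(
    scores: Dict[str, List[float]], weights: Dict[str, float]
) -> Tuple[str, List[float]]:
    """Weighted average of per-modality class probabilities, then argmax.

    `scores` maps a modality name to its probabilities over POLARITIES;
    `weights` maps a modality name to its (hypothetical) fusion weight.
    """
    fused = [0.0] * len(POLARITIES)
    total_weight = 0.0
    for modality, probs in scores.items():
        if len(probs) != len(POLARITIES):
            raise ValueError(f"{modality}: expected {len(POLARITIES)} probabilities")
        w = weights.get(modality, 0.0)  # modalities without a weight contribute nothing
        total_weight += w
        for i, p in enumerate(probs):
            fused[i] += w * p
    if total_weight <= 0:
        raise ValueError("at least one modality needs a positive weight")
    fused = [v / total_weight for v in fused]  # renormalize so the fused scores sum to 1
    best = max(range(len(fused)), key=lambda i: fused[i])
    return POLARITIES[best], fused


# Illustrative numbers only, not results from any dataset.
label, fused = late_fusion(
    scores={
        "text": [0.1, 0.2, 0.7],
        "audio": [0.3, 0.4, 0.3],
        "video": [0.2, 0.5, 0.3],
    },
    weights={"text": 0.5, "audio": 0.25, "video": 0.25},
)
print(label, [round(v, 3) for v in fused])
```

Feature-level (early) fusion would instead concatenate modality features before a single classifier. Late fusion keeps the per-modality models independent, so a missing modality can be handled by simply leaving it out of `scores`.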
Big data analytics in single‐cell transcriptomics: Five grand opportunities
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-05-11 DOI: 10.1002/widm.1414
Namrata Bhattacharya, C. Nelson, Gaurav Ahuja, Debarka Sengupta
Single‐cell omics technologies give biologists a new dimension for systematically dissecting the underlying complexity of biological systems. These powerful technologies have triggered the rapid development and deployment of new computational tools that tease out critical insights by analyzing large volumes of omics data at single‐cell resolution. Key advances include identifying the molecular signatures that impart cellular identities and their evolutionary relationships, identifying novel and rare cell types, and establishing a direct link between cellular genotypes and phenotypes. With the sharp increase in the throughput of single‐cell platforms, the demand for efficient computational algorithms has become pressing. Devising novel computational strategies is therefore critical to making optimal use of this wealth of molecular data and gaining new insights into cellular biology. Here we discuss five grand opportunities for computational breakthroughs that would accelerate single‐cell research: predicting cellular identity, single‐cell guided in silico drug screening for precision medicine, transfer learning methods to handle the sparsity and heterogeneity of expression data, establishing genotype–phenotype relationships at single‐cell resolution, and developing computational platforms for handling big data.
Citations: 4
Incorporating domain ontology information into clustering in heterogeneous networks
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-05-10 DOI: 10.1002/widm.1413
Yue Huang
Clustering structure‐rich heterogeneous information networks, which are composed of multiple types of objects and relationships, has become a challenge in data mining. Most existing methods for clustering heterogeneous networks focus on information internal to the dataset and ignore domain knowledge outside it. In real‐world scenarios, however, domain knowledge can often offer valuable information for clustering. In this study, we propose OntoHeteClus, a three‐layer model that clusters multitype objects in star‐structured heterogeneous networks by considering both the dataset itself and background information quantified via an ontology. OntoHeteClus first evaluates the similarity between central objects using formalized domain ontology information and then clusters the central objects on that basis. Finally, attribute objects are clustered according to the central‐object clustering result. A numerical example illustrates the modeling concept and working principle of the proposed method, and experiments on a real‐world dataset demonstrate the effectiveness of the proposed algorithms.
Citations: 0
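The three steps of OntoHeteClus (ontology-based similarity between central objects, clustering of the central objects, then clustering of attribute objects from that result) can be illustrated with a toy sketch. This is not the authors' model: the tree-shaped ontology, the Wu–Palmer concept similarity, the threshold-based single-link clustering, and the majority-vote assignment of attribute objects are all stand-ins chosen for brevity, and every name below is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical toy ontology as a tree: concept -> parent (the root maps to None).
PARENT: Dict[str, Optional[str]] = {
    "science": None,
    "cs": "science",
    "biology": "science",
    "ml": "cs",
    "databases": "cs",
    "genomics": "biology",
}


def ancestors(concept: str) -> List[str]:
    """Path from a concept up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = concept
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def wu_palmer(a: str, b: str) -> float:
    """Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b)), root depth = 1."""
    path_a, path_b = ancestors(a), ancestors(b)
    in_b = set(path_b)
    lcs = next(c for c in path_a if c in in_b)  # lowest common subsumer
    return 2 * len(ancestors(lcs)) / (len(path_a) + len(path_b))


def object_similarity(ca: Set[str], cb: Set[str]) -> float:
    """Symmetric average best-match similarity between two concept annotations."""
    def best(src: Set[str], dst: Set[str]) -> float:
        return sum(max(wu_palmer(x, y) for y in dst) for x in src) / len(src)
    return (best(ca, cb) + best(cb, ca)) / 2


def cluster_central(objects: Dict[str, Set[str]], threshold: float) -> Dict[str, int]:
    """Single-link clustering: join central objects whose similarity >= threshold."""
    ids = list(objects)
    parent = {o: o for o in ids}  # union-find forest

    def find(o: str) -> str:
        while parent[o] != o:
            parent[o] = parent[parent[o]]  # path halving
            o = parent[o]
        return o

    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if object_similarity(objects[a], objects[b]) >= threshold:
                parent[find(b)] = find(a)
    roots: Dict[str, int] = {}  # cluster ids numbered in order of first appearance
    return {o: roots.setdefault(find(o), len(roots)) for o in ids}


def cluster_attributes(links: Dict[str, List[str]], central: Dict[str, int]) -> Dict[str, int]:
    """Put each attribute object in the cluster most of its linked central objects belong to."""
    return {
        attr: Counter(central[c] for c in linked).most_common(1)[0][0]
        for attr, linked in links.items()
    }


# Toy star network: papers are central objects, authors are attribute objects.
papers = {"p1": {"ml"}, "p2": {"databases"}, "p3": {"genomics"}}
paper_clusters = cluster_central(papers, threshold=0.6)
author_clusters = cluster_attributes({"alice": ["p1", "p2"], "bob": ["p3"]}, paper_clusters)
print(paper_clusters, author_clusters)
```

With this toy ontology, "ml" and "databases" share the subsumer "cs" (similarity 4/6), while either one and "genomics" meet only at the root (2/6), so the 0.6 threshold groups p1 with p2 and leaves p3 alone.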
Automatic segmentation to characterize anthropometric parameters and cardiovascular indicators in children
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-05-03 DOI: 10.1002/widm.1411
D. G. Goroso, Alvaro Fraga, Michel Macedo, Carla Fernanda de Miranda Rodrigues, Bruno Mendes de Oliveira Silva, W. Watanabe, D. P. D. Silva, R. R. Silva, J. Puglisi, James Marcin, M. Dharmar
A new predictive model for classifying childhood obesity was implemented using machine learning techniques. The first step was to determine the most relevant anthropometric and cardiovascular parameters of 187 children through principal component analysis (PCA) and cluster classification. A Naïve‐Bayes classifier then assigned these children to six groups using anthropometric Z‐scores, measurements of abdominal obesity, and arterial pressure: Group I (20.32% of the total), composed mainly of children with severe malnutrition or malnutrition; Group II (36.36%), composed primarily of eutrophic children; Group III (21.4%), eutrophic plus overweight children; Group IV (14.97%), mainly overweight and obese children; Group V (5.34%), obese and overweight children; and Group VI (1.6%), obese at‐risk children. From Group II to Group VI, the proportion of pre‐hypertensive and hypertensive children increased monotonically from 5% to 33%. The classification model was tested on 66 children who were not in the original sample, with a success rate of 97%. This predictive model will facilitate future longitudinal studies of childhood obesity and help plan interventions and evaluate their results.
Citations: 0
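The classification step in this entry uses naïve Bayes. A minimal Gaussian naïve Bayes classifier can be sketched in plain Python as below. This assumes continuous features that have already been selected (the PCA and clustering steps are omitted), and the two features, the group names, and all numbers are synthetic placeholders, not the study's data; the abstract does not say which naïve Bayes variant was used.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: one normal distribution per (class, feature)."""

    def __init__(self, var_smoothing: float = 1e-6) -> None:
        self.var_smoothing = var_smoothing  # added to every variance to avoid division by zero
        self.params: Dict[str, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[str]) -> "GaussianNaiveBayes":
        rows_by_class: Dict[str, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        for label, rows in rows_by_class.items():
            columns = list(zip(*rows))
            means = [sum(col) / len(col) for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / len(col) + self.var_smoothing
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(len(rows) / len(y))
            self.params[label] = (log_prior, means, variances)
        return self

    def _log_joint(self, row: Sequence[float], label: str) -> float:
        """log P(class) + sum of per-feature Gaussian log-likelihoods (independence assumption)."""
        log_prior, means, variances = self.params[label]
        total = log_prior
        for x, m, v in zip(row, means, variances):
            total += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return total

    def predict(self, X: Sequence[Sequence[float]]) -> List[str]:
        return [max(self.params, key=lambda label: self._log_joint(row, label)) for row in X]


# Synthetic placeholder features: [BMI-for-age Z-score, waist-to-height ratio].
X_train = [
    [0.0, 0.42], [0.3, 0.44], [-0.2, 0.41],  # "eutrophic"
    [1.3, 0.50], [1.6, 0.52], [1.4, 0.49],   # "overweight"
    [2.5, 0.58], [2.9, 0.61], [2.7, 0.60],   # "obese"
]
y_train = ["eutrophic"] * 3 + ["overweight"] * 3 + ["obese"] * 3

model = GaussianNaiveBayes().fit(X_train, y_train)
predictions = model.predict([[0.1, 0.43], [2.8, 0.59], [1.5, 0.51]])
print(predictions)
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied. For real use, scikit-learn's `sklearn.naive_bayes.GaussianNB` implements the same model; its `var_smoothing` is a fraction of the largest feature variance rather than the fixed constant used here.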
The development of regional smart energy systems in the World and China: The concepts, practices, and a new perspective
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-04-26 DOI: 10.1002/widm.1409
Yunlong Zhao, Linwei Ma, Zheng Li, W. Ni
To realize a low‐carbon, sustainable energy transition, smart energy systems (SES) assisted by data and information technology are regarded as promising solutions for energy system integration (ESI) and have been put into regional practice. However, the development of multiregional smart energy systems (MRSES), which span three or more areas, has received little attention. This article analyzes the concepts and practices of SES and offers a new perspective on MRSES. First, the conceptual evolution and regional practices of SES worldwide are reviewed; we find that SES does not mark the end of the conceptual evolution of ESI, and that current regional practices are still limited to small areas, typically remote, urban, and industrial areas. Second, a review of SES concepts and practices in China indicates that understanding of SES concepts is still unclear at the national scale, and that China's marked regional disparities call for attention to the development of MRSES. Finally, a preliminary concept of MRSES is proposed and its prospects in China and the world are discussed. It is composed of four connected sub‐SES and framed as the coordinated development of “smart energy farms + smart energy towns + smart energy industrial parks + smart energy transportation networks.” The first three sub‐SES are identified according to the economic characteristics and resource endowments of different regions, and all of them are connected by the fourth. Although this concept is still preliminary, it offers a vision of future large‐scale SES, and its realization will require further breakthroughs in data technology.
Citations: 10
A 2021 update on cancer image analytics with deep learning
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-04-22 DOI: 10.1002/widm.1410
N. Kurian, A. Sethi, Anil Reddy Konduru, A. Mahajan, S. Rane
Deep learning (DL)‐based interpretation of medical images has reached a critical juncture: it is expanding beyond research projects into translational ones and is ready to make its way into the clinic. Advances over the last decade in data availability, DL techniques, and computing capabilities have accelerated this journey. Along the way, we have gained a better understanding of the challenges and pitfalls of wider adoption of DL in clinical care, which, in our view, should and will drive advances in this field over the next few years. The most important of these challenges are the lack of an appropriately digitized environment within healthcare institutions, the lack of adequate open and representative datasets on which DL algorithms can be trained and tested, and the lack of robustness of widely used DL training algorithms to certain pervasive pathological characteristics of medical images and repositories. In this review, we provide an overview of the role of imaging in oncology, the techniques that are shaping how DL algorithms are being made ready for clinical use, and the problems that DL techniques still need to address before DL can find a home in the clinic. Finally, we summarize how DL can potentially drive the adoption of digital pathology, vendor‐neutral archives, and picture archiving and communication systems. We caution that researchers in each field may find the coverage of their own field to be high‐level. This is by design: the format is meant to help readers coming from outside deep learning or medical research appreciate the main concerns and limitations of these two fields, rather than to tell them something new about their own.
Citations: 8
Table understanding approaches for extracting knowledge from heterogeneous tables
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-03-28 DOI: 10.1002/widm.1407
Sara Bonfitto, E. Casiraghi, M. Mesiti
Table understanding methods extract, transform, and interpret the information contained in tabular data embedded in documents and files of different formats. Such automatic understanding makes it possible to exploit tabular information to answer queries accurately, to integrate heterogeneous information repositories into a common knowledge base, or to exchange information among different sources. This survey provides a comprehensive analysis of the research devoted so far to table understanding and describes systems that support transforming heterogeneous tables into meaningful information.
Citations: 16
Data mining for energy systems: Review and prospect
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-03-24 DOI: 10.1002/widm.1406
Wenxuan Liu, Junhua Zhao, Dianhui Wang
Next‐generation energy systems, characterized by a deep integration of cyber, physical, and social components, urgently need in‐depth study of big data mining. This paper presents an initial discussion of big data mining and its applications in intelligent energy systems. It first introduces new progress in big data mining, such as deep learning, transfer learning, randomized learning, granular computing, and multisource data fusion. It then discusses applications of data mining in energy systems, such as load forecasting and modeling, integrated power and transportation systems, and electricity market forecasting and simulation. Finally, it discusses research problems in energy system data mining that require further attention, such as cyber–physical–social system modeling and super‐resolution perception of smart meter data.
Citations: 5
Foundational ontologies, ontology‐driven conceptual modeling, and their multiple benefits to data mining
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-03-24 DOI: 10.1002/widm.1408
G. Amaral, F. Baião, G. Guizzardi
For many years, the role played by domain knowledge in all stages of knowledge discovery has been recognized. However, the real‐world semantics embedded in data is often still not fully considered in traditional data mining methods. In this article, we argue that the quality of data mining results is directly related to the extent to which they reflect important properties of the real‐world entities represented therein. Analyzing and characterizing the nature of these entities is the very business of the area of formal ontology. We briefly elaborate on two particular types of artifacts produced by this area: foundational ontologies and the ontology‐driven conceptual modeling languages grounded on them. We then elaborate on the benefits they can bring to several activities in a data mining process.
Citations: 7
Validation of cluster analysis results on validation data: A systematic framework
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-03-01 DOI: 10.1002/widm.1444
Theresa Ullmann, C. Hennig, A. Boulesteix
Cluster analysis refers to a wide range of data analytic techniques for class discovery and is popular in many application fields. To assess the quality of a clustering result, different cluster validation procedures have been proposed in the literature. While there is extensive work on classical validation techniques, such as internal and external validation, less attention has been given to validating and replicating a clustering result using a validation dataset. Such a dataset may be part of the original dataset, which is separated before analysis begins, or it could be an independently collected dataset. We present a systematic, structured review of the existing literature about this topic. For this purpose, we outline a formal framework that covers most existing approaches for validating clustering results on validation data. In particular, we review classical validation techniques such as internal and external validation, stability analysis, and visual validation, and show how they can be interpreted in terms of our framework. We define and formalize different types of validation of clustering results on a validation dataset, and give examples of how clustering studies from the applied literature that used a validation dataset can be seen as instances of our framework.
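As a rough illustration of what "validating a clustering result on a validation dataset" can mean in practice, the sketch below implements one simple instance: split the data before analysis, cluster the discovery part, transfer that clustering to the validation part by nearest-centroid assignment, independently re-cluster the validation part, and measure the agreement between the two validation labelings with the adjusted Rand index (ARI). This is not the authors' framework. The choice of k-means, nearest-centroid transfer, and ARI are assumptions made for the example, the data are synthetic, and the helper names (`kmeans`, `nearest`, `adjusted_rand_index`, `holdout_agreement`, `make_blobs`) are hypothetical.

```python
import math
import random
from collections import Counter


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random data points as initial centroids
    for _ in range(n_iter):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Mean of the members; keep the old centroid if a cluster is empty.
            updated.append(tuple(sum(d) / len(members) for d in zip(*members))
                           if members else centroids[j])
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same items."""
    n_pairs = math.comb(len(a), 2)
    same_both = sum(math.comb(c, 2) for c in Counter(zip(a, b)).values())
    same_a = sum(math.comb(c, 2) for c in Counter(a).values())
    same_b = sum(math.comb(c, 2) for c in Counter(b).values())
    expected = same_a * same_b / n_pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # degenerate case, e.g. all singleton clusters
        return 1.0
    return (same_both - expected) / (max_index - expected)


def make_blobs(n_per_blob, centers, sd, seed):
    """Synthetic 2-D Gaussian blobs (illustrative data only)."""
    rng = random.Random(seed)
    pts = [(rng.gauss(cx, sd), rng.gauss(cy, sd))
           for cx, cy in centers for _ in range(n_per_blob)]
    rng.shuffle(pts)
    return pts


def holdout_agreement(discovery, validation, k, seed=0):
    """Compare transferred vs. independently re-fitted labels on validation data."""
    disc_centroids, _ = kmeans(discovery, k, seed=seed)
    # (a) Transfer the discovery clustering: assign each validation point
    #     to its nearest discovery centroid.
    transferred = [nearest(p, disc_centroids) for p in validation]
    # (b) Cluster the validation data on its own with the same method.
    _, refitted = kmeans(validation, k, seed=seed)
    # High agreement suggests the cluster structure replicates on new data.
    return adjusted_rand_index(transferred, refitted)


data = make_blobs(50, [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)], sd=0.5, seed=1)
discovery, validation = data[:75], data[75:]  # split before any analysis
score = holdout_agreement(discovery, validation, k=3)
print(f"ARI (transferred vs. re-fitted): {score:.3f}")
```

ARI is invariant to relabeling, so the cluster indices produced by the two k-means runs do not need to match. Because k-means depends on initialization, a low score can also reflect a poor local optimum rather than a non-replicating structure; a real study would typically use several restarts, and other transfer rules or agreement measures could be swapped in.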
Citations: 28
Journal: Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery