
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

Predicting the ratings of Amazon products using Big Data
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-12-12 | DOI: 10.1002/widm.1400
Jongwook Woo, Monika Mishra
This paper aims to apply several machine learning (ML) models to a massive e‐commerce dataset from Amazon in order to analyze and predict ratings and to recommend products. For this purpose, we have used both traditional and Big Data algorithms. Because the Amazon product review dataset is large, we present a Big Data architecture suited to storing and computing on such a massive dataset, which is not possible with a traditional architecture. The dataset contains 15 attributes and about 7 million records. With this dataset, we develop several models in Oracle Big Data and Azure Cloud Computing services to predict the review rating and the recommendation for items at Amazon. We present a comparative conclusion, in terms of both accuracy and efficiency, between Spark ML (the Big Data architecture) and Azure ML (the traditional architecture).
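The abstract does not specify which models were trained. As a hedged illustration of rating prediction only, the sketch below fits a classic baseline predictor (global mean plus user bias plus item bias) in plain Python. It is not the authors' Spark ML or Azure ML pipeline; the `(user_id, item_id, rating)` tuple schema, the `reg` shrinkage parameter, and the function names are hypothetical. At the scale described (about 7 million records), a related collaborative-filtering approach in Spark ML would be its ALS estimator (`pyspark.ml.recommendation.ALS`), run as a distributed job.

```python
from collections import defaultdict


def fit_bias_model(reviews, reg=5.0):
    """Fit a baseline rating model r_hat = mu + b_user + b_item.

    reviews: iterable of (user_id, item_id, rating) tuples (hypothetical schema).
    reg: shrinkage that pulls biases of rarely rated users/items toward zero.
    """
    reviews = list(reviews)
    mu = sum(r for _, _, r in reviews) / len(reviews)  # global mean rating

    # Item bias: regularized average deviation of the item's ratings from mu.
    item_sum, item_cnt = defaultdict(float), defaultdict(int)
    for _, item, r in reviews:
        item_sum[item] += r - mu
        item_cnt[item] += 1
    b_item = {i: item_sum[i] / (item_cnt[i] + reg) for i in item_sum}

    # User bias: regularized average of what remains after mu and the item bias.
    user_sum, user_cnt = defaultdict(float), defaultdict(int)
    for user, item, r in reviews:
        user_sum[user] += r - mu - b_item[item]
        user_cnt[user] += 1
    b_user = {u: user_sum[u] / (user_cnt[u] + reg) for u in user_sum}
    return mu, b_user, b_item


def predict(model, user, item, lo=1.0, hi=5.0):
    """Predict a star rating; unseen users or items get a zero bias."""
    mu, b_user, b_item = model
    est = mu + b_user.get(user, 0.0) + b_item.get(item, 0.0)
    return min(hi, max(lo, est))  # clip to the 1-5 star scale
```

The bias model is deliberately simple: it is cheap to compute in one pass per entity, which is why it is a common first baseline before heavier models are compared for accuracy and efficiency.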
Citations: 7
Predictive analysis of real‐time strategy games: A graph mining approach
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-12-08 | DOI: 10.1002/widm.1398
Isam A. Alobaidi, J. Leopold, Ali Allami, Nathan Eloe, Dustin Tanksley
Machine learning and computational intelligence have facilitated the development of recommendation systems for a broad range of domains. Such recommendations are based on contextual information that is explicitly provided or pervasively collected. Recommendation systems often improve decision‐making or increase the efficacy of a task. Real‐time strategy (RTS) video games are not only a popular entertainment medium; they are also an abstraction of many real‐world applications in which the aim is to increase your own resources and decrease those of your opponent. Using predictive analytics, which examines past examples of success and failure, we can learn to predict positive outcomes for such scenarios. The goal of our research is to develop an accurate predictive recommendation system for multiplayer strategic games that recommends moves a player should, and should not, make, thereby providing a competitive advantage. Herein we compare two techniques, frequent subgraph mining and discriminative subgraph mining, in terms of the error rates of their predictions in this context. As a proof of concept, we present the results of an experiment that applies our strategies to two particular RTS games.
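The abstract contrasts frequent and discriminative subgraph mining without giving details. The toy sketch below is our own simplification, not the authors' method: it restricts patterns to single labeled edges (for example a hypothetical `("worker", "barracks")` transition) so the difference is easy to see. Frequent mining keeps patterns that are common overall, while discriminative mining keeps patterns whose support differs between won and lost games. Real subgraph miners such as gSpan enumerate larger connected subgraphs.

```python
from collections import Counter


def edge_support(games):
    """Fraction of games that contain each edge pattern (counted once per game)."""
    counts = Counter()
    for edges in games:
        counts.update(set(edges))  # set(): a game contributes at most 1 per pattern
    return {e: c / len(games) for e, c in counts.items()}


def frequent_patterns(games, min_support):
    """Frequent mining: patterns present in at least min_support of all games."""
    return {e for e, s in edge_support(games).items() if s >= min_support}


def discriminative_patterns(wins, losses, min_gap):
    """Discriminative mining: patterns whose support differs between outcomes.

    Returns {pattern: support_in_wins - support_in_losses}. Positive scores point
    to moves worth recommending; negative scores to moves worth avoiding.
    """
    sw, sl = edge_support(wins), edge_support(losses)
    scores = {}
    for e in set(sw) | set(sl):
        gap = sw.get(e, 0.0) - sl.get(e, 0.0)
        if abs(gap) >= min_gap:
            scores[e] = gap
    return scores


# Hypothetical games: each is a set of (from_label, to_label) edges.
wins = [{("worker", "barracks"), ("barracks", "rush")},
        {("worker", "barracks"), ("worker", "expand")}]
losses = [{("worker", "barracks"), ("barracks", "rush")},
          {("worker", "barracks")}]

print(frequent_patterns(wins + losses, 0.75))       # {('worker', 'barracks')}
print(discriminative_patterns(wins, losses, 0.25))  # {('worker', 'expand'): 0.5}
```

The example shows why the two techniques can disagree: the most frequent pattern appears in every game regardless of outcome, so it carries no signal for recommending moves, whereas the discriminative pattern is rarer but separates wins from losses.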
Citations: 1
Smartphones for public transport planning and recommendation in developing countries—A review
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-11-13 | DOI: 10.1002/widm.1397
Rohit Verma, Sandip Chakraborty
In this era of pervasive connected systems, transport units have become a significant source of data, collected from commuters, vehicles, drivers, or any other part of the transport system. These data, which have both spatial and temporal aspects, are used for a plethora of services such as travel assistant systems, multi‐modal transport solutions, real‐time travel information, smart parking, and autonomous vehicles, to name a few. With the current emphasis on sustainable transport, public transport systems have become more popular owing to their economic and environmental savings. In this review article, we highlight works that have tried to use such techniques to improve multiple parts of the public transport system, primarily in developing economies, with the aim of improving the overall commute experience in various countries.
Citations: 0
Prediction model for recurrence of hepatocellular carcinoma after resection by using neighbor2vec based algorithms
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-11-12 | DOI: 10.1002/widm.1390
Yuankui Cao, Junqing Fan, Hong-xin Cao, Yunliang Chen, Jie Li, Jianxin Li, Shenmin Zhang
Liver cancer has become the third leading cause of cancer death. Hepatocellular carcinoma (HCC) is a highly malignant type of liver cancer, and its recurrence rate after surgery remains very high because no reliable clinical data are available to better advise patients after surgery. To address this challenging issue, in this work we design a novel prediction model for HCC recurrence using a neighbor2vec based algorithm. It consists of three stages: (a) in the preparation stage, the Pearson correlation coefficient was used to explore independent predictors of HCC recurrence; (b) because individual dimensions correlate weakly with the prediction target, the K nearest neighbors (KNN) of each patient were found and used as a list of K vectors for that patient (neighbor2vec); (c) all vector lists were used as input to machine learning methods such as logistic regression, KNN, decision tree, naive Bayes (NB), and deep neural network to build the neighbor2vec based prediction model. In experiments on real data from Shandong Provincial Hospital in China, the proposed neighbor2vec based prediction model outperforms all the other models. In particular, the NB model with neighbor2vec achieves up to 83.02%, 82.86%, and 77.6% in accuracy, recall, and precision, respectively.
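Stages (a) and (b) can be sketched in plain Python. The abstract does not say what each neighbor vector contains, so the encoding here is an assumption: a patient's neighbor2vec representation is taken to be the recurrence labels (0/1) of its K nearest training patients, ordered from nearest to farthest, by Euclidean distance. `select_features`, the correlation threshold, and `neighbor2vec` as written are hypothetical illustrations, not the authors' code.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_features(X, y, threshold):
    """Stage (a): keep column indices whose |r| with the label reaches threshold."""
    return [j for j in range(len(X[0]))
            if abs(pearson([row[j] for row in X], y)) >= threshold]


def neighbor2vec(query, X, y, k, exclude=None):
    """Stage (b), assumed encoding: labels of the k nearest training patients.

    exclude: index of the query patient in X, so that a training patient is
    not counted as its own neighbor.
    """
    dists = sorted((math.dist(query, row), idx)
                   for idx, row in enumerate(X) if idx != exclude)
    return [y[idx] for _, idx in dists[:k]]
```

In practice the distances would be computed only over the columns returned by `select_features`, and stage (c) would train one of the listed classifiers (for example naive Bayes) on the resulting vectors.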
Citations: 0
Scholarly data mining: A systematic review of its applications
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-11-10 | DOI: 10.1002/widm.1395
Amna Dridi, M. Gaber, R. Azad, Jagdev Bhogal
During the last few decades, the widespread growth of scholarly networks and digital libraries has resulted in an explosion of publicly available scholarly data in various forms such as authors, papers, citations, conferences, and journals. This has created interest in big scholarly data analysis, which analyzes the worldwide dissemination of scientific findings from different perspectives. Although the study of big scholarly data is relatively new, some studies have emerged on how scholarly data are used in different disciplines. These studies motivate investigating the scholarly data generated by academic technologies such as scholarly networks and digital libraries, in order to build scalable approaches for retrieving, recommending, and analyzing scholarly content. We have analyzed these studies following a systematic methodology, classifying them into different applications based on literature features and highlighting the machine learning techniques used for this purpose. We also discuss open challenges that remain unsolved, to foster future research in the field of scholarly data mining.
Citations: 15
Predicting land surface temperature with geographically weighed regression and deep learning
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-11-03 | DOI: 10.1002/widm.1396
Hongfei Jia, De-He Yang, Weiping Deng, Qing Wei, Wenliang Jiang
In urban remote sensing prediction of surface temperature, contamination by cloud, cloud shadow, and snow causes surface temperature inversion and vegetation‐related index calculation to fail. This paper proposes a time series prediction framework for urban surface temperature under cloud interference, which helps address the impact of data loss on surface temperature prediction. Spatial and temporal variation trends of surface temperature and vegetation index are analyzed using Landsat 7/8 remote sensing data of Beijing from 2010 to 2019. The geographically weighted regression (GWR) method is used to simulate surface temperature for the current date. A deep learning prediction network based on convolution and long short‐term memory (LSTM) networks is constructed to predict the spatial distribution of surface temperature on the next observation date. The time series analysis shows that an NDBI value below −0.2 indicates possible cloud contamination. The land surface temperature (LST) modeling results show that GWR estimates are more precise on impervious surfaces and water bodies than in vegetated areas. For LST prediction with deep learning, the predicted spatial distribution of surface temperature was relatively good. The purpose of this study is to fill in data that are missing because of cloud, snow, and other interference factors, and to apply the result to predicting the spatial and temporal distributions of LST.
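GWR fits a separate regression at each location, weighting nearby observations more heavily. The sketch below shows one local fit with a single predictor (for example a vegetation index as `x` and LST as `y`) and a fixed Gaussian kernel; the kernel, the bandwidth, the predictor, and the synthetic coordinates are assumptions, since the abstract does not state them. The `ndbi` helper uses the standard NDBI definition, (SWIR − NIR) / (SWIR + NIR); which Landsat bands the authors used is not given in the abstract.

```python
import math


def gaussian_weights(target, coords, bandwidth):
    """Kernel weights w_j = exp(-0.5 * (d_j / h)^2) around one target location."""
    return [math.exp(-0.5 * (math.dist(target, c) / bandwidth) ** 2)
            for c in coords]


def gwr_local_fit(target, coords, x, y, bandwidth):
    """Weighted least squares y ~ b0 + b1 * x, local to one target location."""
    w = gaussian_weights(target, coords, bandwidth)
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw  # weighted means
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1  # (local intercept, local slope)


def ndbi(swir, nir):
    """Normalized Difference Built-up Index from SWIR and NIR reflectance."""
    return (swir - nir) / (swir + nir)


# Synthetic pixels: the x-y relationship differs between a west and an east cluster.
coords = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 0), (10, 1), (11, 0), (11, 1)]
x = [0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4]
y = [30 - 10 * v for v in x[:4]] + [20 + 5 * v for v in x[4:]]
print(gwr_local_fit((0.5, 0.5), coords, x, y, bandwidth=1.0))   # close to (30, -10)
print(gwr_local_fit((10.5, 0.5), coords, x, y, bandwidth=1.0))  # close to (20, 5)
```

A single global regression would average the two clusters' opposite slopes; the local fits recover each cluster's own relationship, which is the property GWR exploits when surface types differ across a city.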
Citations: 12
A survey of biodiversity informatics: Concepts, practices, and challenges
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-11-02 | DOI: 10.1002/widm.1394
Luiz M. R. Gadelha, Pedro C. de Siracusa, E. Dalcin, Luís Alexandre Estevão da Silva, D. A. Augusto, Eduardo Krempser, Helen Michelle Affe, R. L. Costa, Maria Luiza Mondelli, P. Meirelles, F. Thompson, M. Chame, A. Ziviani, M. F. Siqueira
The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision‐makers in ways that they can effectively use them. The development and deployment of tools and techniques to generate these indicators require having access to trustworthy data from biological collections, field surveys and automated sensors, molecular data, and historic academic literature. The transformation of these raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute an area usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.
Citations: 24
A historical perspective of explainable Artificial Intelligence
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-10-19 | DOI: 10.1002/widm.1391
R. Confalonieri, Ludovik Çoba, Benedikt Wagner, Tarek R. Besold
Explainability in Artificial Intelligence (AI) has been revived as a topic of active research by the need to convey safety and trust to users regarding the “how” and “why” of automated decision‐making in applications such as autonomous driving, medical diagnosis, or banking and finance. While explainability in AI has recently received significant attention, the origins of this line of work go back several decades, to when AI systems were mainly developed as (knowledge‐based) expert systems. Since then, the definition, understanding, and implementation of explainability have been taken up in several lines of research, namely expert systems, machine learning, recommender systems, and neural‐symbolic learning and reasoning, mostly during different periods of AI history. In this article, we present a historical perspective of Explainable Artificial Intelligence. We discuss how explainability was mainly conceived in the past, how it is understood in the present, and how it might be understood in the future. We conclude by proposing criteria for explanations that we believe will play a crucial role in the development of human‐understandable explainable systems.
Citations: 134
Data mining privacy preserving: Research agenda
IF 7.8 | Zone 2, Computer Science | Q1 Computer Science | Pub Date: 2020-10-18 | DOI: 10.1002/widm.1392
Inda Kreso, Amra Kapo, L. Turulja
Nowadays, the amount of data and information is increasing, along with its accessibility and availability, because of the Internet and social media. Data mining is used to search these vast data sets and to discover previously unknown, useful patterns and predictions. Data mining allows unrelated data to be connected in a meaningful way, analyzed, and represented as useful patterns and predictions that help anticipate future behavior. The data mining process can potentially compromise sensitive and personal data: individual privacy is under attack if leaked information reveals the identity of a person whose personal data were used in the mining process. Many privacy‐preserving data mining (PPDM) techniques and methods aim to preserve privacy and sensitive data while still providing accurate data mining results. PPDM techniques and methods incorporate different approaches that protect data during the data mining process. The methodology used in this article is a systematic literature review combined with bibliometric analysis. This article identifies the current trends, techniques, and methods used in the privacy‐preserving data mining field in order to give a clear and concise classification of PPDM methods and techniques, possibly identifying new methods and techniques not included in previous classifications, and to highlight future research directions.
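As one concrete, widely known PPDM technique (chosen here for illustration; the abstract does not single it out), k-anonymity requires every combination of quasi-identifier values, such as age and ZIP code, to be shared by at least k records, so that no individual can be singled out by those attributes alone. The sketch below checks that property and applies a simple generalization step; the field names, records, and helper names are hypothetical.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if each quasi-identifier value combination occurs at least k times."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())


def generalize_age(record, width=10):
    """Return a copy with the exact age replaced by a range such as '30-39'."""
    lo = (record["age"] // width) * width
    out = dict(record)
    out["age"] = f"{lo}-{lo + width - 1}"
    return out


# Hypothetical patient records; "diagnosis" is the sensitive attribute.
records = [
    {"age": 34, "zip": "7100", "diagnosis": "flu"},
    {"age": 36, "zip": "7100", "diagnosis": "asthma"},
    {"age": 52, "zip": "7200", "diagnosis": "flu"},
    {"age": 57, "zip": "7200", "diagnosis": "diabetes"},
]
qi = ["age", "zip"]
print(is_k_anonymous(records, qi, k=2))                               # False
print(is_k_anonymous([generalize_age(r) for r in records], qi, k=2))  # True
```

Generalization trades precision for privacy: the mined patterns become coarser, which is the accuracy-versus-privacy tension the abstract describes. k-anonymity alone does not prevent attribute disclosure when every record in a group shares the same sensitive value; refinements such as l-diversity address that case.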
Citations: 13
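The PPDM abstract above refers to techniques that protect individual records while still yielding accurate mining results. As a minimal sketch of one classic perturbation approach, randomized response (chosen here purely for illustration; the abstract does not say which methods its classification covers), each record's sensitive yes/no value is randomly flipped before it is collected, and the population share is then estimated from the noisy reports. The function names, the 30% share, and the truth probability of 0.75 are hypothetical assumptions.

```python
import random


def randomize(true_bit: int, p_truth: float, rng: random.Random) -> int:
    """Report the true bit with probability p_truth; otherwise report its flip."""
    return true_bit if rng.random() < p_truth else 1 - true_bit


def estimate_proportion(reports: list[int], p_truth: float) -> float:
    """Unbiased estimate of the true share of 1s from randomized reports.

    The expected observed share is p*pi + (1 - p)*(1 - pi); solving for pi
    gives (observed - (1 - p)) / (2p - 1). On small samples the estimate
    may fall slightly outside [0, 1].
    """
    if p_truth == 0.5:
        raise ValueError("p_truth = 0.5 makes the reports independent of the data")
    if not reports:
        raise ValueError("need at least one report")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p_truth)) / (2 * p_truth - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical sensitive yes/no attribute held by roughly 30% of 100,000 records.
    true_bits = [1 if rng.random() < 0.30 else 0 for _ in range(100_000)]
    p = 0.75  # assumed probability of reporting the true value
    reports = [randomize(b, p, rng) for b in true_bits]
    print(f"true share:      {sum(true_bits) / len(true_bits):.3f}")
    print(f"estimated share: {estimate_proportion(reports, p):.3f}")
```

Any single report is plausibly deniable because it may have been flipped, yet the estimate converges to the true share as the number of records grows. Reporting the truth with probability p = 0.75 corresponds to ε-local differential privacy with ε = ln(p / (1 - p)) = ln 3 ≈ 1.10; raising p improves accuracy at the cost of weaker privacy.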
Privacy preserving big data analytics: A critical analysis of state‐of‐the‐art
IF 7.8, CAS Zone 2 (Computer Science), Q1 Computer Science. Pub Date: 2020-10-07. DOI: 10.1002/widm.1387
Md. Ileas Pramanik, Raymond Y. K. Lau, Md. Sakir Hossain, Md-Mizanur Rahoman, Sumon Kumar Debnath, Md. Golam Rashed, Md. Zasim Uddin
In the era of “big data,” a huge number of people, devices, and sensors are connected via digital networks, and the interactions among these entities generate enormous amounts of valuable data that help organizations innovate and grow. However, the data deluge also raises serious privacy concerns, which may cause a regulatory backlash and hinder further organizational innovation. To address the challenge of information privacy, researchers have explored privacy‐preserving methodologies over the past two decades. However, a thorough study of privacy‐preserving big data analytics is missing from the existing literature. The main contributions of this article include a systematic evaluation of various privacy preservation approaches and a critical analysis of state‐of‐the‐art privacy‐preserving big data analytics methodologies. More specifically, we propose a four‐dimensional framework for analyzing and designing the next generation of privacy‐preserving big data analytics approaches. In addition, we pinpoint the potential opportunities and challenges of applying privacy‐preserving big data analytics in business settings, and we provide five recommendations for applying it effectively in businesses. To the best of our knowledge, this is the first systematic study of the state of the art in privacy‐preserving big data analytics. The managerial implication of our study is that organizations can apply the results of our critical analysis to strengthen their strategic deployment of big data analytics in business settings, and hence better leverage big data for sustainable organizational innovation and growth.
Citations: 19
Journal
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery