Isam A. Alobaidi, J. Leopold, Ali Allami, Nathan Eloe, Dustin Tanksley
Machine learning and computational intelligence have facilitated the development of recommendation systems for a broad range of domains. Such recommendations are based on contextual information that is explicitly provided or pervasively collected. Recommendation systems often improve decision‐making or increase the efficacy of a task. Real‐time strategy (RTS) video games are not only a popular entertainment medium but also an abstraction of many real‐world applications in which the aim is to increase one's own resources while diminishing those of an opponent. Using predictive analytics, which examines past examples of success and failure, we can learn how to predict positive outcomes for such scenarios. The goal of our research is to develop an accurate predictive recommendation system for multiplayer strategic games that recommends the moves a player should, and should not, make, thereby providing a competitive advantage. Herein we compare two techniques, frequent and discriminative subgraph mining, in terms of the error rates associated with their predictions in this context. As proof of concept, we present the results of an experiment that applies our strategies to two particular RTS games.
{"title":"Predictive analysis of real‐time strategy games: A graph mining approach","authors":"Isam A. Alobaidi, J. Leopold, Ali Allami, Nathan Eloe, Dustin Tanksley","doi":"10.1002/widm.1398","DOIUrl":"https://doi.org/10.1002/widm.1398","url":null,"abstract":"Machine learning and computational intelligence have facilitated the development of recommendation systems for a broad range of domains. Such recommendations are based on contextual information that is explicitly provided or pervasively collected. Recommendation systems often improve decision‐making or increase the efficacy of a task. Real‐time strategy (RTS) video games are not only a popular entertainment medium, they also are an abstraction of many real‐world applications where the aim is to increase your resources and decrease those of your opponent. Using predictive analytics, which examines past examples of success and failure, we can learn how to predict positive outcomes for such scenarios. The goal of our research is to develop an accurate predictive recommendation system for multiplayer strategic games to determine recommendations for moves that a player should, and should not, make and thereby provide a competitive advantage. Herein we compare two techniques, frequent and discriminative subgraph mining, in terms of the error rates associated with their predictions in this context. As proof of concept, we present the results of an experiment that utilizes our strategies for two particular RTS games.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"24 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89373013","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In this era of ubiquitously connected systems, transport units have become a significant source of data, collected from commuters, vehicles, drivers, and every other part of the transport system they touch. These data, which have both spatial and temporal aspects, are utilized for a plethora of services such as travel assistance, multi‐modal transport solutions, real‐time travel information, smart parking, and autonomous vehicles, to name a few. With the growing emphasis on sustainable transport, public transport systems have been popularized owing to their economic and environmental savings. In this review article, we highlight works that have applied data‐driven techniques to improve multiple parts of the public transport system, focusing primarily on developing economies, and thereby improve the overall commute experience in various countries.
{"title":"Smartphones for public transport planning and recommendation in developing countries—A review","authors":"Rohit Verma, Sandip Chakraborty","doi":"10.1002/widm.1397","DOIUrl":"https://doi.org/10.1002/widm.1397","url":null,"abstract":"In this era of connected systems that have penetrated everywhere, transport units have become a significant source of data, collected from commuters, vehicles, drivers, or any section being touched by the transport system. These data, which have both spatial as well as temporal aspects, is utilized for a plethora of services like travel assistant systems, multi‐modal transport solutions, real‐time travel information, smart parking, autonomous vehicles, to name a few. With the current buzz of sustainable transport, the use of public transport systems have been popularized owing to the economic and environmental savings. In this review article, we provide a highlight of works, which have tried to utilize techniques to improve multiple sections of the public transport system, primarily focusing on developing economies, thus improving the overall commute experience at various countries.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"14 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86012919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Liver cancer has become the third leading cause of cancer death. Hepatocellular carcinoma (HCC), a highly malignant type of liver cancer, still has a very high postoperative recurrence rate, in part because there are no reliable clinical data to guide patient care after surgery. To address this challenging issue, we design a novel prediction model for HCC recurrence using a neighbor2vec‐based algorithm. It consists of three stages: (a) in the preparation stage, the Pearson correlation coefficient is used to identify independent predictors of HCC recurrence; (b) because individual features correlate only weakly with the prediction target, the K‐nearest neighbors (KNN) of each patient are collected into a list of K vectors (neighbor2vec); and (c) these vector lists are used as input to machine learning methods such as logistic regression, KNN, decision trees, naive Bayes (NB), and a deep neural network to establish the neighbor2vec‐based prediction model. In experiments on real data from Shandong Provincial Hospital in China, the proposed neighbor2vec‐based prediction model outperforms all the other models. In particular, the NB model with neighbor2vec achieves up to 83.02% accuracy, 82.86% recall, and 77.6% precision.
{"title":"Prediction model for recurrence of hepatocellular carcinoma after resection by using neighbor2vec based algorithms","authors":"Yuankui Cao, Junqing Fan, Hong-xin Cao, Yunliang Chen, Jie Li, Jianxin Li, Shenmin Zhang","doi":"10.1002/widm.1390","DOIUrl":"https://doi.org/10.1002/widm.1390","url":null,"abstract":"Liver cancer has become the third cause that leads to the cancer death. For hepatocellular carcinoma (HCC), as the highly malignant type of liver cancer, its recurrence rate after operation is still very high because there is no reliable clinical data to provide better advice for patients after operation. To solve the challenging issue, in this work, we design a novel prediction model for recurrence of HCC using neighbor2vec based algorithm. It consists of three stages: (a) In the preparation stage, the Pearson correlation coefficient was used to explore the independent predictors of HCC recurrence, (b) due to the low correlation between individual dimension and prediction target, K‐nearest neighbors (KNN) were found as a K‐vectors list for each patient (neighbor2vec), (c) all vectors lists were applied as the input of machine learning methods such as logistic regression, KNN, decision tree, naive Bayes (NB), and deep neural network to establish the neighbor2vec based prediction model. From the experimental results on the real data from Shandong Provincial Hospital in China, the proposed neighbor2vec based prediction model outperforms all the other models. Especially, the NB model with neighbor2vec achieves up to 83.02, 82.86, 77.6%, in terms of accuracy, recall rates, and precision.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"3 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-11-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82102420","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
During the last few decades, the widespread growth of scholarly networks and digital libraries has resulted in an explosion of publicly available scholarly data in various forms such as authors, papers, citations, conferences, and journals. This has created interest in the domain of big scholarly data analysis, which examines the worldwide dissemination of scientific findings from different perspectives. Although the study of big scholarly data is relatively new, some studies have emerged on how to investigate scholarly data usage in different disciplines. These studies motivate investigating the scholarly data generated via academic technologies such as scholarly networks and digital libraries in order to build scalable approaches for retrieving, recommending, and analyzing scholarly content. We have analyzed these studies following a systematic methodology, classifying them into different applications based on literature features and highlighting the machine learning techniques used for this purpose. We also discuss open challenges that remain unsolved, in order to foster future research in the field of scholarly data mining.
{"title":"Scholarly data mining: A systematic review of its applications","authors":"Amna Dridi, M. Gaber, R. Azad, Jagdev Bhogal","doi":"10.1002/widm.1395","DOIUrl":"https://doi.org/10.1002/widm.1395","url":null,"abstract":"During the last few decades, the widespread growth of scholarly networks and digital libraries has resulted in an explosion of publicly available scholarly data in various forms such as authors, papers, citations, conferences, and journals. This has created interest in the domain of big scholarly data analysis that analyses worldwide dissemination of scientific findings from different perspectives. Although the study of big scholarly data is relatively new, some studies have emerged on how to investigate scholarly data usage in different disciplines. These studies motivate investigating the scholarly data generated via academic technologies such as scholarly networks and digital libraries for building scalable approaches for retrieving, recommending, and analyzing the scholarly content. We have analyzed these studies following a systematic methodology, classifying them into different applications based on literature features and highlighting the machine learning techniques used for this purpose. We also discuss open challenges that remain unsolved to foster future research in the field of scholarly data mining.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"86 1 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77296303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the prediction of urban land surface temperature from remote sensing data, contamination by cloud, cloud shadow, and snow causes surface temperature inversion and vegetation‐related index calculations to fail. A time series prediction framework for urban surface temperature under cloud interference is proposed in this paper, which helps address the impact of missing data on surface temperature prediction. Spatial and temporal variation trends of surface temperature and vegetation index are analyzed using Landsat 7/8 remote sensing data of Beijing from 2010 to 2019. The geographically weighted regression (GWR) method is used to simulate surface temperature for the current observation date, and a deep learning prediction network combining convolutional and long short‐term memory (LSTM) layers is constructed to predict the spatial distribution of surface temperature on the next observation date. The time series analysis shows that an NDBI below −0.2 indicates possible cloud contamination. The land surface temperature (LST) modeling results show that the GWR estimates are more precise over impervious surfaces and water bodies than over vegetated areas, and the deep learning model produces relatively good predictions of the spatial distribution of surface temperature. The purpose of this study is to compensate for data lost to cloud, snow, and other interference factors, and to support prediction of the spatial and temporal distributions of LST.
{"title":"Predicting land surface temperature with geographically weighed regression and deep learning","authors":"Hongfei Jia, De-He Yang, Weiping Deng, Qing Wei, Wenliang Jiang","doi":"10.1002/widm.1396","DOIUrl":"https://doi.org/10.1002/widm.1396","url":null,"abstract":"For prediction of urban remote sensing surface temperature, cloud, cloud shadow and snow contamination lead to the failure of surface temperature inversion and vegetation‐related index calculation. A time series prediction framework of urban surface temperature under cloud interference is proposed in this paper. This is helpful to solve the problem of the impact of data loss on surface temperature prediction. Spatial and temporal variation trends of surface temperature and vegetation index are analyzed using Landsat 7/8 remote sensing data of 2010 to 2019 from Beijing. The geographically weighed regression (GWR) method is used to realize the simulation of surface temperature based on the current date. The deep learning prediction network based on convolution and long short‐term memory (LSTM) networks was constructed to predict the spatial distribution of surface temperature on the next observation date. The time series analysis shows that the NDBI is less than −0.2, which indicates that there may be cloud contamination. The land surface temperature (LST) modeling results show that the precision of estimation using GWR method on impervious surface and water bodies is superior compared to the vegetation area. For LST prediction using deep learning methods, the result of the prediction on surface temperature space distribution was relatively good. The purpose of this study is to make up for the missing data affected by cloud, snow, and other interference factors, and to be applied to the prediction of the spatial and temporal distributions of LST.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"40 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-11-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73122661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Luiz M. R. Gadelha, Pedro C. de Siracusa, E. Dalcin, Luís Alexandre Estevão da Silva, D. A. Augusto, Eduardo Krempser, Helen Michelle Affe, R. L. Costa, Maria Luiza Mondelli, P. Meirelles, F. Thompson, M. Chame, A. Ziviani, M. F. Siqueira
The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision‐makers in ways that they can effectively use them. The development and deployment of tools and techniques to generate these indicators require having access to trustworthy data from biological collections, field surveys and automated sensors, molecular data, and historic academic literature. The transformation of these raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute an area usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.
{"title":"A survey of biodiversity informatics: Concepts, practices, and challenges","authors":"Luiz M. R. Gadelha, Pedro C. de Siracusa, E. Dalcin, Luís Alexandre Estevão da Silva, D. A. Augusto, Eduardo Krempser, Helen Michelle Affe, R. L. Costa, Maria Luiza Mondelli, P. Meirelles, F. Thompson, M. Chame, A. Ziviani, M. F. Siqueira","doi":"10.1002/widm.1394","DOIUrl":"https://doi.org/10.1002/widm.1394","url":null,"abstract":"The unprecedented size of the human population, along with its associated economic activities, has an ever‐increasing impact on global environments. Across the world, countries are concerned about the growing resource consumption and the capacity of ecosystems to provide resources. To effectively conserve biodiversity, it is essential to make indicators and knowledge openly available to decision‐makers in ways that they can effectively use them. The development and deployment of tools and techniques to generate these indicators require having access to trustworthy data from biological collections, field surveys and automated sensors, molecular data, and historic academic literature. The transformation of these raw data into synthesized information that is fit for use requires going through many refinement steps. The methodologies and techniques applied to manage and analyze these data constitute an area usually called biodiversity informatics. Biodiversity data follow a life cycle consisting of planning, collection, certification, description, preservation, discovery, integration, and analysis. Researchers, whether producers or consumers of biodiversity data, will likely perform activities related to at least one of these steps. This article explores each stage of the life cycle of biodiversity data, discussing its methodologies, tools, and challenges.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"46 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91270269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
R. Confalonieri, Ludovik Çoba, Benedikt Wagner, Tarek R. Besold
Explainability in Artificial Intelligence (AI) has been revived as a topic of active research by the need to convey safety and trust to users regarding the “how” and “why” of automated decision‐making in applications such as autonomous driving, medical diagnosis, and banking and finance. While explainability in AI has recently received significant attention, the origins of this line of work go back several decades, to when AI systems were mainly developed as (knowledge‐based) expert systems. Since then, the definition, understanding, and implementation of explainability have been picked up in several lines of research, namely expert systems, machine learning, recommender systems, and approaches to neural‐symbolic learning and reasoning, mostly during different periods of AI history. In this article, we present a historical perspective on Explainable Artificial Intelligence. We discuss how explainability was mainly conceived in the past, how it is understood in the present, and how it might be understood in the future. We conclude the article by proposing criteria for explanations that we believe will play a crucial role in the development of human‐understandable explainable systems.
{"title":"A historical perspective of explainable Artificial Intelligence","authors":"R. Confalonieri, Ludovik Çoba, Benedikt Wagner, Tarek R. Besold","doi":"10.1002/widm.1391","DOIUrl":"https://doi.org/10.1002/widm.1391","url":null,"abstract":"Explainability in Artificial Intelligence (AI) has been revived as a topic of active research by the need of conveying safety and trust to users in the “how” and “why” of automated decision‐making in different applications such as autonomous driving, medical diagnosis, or banking and finance. While explainability in AI has recently received significant attention, the origins of this line of work go back several decades to when AI systems were mainly developed as (knowledge‐based) expert systems. Since then, the definition, understanding, and implementation of explainability have been picked up in several lines of research work, namely, expert systems, machine learning, recommender systems, and in approaches to neural‐symbolic learning and reasoning, mostly happening during different periods of AI history. In this article, we present a historical perspective of Explainable Artificial Intelligence. We discuss how explainability was mainly conceived in the past, how it is understood in the present and, how it might be understood in the future. We conclude the article by proposing criteria for explanations that we believe will play a crucial role in the development of human‐understandable explainable systems.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"39 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-10-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74649044","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In the modern era, the amount of data and information is increasing, along with its accessibility and availability, owing to the Internet and social media. Data mining is used to search these vast data sets and to discover previously unknown, useful patterns and predictions. Data mining allows unrelated data to be connected in a meaningful way, analyzed, and represented in the form of useful patterns and predictions of future behavior. The process of data mining can, however, potentially violate sensitive and personal data. Individual privacy is under attack if information leaks and reveals the identity of a person whose personal data were used in the data mining process. Many privacy‐preserving data mining (PPDM) techniques and methods aim to preserve privacy and sensitive data while at the same time providing accurate data mining results. PPDM techniques and methods incorporate different approaches that protect data during the data mining process. The methodology used in this article is a systematic literature review and bibliometric analysis. This article identifies the current trends, techniques, and methods used in the privacy‐preserving data mining field, provides a clear and concise classification of PPDM methods and techniques, identifies new methods and techniques not included in previous classifications, and emphasizes future research directions.
{"title":"Data mining privacy preserving: Research agenda","authors":"Inda Kreso, Amra Kapo, L. Turulja","doi":"10.1002/widm.1392","DOIUrl":"https://doi.org/10.1002/widm.1392","url":null,"abstract":"In the modern days, the amount of the data and information is increasing along with their accessibility and availability, due to the Internet and social media. To be able to search this vast data set and to discover unknown useful data patterns and predictions, the data mining method is used. Data mining allows for unrelated data to be connected in a meaningful way, to analyze the data, and to represent the results in the form of useful data patterns and predictions that help and predict future behavior. The process of data mining can potentially violate sensitive and personal data. Individual privacy is under attack if some of the information leaks and reveals the identity of a person whose personal data were used in the data mining process. There are many privacy‐preserving data mining (PPDM) techniques and methods that have a task to preserve the privacy and sensitive data while providing accurate data mining results at the same time. PPDM techniques and methods incorporate different approaches that protect data in the process of data mining. The methodology that was used in this article is the systematic literature review and bibliometric analysis. This article identifieds the current trends, techniques, and methods that are being used in the privacy‐preserving data mining field to make a clear and concise classification of the PPDM methods and techniques with possibly identifying new methods and techniques that were not included in the previous classification, and to emphasize the future research directions.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"31 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85689551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Md. Ileas Pramanik, Raymond Y. K. Lau, Md. Sakir Hossain, Md-Mizanur Rahoman, Sumon Kumar Debnath, Md. Golam Rashed, Md. Zasim Uddin
In the era of “big data,” a huge number of people, devices, and sensors are connected via digital networks, and the interplay among these entities generates enormous amounts of valuable data that enable organizations to innovate and grow. However, the data deluge also raises serious privacy concerns, which may cause a regulatory backlash and hinder further organizational innovation. To address the challenge of information privacy, researchers have explored privacy‐preserving methodologies over the past two decades. However, a thorough study of privacy‐preserving big data analytics is missing from the existing literature. The main contributions of this article include a systematic evaluation of various privacy‐preservation approaches and a critical analysis of state‐of‐the‐art privacy‐preserving big data analytics methodologies. More specifically, we propose a four‐dimensional framework for analyzing and designing the next generation of privacy‐preserving big data analytics approaches. In addition, we pinpoint the potential opportunities and challenges of applying privacy‐preserving big data analytics in business settings and provide five recommendations for doing so effectively. To the best of our knowledge, this is the first systematic study of the state of the art in privacy‐preserving big data analytics. The managerial implication of our study is that organizations can apply the results of our critical analysis to strengthen their strategic deployment of big data analytics in business settings, and hence better leverage big data for sustainable organizational innovation and growth.
{"title":"Privacy preserving big data analytics: A critical analysis of state‐of‐the‐art","authors":"Md. Ileas Pramanik, Raymond Y. K. Lau, Md. Sakir Hossain, Md-Mizanur Rahoman, Sumon Kumar Debnath, Md. Golam Rashed, Md. Zasim Uddin","doi":"10.1002/widm.1387","DOIUrl":"https://doi.org/10.1002/widm.1387","url":null,"abstract":"In the era of “big data,” a huge number of people, devices, and sensors are connected via digital networks and the cross‐plays among these entities generate enormous valuable data that facilitate organizations to innovate and grow. However, the data deluge also raises serious privacy concerns which may cause a regulatory backlash and hinder further organizational innovation. To address the challenge of information privacy, researchers have explored privacy‐preserving methodologies in the past two decades. However, a thorough study of privacy preserving big data analytics is missing in existing literature. The main contributions of this article include a systematic evaluation of various privacy preservation approaches and a critical analysis of the state‐of‐the‐art privacy preserving big data analytics methodologies. More specifically, we propose a four‐dimensional framework for analyzing and designing the next generation of privacy preserving big data analytics approaches. Besides, we contribute to pinpoint the potential opportunities and challenges of applying privacy preserving big data analytics to business settings. We provide five recommendations of effectively applying privacy‐preserving big data analytics to businesses. To the best of our knowledge, this is the first systematic study about state‐of‐the‐art in privacy‐preserving big data analytics. The managerial implication of our study is that organizations can apply the results of our critical analysis to strengthen their strategic deployment of big data analytics in business settings, and hence to better leverage big data for sustainable organizational innovation and growth.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"45 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85718985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This Expression of Concern is for the above article, published online on August 2, 2018 in Wiley Online Library (wileyonlinelibrary.com), and has been published by agreement between the journal Editor-in-Chief, Dr. Witold Pedrycz, and Wiley Periodicals LLC. The expression of concern has been agreed due to concerns raised regarding possible misrepresentation of the data set of facial images used in this article. Based on information provided by the authors, data collection for the above mentioned article took place in 2014. However, it has subsequently been noted that images from this same data set have purportedly been used in both a 2010 article (1) and a 2018 article (in which the data reported was purportedly collected in 2012) (2), which were co-authored by the corresponding author of the above mentioned article. The journal therefore has concerns about when data collection actually took place. Additionally, Figure 1 in the above mentioned article appears to be the same as Figure 1 of the 2018 article (2), though there is no citation, and permission was not obtained to reuse the figure. Unfortunately, the authors have been unable to provide any further information to the journal to help clarify when data collection took place. As a result, the journal has decided to issue an Expression of Concern to readers.
{"title":"Expression of Concern: Wang, C., Zhang, Q., Liu, W., Liu, Y. & Miao, L. Facial feature discovery for ethnicity recognition. WIREs Data Mining Knowl. Discov. 9, e1278 (2019). https://doi.org/10.1002/widm.1278","authors":"","doi":"10.1002/widm.1386","DOIUrl":"https://doi.org/10.1002/widm.1386","url":null,"abstract":"This Expression of Concern is for the above article, published online on August 2, 2018 in Wiley Online Library (wileyonlinelibrary.com), and has been published by agreement between the journal Editor-in-Chief, Dr. Witold Pedrycz, and Wiley Periodicals LLC. The expression of concern has been agreed due to concerns raised regarding possible misrepresentation of the data set of facial images used in this article. Based on information provided by the authors, data collection for the above mentioned article took place in 2014. However, it has subsequently been noted that images from this same data set have purportedly been used in both a 2010 article (1) and a 2018 article (in which the data reported was purportedly collected in 2012) (2), which were co-authored by the corresponding author of the above mentioned article. The journal therefore has concerns about when data collection actually took place. Additionally, Figure 1 in the above mentioned article appears to be the same as Figure 1 of the 2018 article (2), though there is no citation, and permission was not obtained to reuse the figure. Unfortunately, the authors have been unable to provide any further information to the journal to help clarify when data collection took place. As a result, the journal has decided to issue an Expression of Concern to readers.","PeriodicalId":48970,"journal":{"name":"Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery","volume":"1 1","pages":""},"PeriodicalIF":7.8,"publicationDate":"2020-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88971581","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}