
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery: Latest Articles

Deepfake attribution: On the source identification of artificially generated images
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-12-03 DOI: 10.1002/widm.1438
Brandon Khoo, Raphaël C.-W. Phan, C. H. Lim
Synthetic media or "deepfakes" are making great advances in visual quality, diversity, and verisimilitude, empowered by large‐scale publicly accessible datasets and rapid technical progress in deep generative modeling. Heralding a paradigm shift in how online content is trusted, researchers in digital image forensics have responded with different proposals to reliably detect AI‐generated images in the wild. However, binary classification of image authenticity is insufficient to regulate the ethical usage of deepfake technology as new applications are developed. This article provides an overview of the major innovations in synthetic forgery detection as of 2020, while highlighting the recent shift in research towards ways to attribute AI‐generated images to their generative sources with evidence. We define the various categories of deepfakes in existence, the subtle processing traces and fingerprints that distinguish AI‐generated images from reality and each other, and the different degrees of attribution possible with current understanding of generative algorithms. Additionally, we describe the limitations of synthetic image recognition methods in practice, the counter‐forensic attacks devised to exploit these limitations, and directions for new research to assure the long‐term relevance of deepfake forensics. Reliable, explainable, and generalizable attribution methods would hold malicious users accountable for AI‐enabled disinformation, grant plausible deniability to appropriate users, and facilitate intellectual property protection of deepfake technology.
Citations: 5
Explaining artificial intelligence with visual analytics in healthcare
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-11-28 DOI: 10.1002/widm.1427
Jeroen Ooge, G. Štiglic, K. Verbert
To make predictions and explore large datasets, healthcare is increasingly applying advanced algorithms of artificial intelligence. However, to make well‐considered and trustworthy decisions, healthcare professionals require ways to gain insights into these algorithms' outputs. One approach is visual analytics, which integrates humans in decision‐making through visualizations that facilitate interaction with algorithms. Although many visual analytics systems have been developed for healthcare, a clear overview of their explanation techniques is lacking. Therefore, we review 71 visual analytics systems for healthcare, and analyze how they explain advanced algorithms through visualization, interaction, shepherding, and direct explanation. Based on our analysis, we outline research opportunities and challenges to further guide the exciting rapprochement of visual analytics and healthcare.
Citations: 11
Deep learning in histopathology: A review
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-11-21 DOI: 10.1002/widm.1439
S. Banerji, S. Mitra
Histopathology is diagnosis based on visual examination of tissue sections under a microscope. With the growing number of digitally scanned tissue slide images, computer‐based segmentation and classification of these images is a high‐demand area of research. Convolutional neural networks (CNNs) constitute the most popular classification architecture for a variety of image classification problems. However, applying CNNs to histology slides is not a trivial task and poses several challenges, ranging from variations in the colors of slides to excessively high resolution and a lack of proper labeling. In this advanced review, we introduce the application of CNN‐based architectures to digital histological image analysis, discuss some problems associated with such analysis, and look at possible solutions.
Citations: 15
Computational resources in healthcare
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-11-12 DOI: 10.1002/widm.1437
Neelam Sharma, Naorem Leimarembi Devi, Satakshi Gupta, G. Raghava
Healthcare is the most important component in the life of all human beings, as each individual wishes to have a happy, healthy, and wealthy life‐span. Most branches of science are dedicated to improving healthcare. In the era of knowledge mining, informatics is playing a crucial role in different branches of research. Thus, a wide range of informatics‐based fields have emerged in the last three decades, including medical informatics, bioinformatics, cheminformatics, pharmacoinformatics, immunoinformatics, and clinical informatics. In the past, a number of reviews have focused on the application of an informatics‐based field in healthcare. In this review, an attempt is made to summarize the major computational resources developed in any informatics‐based field that have an application in healthcare. This review lists computational resources in the following groups: drug discovery, toxicity prediction, vaccine design, disease biomarkers, and Internet of Things. We mainly focused on freely available, functional resources like data repositories, prediction models, standalone software, mobile apps, and web services. To provide a service to the community, we developed a health portal that maintains links related to healthcare: http://webs.iiitd.edu.in/.
Citations: 1
Discovery of behavioral patterns in online social commerce practice
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-27 DOI: 10.1002/widm.1433
Xiaoyun Jia, Ruili Wang, James H. Liu, Chuntao Jiang
Discovery of behavioral patterns in online social commerce practice becomes important in this digital era. In this article, we propose a systematic approach to behavioral pattern discovery, and apply it in an emerging online social commerce venue: live streaming. We investigate behavioral patterns in gifting encouragement in live streaming to understand online social commerce practice. Our proposed approach is based on multiple triangulation, including data source triangulation (i.e., streamers, viewers, and actual behavior) and data collection method triangulation (i.e., interviews, focus groups, and observations). Through multiple triangulation, four behavioral patterns of gifting encouragement are discovered: (i) requesting a certain gift for providing a particular service, (ii) creating a raffle, (iii) eliciting competition between individuals, and (iv) eliciting competition between groups. This research reveals the special behavioral patterns in live streaming, and thus increases our knowledge of social commerce practices. This research provides a systematic approach to discover online behavioral patterns, and provides practical implications in live streaming platforms, especially in marketing and platform design.
Citations: 6
A critical review of state‐of‐the‐art chatbot designs and applications
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-25 DOI: 10.1002/widm.1434
Bei Luo, Raymond Y. K. Lau, Chunping Li, Yain-Whar Si
Chatbots are intelligent conversational agents that can interact with users through natural languages. As chatbots can perform a variety of tasks, many companies have committed numerous resources to develop and deploy chatbots to enhance various business processes. However, we lack an up‐to‐date critical review that thoroughly examines both state‐of‐the‐art technologies and innovative applications of chatbots. In this review, we not only critically analyze the various computational approaches used to develop state‐of‐the‐art chatbots, but also thoroughly review the usability and applications of chatbots for various business sectors. We also identify gaps in chatbot‐related studies and propose new research directions to address the shortcomings of existing studies and applications. Our review advances both academic research and practical business applications of state‐of‐the‐art chatbots. We provide guidance for practitioners to fully realize the business value of chatbots and assist in making sensible decisions related to the development and deployment of chatbots in various business contexts. Researchers interested in the design and development of chatbots can also gain useful insights from our critical review and identify fruitful research topics and future research directions based on the research gaps discussed herein.
Citations: 72
Themes in data mining, big data, and crime analytics
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-18 DOI: 10.1002/widm.1432
G. Oatley
This article examines the impact of new AI‐related technologies in data mining and big data on important research questions in crime analytics. Because the field is so broad, the review focuses on a selection of the most important topics. Challenges for information management, and in turn law and society, include: AI‐powered predictive policing; big data for legal and adversarial decisions; bias using big data and analytics in profiling and predicting criminality; forecasting crime risk and crime rates; and, regulating AI systems.
Citations: 17
Predicting home sale prices: A review of existing methods and illustration of data stream methods for improved performance
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-18 DOI: 10.1002/widm.1435
Donghui Shi, J. Guan, J. Zurada, Alan Levitan
The need for accurate and unbiased assessment of residential real property has always been important not only to financial institutions lending on or holding such assets but also to municipalities that rely on property taxes as their critical source of revenue. The common methodology for predicting residential property sale price is based on traditional multiple regression in spite of known issues. Machine learning methods have been proposed as an alternative approach but the results are far from satisfactory. A review of existing studies and relevant issues can help researchers better assess the pros and cons of the approaches in this important stream of research and move the field forward. This article provides such a review. In our review, we have noticed that common to both the regression‐based methods and machine learning methods is the use of batch‐mode learning. Thus, in addition to providing a review of recent research on batch‐based residential property prediction models, this article also explores a new approach to constructing residential property price prediction models by treating past sale records as an evolving data stream. The results of our study show that the data stream approach outperforms the traditional regression method and demonstrate the potential of data stream methods in improving prediction models for residential property prices.
Citations: 1
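The contrast between batch‐mode learning and treating sale records as an evolving data stream, as described in the home sale price abstract above, can be illustrated with a small test-then-train (prequential) loop. This is only a sketch under stated assumptions: the abstract does not name the stream learners the authors used, so a one-feature least-squares model updated from running sums stands in for them, and the class name, function name, and sale records below are all hypothetical.

```python
class StreamingLinearModel:
    """One-feature least-squares fit maintained from running sums.

    Illustrative stand-in for a stream learner; not the authors' method.
    """

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxx = self.sxy = 0.0

    def predict(self, x):
        if self.n == 0:
            return 0.0  # nothing seen yet
        var = self.n * self.sxx - self.sx ** 2
        if var == 0:
            return self.sy / self.n  # fall back to the mean price so far
        slope = (self.n * self.sxy - self.sx * self.sy) / var
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept + slope * x

    def update(self, x, y):
        # Constant-time update: no past record has to be stored or revisited.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y


def prequential_errors(records, model):
    """Test-then-train: predict each sale before learning from it."""
    errors = []
    for x, y in records:
        errors.append(abs(model.predict(x) - y))
        model.update(x, y)
    return errors


# Hypothetical sales ordered by date: (living area in m^2, price in thousands).
sales = [(50, 100), (100, 200), (75, 150), (120, 240)]
errors = prequential_errors(sales, StreamingLinearModel())
print(errors)  # [100.0, 100.0, 0.0, 0.0]
```

Because every record is scored before the model learns from it, the error sequence shows how quickly the model adapts as new sales arrive, which a single batch fit cannot show.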
Multivariate temporal data analysis ‐ a review
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-11 DOI: 10.1002/widm.1430
Robert Moskovitch
With the information technology revolution, and especially the adoption of the Internet of Things, longitudinal data in many domains are becoming more available and accessible for secondary analysis. Such data provide meaningful opportunities to understand processes in many domains over time, but also pose challenges. A main challenge is the heterogeneity of the temporal variables, which stems from the different types of data (measurements or events) and types of sampling (fixed or irregular). Some variables are events that may or may not have a duration. In this review, we discuss the various types of temporal data and the relevant analysis methods. We start with fixed frequency variables and with forecasting and time series methods, then proceed to sequential data and sequential pattern mining, and to time interval mining for events of varying duration. The use of various deep learning based architectures for temporal data is also discussed. We then present the challenge of heterogeneous multivariate temporal data analysis and discuss various options for dealing with it, focusing on an increasingly used option: transforming the data into symbolic time intervals through temporal abstraction and using the discovery of time interval related patterns for temporal knowledge discovery, clustering, classification prediction, and more. Finally, we give an overview of the field and of areas in which more studies and contributions are needed.
Citations: 7
Time series analysis via network science: Concepts and algorithms
IF 7.8 Zone 2 (Computer Science) Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Pub Date: 2021-10-11 DOI: 10.1002/widm.1404
V. Silva, Maria Eduarda Silva, P. Ribeiro, Fernando M. A. Silva
Nowadays there is a constant flux of data being generated and collected in all types of real-world systems. These data sets are often indexed by time, space, or both, requiring appropriate approaches to analyze the data. In univariate settings, time series analysis is a mature field. However, in multivariate contexts, time series analysis still presents many limitations. In order to address these issues, the last decade has brought approaches based on network science. These methods involve transforming an initial time series data set into one or more networks, which can be analyzed in depth to provide insight into the original time series. This review provides a comprehensive overview of existing mapping methods for transforming time series into networks for a wide audience of researchers and practitioners in machine learning, data mining, and time series. Our main contribution is a structured review of existing methodologies, identifying their main characteristics and their differences. We describe the main conceptual approaches, provide authoritative references, and give insight into their advantages and limitations in a unified way and language. We first describe the case of univariate time series, which can be mapped to single-layer networks, and we divide the current mappings based on the underlying concept: visibility, transition, and proximity. We then proceed with multivariate time series, discussing both single-layer and multiple-layer approaches. Although still very recent, this research area has much potential, and with this survey we intend to pave the way for future research on the topic.
Citations: 25