Incomplete Data Analysis

Applications of Pattern Recognition. Pub Date: 2020-11-04. DOI: 10.5772/INTECHOPEN.94068

Bo-Wei Chen, Jia-Ching Wang



Abstract



This chapter discusses missing-value problems from the perspective of machine learning. Missing values frequently occur during data acquisition. When a dataset contains missing values, nonvectorial data are generated. This causes a serious problem for pattern recognition models, because nonvectorial data need further data wrangling before models can be built. In view of this, the chapter reviews the methodologies of related works and examines their empirical effectiveness. A great deal of effort has been devoted to this field, and the resulting works can be roughly divided into two types: multiple imputation and single imputation. The latter can be further classified into subcategories, which include deletion, fixed-value replacement, K-Nearest Neighbors, regression, tree-based algorithms, and latent component-based approaches. The chapter introduces these approaches and comments on them. Finally, numerical examples are provided along with recommendations on future development.
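To make three of the single-imputation subcategories named above concrete (deletion, fixed-value replacement, and K-Nearest Neighbors), the following is a minimal sketch in plain Python. It is not the chapter's implementation. The function names, the use of `None` to encode a missing entry, the choice of the column mean as the fixed value, and the distance measure are all assumptions made for illustration.

```python
import math

# Assumption: a missing entry is encoded as None (the chapter does not prescribe an encoding).
MISSING = None


def listwise_deletion(rows):
    """Deletion: drop every row that contains at least one missing entry."""
    return [row for row in rows if all(v is not MISSING for v in row)]


def mean_replacement(rows):
    """Fixed-value replacement: fill each gap with the mean of the observed
    values in the same column. Assumes every column has an observed value."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not MISSING]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is MISSING else v for j, v in enumerate(row)]
            for row in rows]


def knn_imputation(rows, k=2):
    """K-Nearest Neighbors: fill each gap with the average of that column over
    the k closest rows that have the column observed. Distance is a
    root-mean-square difference over the columns observed in both rows
    (an illustrative choice). Assumes each column with a gap is observed
    in at least one other row."""
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not MISSING and y is not MISSING]
        if not shared:
            return math.inf  # no common columns: treat as infinitely far
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = []
    for i, row in enumerate(rows):
        new_row = list(row)
        for j, v in enumerate(row):
            if v is not MISSING:
                continue
            # Candidate donors: other rows in which column j is observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not MISSING]
            donors.sort(key=lambda d: d[0])
            nearest = [value for _, value in donors[:k]]
            new_row[j] = sum(nearest) / len(nearest)
        filled.append(new_row)
    return filled


# Hypothetical toy dataset: the second row lacks its second feature.
data = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [10.0, 20.0],
]
```

On this toy dataset, deletion discards the second row entirely. Mean replacement fills the gap with 28/3 (about 9.33), which is pulled upward by the distant fourth row. KNN imputation with `k=2` fills it with 4.0, because only the two rows closest in the first feature contribute. The sketch does not cover regression, tree-based, latent component-based, or multiple-imputation methods.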
