Machine Learning: Data Pre-processing
Michael G. Pecht, Myeongsu Kang
Prognostics and Health Management of Electronics, published 2018-08-24
DOI: 10.1002/9781119515326.CH5 (https://doi.org/10.1002/9781119515326.CH5)
In prognostics and health management (PHM), data pre-processing generally involves the following tasks: data cleansing, normalization, feature discovery, and imbalanced data management. Data cleansing is the process of detecting and correcting corrupt or inaccurate data. Feature engineering is the process of using domain knowledge of the data to create features that make machine learning algorithms work. Feature extraction, also known as dimensionality reduction, transforms high-dimensional data into a meaningful representation of reduced dimensionality, ideally one whose dimensionality matches the intrinsic dimensionality of the data. Linear discriminant analysis (LDA) is commonly used as a dimensionality reduction technique in the data pre-processing step for classification and machine learning applications. Feature selection, also called variable selection or attribute selection, is the process of selecting a subset of relevant features for use in model construction. The synthetic minority oversampling technique (SMOTE) produces artificial data based on the feature space similarities between minority data points.
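The SMOTE idea mentioned above, creating artificial minority samples from feature-space similarity, can be illustrated with a minimal sketch. This is a simplified variant under stated assumptions, not the chapter's implementation: the function name `smote`, its parameters (`n_synthetic`, `k`, `seed`), and the toy data are hypothetical, and base points are drawn at random rather than visited in turn. Each synthetic point is placed on the line segment between a minority point and one of its `k` nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points interpolated between minority neighbours.

    Simplified SMOTE-style sketch: names and defaults are illustrative only.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # a point has at most len - 1 neighbours
    synthetic = []
    for _ in range(n_synthetic):
        # Pick a random minority point as the base (assumption of this sketch).
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority points by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the connecting segment.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy minority class with two features (made-up values for illustration).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region already occupied by the minority class instead of duplicating existing rows. Oversampling of this kind is normally applied to the training split only, so that evaluation data contains no synthetic points.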