Feature extraction based on principal component analysis for text categorization
Safae Lhazmir, Ismail El Moudden, A. Kobbane
2017 International Conference on Performance Evaluation and Modeling in Wired and Wireless Networks (PEMWN), published 2017-11-01
DOI: 10.23919/PEMWN.2017.8308030 (https://doi.org/10.23919/PEMWN.2017.8308030)
Over the past 20 years, the volume of data has grown on a large scale across many fields. The Internet of Things (IoT), for instance, comprises billions of devices, and the data streams coming from these devices challenge traditional approaches to data management and contribute to the emerging paradigm of big data. To handle such data adequately, it is necessary to reduce their dimensionality to a size more compatible with the available solution methods, even if this reduction leads to a slight loss of information. The aim of this paper is to study the potential of dimensionality reduction for text categorization on CNAE-9, a publicly available dataset.
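The abstract does not describe an implementation, so the following is only a minimal sketch of the general idea named in the title: principal component analysis used to shrink a document-term matrix while keeping most of its variance. It assumes NumPy, uses a small synthetic matrix as a stand-in for real text features (it is not CNAE-9 data), and the function name `pca_reduce` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components.

    Returns the reduced data, the principal directions, the feature means,
    and the fraction of total variance retained by the projection.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered features

    # SVD of the centered data: the rows of Vt are the principal directions,
    # ordered by decreasing singular value.
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]

    Z = Xc @ components.T  # coordinates in the reduced space

    variances = S ** 2  # proportional to the variance along each direction
    total = variances.sum()
    retained = variances[:n_components].sum() / total if total > 0 else 1.0
    return Z, components, mean, retained


# Synthetic stand-in for a document-term matrix (an assumption, not real data):
# 60 "documents" over 40 "terms", generated from 3 latent topics plus noise.
rng = np.random.default_rng(0)
topics = rng.random((3, 40))
weights = rng.random((60, 3))
X = weights @ topics + 0.01 * rng.standard_normal((60, 40))

Z, components, mean, retained = pca_reduce(X, n_components=3)
print("reduced shape:", Z.shape)
print("variance retained:", round(float(retained), 3))
```

The reduced matrix `Z` could then be passed to any classifier in place of `X`. The retained-variance fraction gives one way to quantify the "slight loss of information" mentioned above. How many components to keep is a design choice that this sketch does not settle.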