Review on Candidate Feature Extraction and Categorization for Unstructured Text Document
Authors: P. P. Shelke, Aditya A Pardeshi
Published in: 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC), 2020-03-01
DOI: 10.1109/ICCMC48092.2020.ICCMC-00017 (https://doi.org/10.1109/ICCMC48092.2020.ICCMC-00017)
Words are the primary units of a sentence and carry additional information. This information is crucial to the candidate feature categorization process. To obtain it, established techniques mine candidate features using n-gram and noun-phrase based approaches, but these approaches ignore grammatical structure, which results in a large number of insubstantial features. This paper reviews various mechanisms for feature mining and explores the issues they raise. It proposes a system based on a tree structure for candidate feature mining, in which the branches of the tree are extracted using part-of-speech (POS) labelling to form candidate phrases. Filtering is recommended to avoid redundant phrases. Finally, machine learning is used for feature categorization.
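To make the contrast between n-gram mining and POS-guided mining concrete, the following is a minimal sketch and not the authors' system. It uses flat POS-pattern chunking (optional adjectives followed by one or more nouns) as a simplified stand-in for the tree-based extraction described above. The example sentence, its hand-assigned Penn Treebank style tags, the function names, and the sub-phrase filtering rule are all assumptions introduced for illustration. The machine learning categorization step is not covered.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (token, POS tag) pairs

# Hypothetical sentence, tagged by hand for illustration (no tagger is run here).
SENTENCE: Tagged = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("great", "JJ"), ("but", "CC"), ("the", "DT"), ("touch", "NN"),
    ("screen", "NN"), ("feels", "VBZ"), ("slow", "JJ"), ("and", "CC"),
    ("the", "DT"), ("battery", "NN"), ("drains", "VBZ"), ("fast", "RB"),
]


def bigrams(tagged: Tagged) -> List[str]:
    """Baseline: every adjacent word pair, regardless of grammar."""
    words = [word.lower() for word, _ in tagged]
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def candidate_phrases(tagged: Tagged) -> List[str]:
    """Collect maximal runs matching JJ* NN+ (adjectives, then nouns)."""
    phrases: List[str] = []
    current: List[str] = []
    has_noun = False
    for word, tag in tagged:
        if tag.startswith("NN"):
            current.append(word.lower())
            has_noun = True
        elif tag.startswith("JJ") and not has_noun:
            current.append(word.lower())  # leading adjective
        else:
            # The run ends here; keep it only if it contained a noun.
            if has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
            if tag.startswith("JJ"):
                current.append(word.lower())  # adjective may start a new run
    if has_noun:
        phrases.append(" ".join(current))
    return phrases


def filter_redundant(phrases: List[str]) -> List[str]:
    """Assumed filter: drop duplicates and phrases contained in a longer one."""
    unique = list(dict.fromkeys(phrases))  # de-duplicate, keep first-seen order
    kept = []
    for phrase in unique:
        padded = f" {phrase} "
        if any(phrase != other and padded in f" {other} " for other in unique):
            continue  # phrase occurs as whole words inside a longer phrase
        kept.append(phrase)
    return kept


if __name__ == "__main__":
    print(len(bigrams(SENTENCE)), "bigrams")
    print(filter_redundant(candidate_phrases(SENTENCE)))
```

On this sentence the bigram baseline produces 15 candidates, including pairs such as "is great" and "but the", while the POS-guided pass yields "battery life", "touch screen", and "battery", and the filter removes "battery" because it is contained in "battery life". A real pipeline would replace the hand-tagged input with the output of a POS tagger.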