{"title":"基于卷积神经网络的文本文档表示与分类","authors":"Shikha Mundra, Ankit Mundra, Anshul Saigal, Punit Gupta","doi":"10.1109/PDGC50313.2020.9315752","DOIUrl":null,"url":null,"abstract":"Understanding Actual meaning of a natural written language document is easy for a human but to enable a machine to do the same task require an accurate document representation as a machine do not have the same common sense as human have. For the task of document classification, it is required that text must be converted to numerical vector and recently, word embedding approaches are giving acceptable results in terms of word representation at global context level. In this study author has experimented with news dataset of multiple domain and compared the classification performance obtained from traditional bag of word model to word2vec model and found that word2vec is giving promising results in case of large vocabulary with low dimensionality which will help to classify the data dynamically as demonstrated in section experimental result.","PeriodicalId":347216,"journal":{"name":"2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Text document representation and classification using Convolution Neural Network\",\"authors\":\"Shikha Mundra, Ankit Mundra, Anshul Saigal, Punit Gupta\",\"doi\":\"10.1109/PDGC50313.2020.9315752\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Understanding Actual meaning of a natural written language document is easy for a human but to enable a machine to do the same task require an accurate document representation as a machine do not have the same common sense as human have. For the task of document classification, it is required that text must be converted to numerical vector and recently, word embedding approaches are giving acceptable results in terms of word representation at global context level. In this study author has experimented with news dataset of multiple domain and compared the classification performance obtained from traditional bag of word model to word2vec model and found that word2vec is giving promising results in case of large vocabulary with low dimensionality which will help to classify the data dynamically as demonstrated in section experimental result.\",\"PeriodicalId\":347216,\"journal\":{\"name\":\"2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC)\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDGC50313.2020.9315752\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDGC50313.2020.9315752","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Text document representation and classification using Convolution Neural Network
Understanding Actual meaning of a natural written language document is easy for a human but to enable a machine to do the same task require an accurate document representation as a machine do not have the same common sense as human have. For the task of document classification, it is required that text must be converted to numerical vector and recently, word embedding approaches are giving acceptable results in terms of word representation at global context level. In this study author has experimented with news dataset of multiple domain and compared the classification performance obtained from traditional bag of word model to word2vec model and found that word2vec is giving promising results in case of large vocabulary with low dimensionality which will help to classify the data dynamically as demonstrated in section experimental result.