Authors: P. V. Poojitha, R. Menon
Venue: 2020 International Conference on Software Security and Assurance (ICSSA), 2020-10-01
DOI: 10.2139/ssrn.3733546
Document Representations to Improve Topic Modelling
Every day, large amounts of information are collected from web applications, which makes it difficult to understand what that information is about. Detecting, understanding, and summarising it requires tools and techniques such as topic modelling, which helps to analyse the data and identify its gist. This paper implements a sparsity-based document representation to improve topic modelling. It organises the data into a meaningful structure using machine learning algorithms, namely LDA (Latent Dirichlet Allocation) and OMP (Orthogonal Matching Pursuit). The approach identifies which topic a document belongs to, as well as the similarity between documents in an existing dictionary. OMP is described as the best algorithm for sparse approximation, with better accuracy. Given an input document [Y], OMP can identify the topics to which it is most closely related across a large collection of text documents in a dictionary.
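The sparse-approximation step described above can be illustrated with a minimal pure-Python sketch of OMP. It is not the authors' implementation. The dictionary atoms here are hypothetical topic-word weight vectors over a four-word vocabulary (in practice they might come from a model such as LDA), and the helper names `dot`, `normalize`, `solve`, and `omp` are assumptions made for this example. The sketch also assumes the selected atoms are linearly independent.

```python
from math import sqrt


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sqrt(dot(v, v))
    return [a / norm for a in v]


def solve(A, b):
    """Solve the small square system A x = b by Gaussian elimination
    with partial pivoting (assumes A is non-singular)."""
    n = len(b)
    M = [A[i][:] + [b[i]] for i in range(n)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def omp(atoms, y, n_nonzero, tol=1e-9):
    """Greedy Orthogonal Matching Pursuit.

    atoms: list of unit-norm dictionary vectors (hypothetical topic vectors)
    y: document vector to approximate
    Returns {atom_index: coefficient} with at most n_nonzero entries.
    """
    residual = list(y)
    support, coefs = [], []
    for _ in range(n_nonzero):
        if sqrt(dot(residual, residual)) <= tol:
            break
        # Score each atom by its correlation with the current residual,
        # masking atoms that were already selected.
        scores = [abs(dot(a, residual)) for a in atoms]
        for j in support:
            scores[j] = -1.0
        best = max(range(len(atoms)), key=scores.__getitem__)
        if scores[best] <= tol:
            break
        support.append(best)
        # Re-fit all selected coefficients by least squares
        # (normal equations on the selected atoms only).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], y) for i in support]
        coefs = solve(gram, rhs)
        residual = [
            y[k] - sum(c * atoms[j][k] for j, c in zip(support, coefs))
            for k in range(len(y))
        ]
    return dict(zip(support, coefs))


# Hypothetical dictionary: three topics over a four-word vocabulary.
atoms = [normalize(v) for v in ([3, 1, 0, 0], [0, 0, 2, 2], [1, 0, 0, 3])]

# Hypothetical input document Y, built as a mix of topic 0 and topic 1.
y = [2.0 * a + 0.5 * b for a, b in zip(atoms[0], atoms[1])]

weights = omp(atoms, y, n_nonzero=2)
for topic, weight in sorted(weights.items()):
    print(f"topic {topic}: weight {weight:.3f}")
```

In this toy example topic 2 correlates more strongly with the raw document than topic 1 does, yet OMP selects topic 1 in the second step because each selection is made against the residual left after the previous fit. The non-zero coefficients indicate the topics to which the document is most closely related.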