Document Classification Method based on Latent Semantic Indexing

Jeong-Joon Kim, Yong-Soo Lee, Jin-Yong Moon, Jeongmin Park
International Journal of Grid and Distributed Computing, journal article, published 2018-04-30
DOI: 10.14257/IJGDC.2018.11.4.09 (https://doi.org/10.14257/IJGDC.2018.11.4.09)
Citations: 0

Abstract

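Latent Semantic Indexing (LSI) and Non-negative Matrix Factorization (NMF) are algorithms that classify documents by meaning; both approach the problem by converting each document into a vector. These algorithms have two problems: the meaning they learn differs depending on the training documents, and they have difficulty analyzing terms that have multiple representations. WordNet, by contrast, is a word dictionary that describes the relationships between words on the basis of human intelligence science, and it is widely used for tasks such as query term expansion in search engines. However, WordNet adapts poorly to neologisms, slang, and the shifts in word meaning that come with rapidly changing times. In this paper we therefore address the problem of multiple word representations by partially applying WordNet's word relationships to Latent Semantic Indexing, using genetic algorithms, so that documents are clustered more efficiently in light of the strengths and weaknesses of both Latent Semantic Indexing and WordNet. With this approach we try to improve the precision and the efficiency of the overall clustering.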
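To make the vector-space idea in the abstract concrete, the following is a minimal sketch of LSI on a toy corpus, assuming NumPy is available. It is not the authors' implementation: the documents, the `SYNONYMS` table (a hand-written stand-in for WordNet relations), and all function names are hypothetical, and the genetic-algorithm step that selects which word relations to apply is not reproduced here.

```python
import numpy as np

# Toy corpus invented for illustration only.
DOCS = [
    "car engine repair",
    "automobile engine service",
    "fruit salad recipe",
    "apple fruit dessert",
]

# Hypothetical stand-in for WordNet synonym relations:
# each key is rewritten to its canonical form before counting.
SYNONYMS = {"automobile": "car"}


def term_document_matrix(docs, synonyms=None):
    """Build a raw term-count matrix with shape (terms, documents)."""
    synonyms = synonyms or {}
    tokenized = [[synonyms.get(t, t) for t in d.split()] for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(tokenized):
        for t in doc:
            matrix[index[t], j] += 1.0
    return matrix, vocab


def lsi_document_vectors(matrix, k):
    """Project documents onto the top-k latent dimensions via truncated SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Each row of the result is one document in the k-dimensional latent space.
    return (np.diag(s[:k]) @ vt[:k]).T


def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


matrix, vocab = term_document_matrix(DOCS, SYNONYMS)
vectors = lsi_document_vectors(matrix, k=2)
print("vehicle vs vehicle:", cosine(vectors[0], vectors[1]))
print("vehicle vs food:   ", cosine(vectors[0], vectors[2]))
```

In this toy setup the two vehicle documents land close together in the latent space and far from the food documents. Folding "automobile" into "car" before the decomposition is one simple way to inject a word relation; the paper instead applies WordNet relations partially and selects them with a genetic algorithm.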
Source journal
International Journal of Grid and Distributed Computing (Computer Science, Software Engineering)
About the journal: IJGDC aims to facilitate and support research related to control and automation technology and its applications. The journal gives academic and industry professionals a venue to discuss recent progress in the area of control and automation. To serve readers who lack access to major databases that charge for every downloaded article, this online publication platform is open to all readers as part of the journal's commitment to the global scientific community. Journal topics: Architectures and Fabrics; Autonomic and Adaptive Systems; Cluster and Grid Integration; Creation and Management of Virtual Enterprises and Organizations; Dependable and Survivable Distributed Systems; Distributed and Large-Scale Data Access and Management; Distributed Multimedia Systems; Distributed Trust Management; eScience and eBusiness Applications; Fuzzy Algorithm; Grid Economy and Business Models; Histogram Methodology; Image or Speech Filtering; Image or Speech Recognition; Information Services; Large-Scale Group Communication; Metadata, Ontologies, and Provenance; Middleware and Toolkits; Monitoring, Management and Organization Tools; Networking and Security; Novel Distributed Applications; Performance Measurement and Modeling; Pervasive Computing; Problem Solving Environments; Programming Models, Tools and Environments; QoS and Resource Management; Real-time and Embedded Systems; Security and Trust in Grid and Distributed Systems; Sensor Networks; Utility Computing on Global Grids; Web Services and Service-Oriented Architecture; Wireless and Mobile Ad Hoc Networks; Workflow and Multi-agent Systems.