Statutes Recommendation Based on Text Similarity

Jin Zeng, Jidong Ge, Yemao Zhou, Yi Feng, Chuanyi Li, Zhongjin Li, B. Luo
2017 14th Web Information Systems and Applications Conference (WISA), November 2017. DOI: 10.1109/WISA.2017.52
Citations: 1

Abstract

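The traditional approach to measuring text similarity uses the TF-IDF algorithm to obtain document vectors and then applies cosine similarity to compute how similar two texts are. However, this purely statistical method ignores the latent semantics of articles and words: it considers only the surface form of each word. Latent Semantic Analysis (LSA) adds a semantic space on top of the TF-IDF computation. Through Singular Value Decomposition (SVD), every word and every document is assigned a position in that space, which allows semantic analysis, document clustering, and the mapping between semantic classes and document classes to be carried out at the same time. Here, we summarize text similarity measures and progressively extend them to Latent Semantic Analysis. Experiments show that the statutes predicted by LSA are more accurate than those predicted by TF-IDF alone.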
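The baseline the abstract describes, TF-IDF document vectors compared by cosine similarity, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy English statute texts, whitespace tokenization (Chinese judgment text would first need word segmentation), the weighting tf × log(N/df), and the helper names `fit_idf`, `tfidf`, `cosine` and `rank_statutes` are all hypothetical.

```python
import math
from collections import Counter

def fit_idf(statutes):
    """Assumed IDF: log(N / df), fitted on whitespace-tokenized statute texts."""
    tokenized = [s.split() for s in statutes]
    df = Counter(t for toks in tokenized for t in set(toks))
    return {t: math.log(len(tokenized) / c) for t, c in df.items()}

def tfidf(text, idf):
    """Sparse TF-IDF vector (term -> weight); terms unseen in the statutes are dropped."""
    tf = Counter(text.split())
    return {t: c * idf[t] for t, c in tf.items() if t in idf}

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_statutes(query, statutes):
    """Return (statute indices sorted by descending similarity, raw scores)."""
    idf = fit_idf(statutes)
    q = tfidf(query, idf)
    scores = [cosine(q, tfidf(s, idf)) for s in statutes]
    ranking = sorted(range(len(statutes)), key=lambda i: -scores[i])
    return ranking, scores

# Hypothetical toy statutes (English stand-ins for real statute text).
statutes = [
    "theft of property shall be punished",
    "traffic accident causing injury shall be punished",
    "contract breach requires compensation",
]
ranking, scores = rank_statutes("the defendant committed theft of property", statutes)
print(ranking, [round(s, 3) for s in scores])
print(rank_statutes("stealing goods", statutes))
```

The second call illustrates the weakness the abstract attributes to TF-IDF: a query phrased with a synonym ("stealing goods") shares no terms with any statute, so every cosine score is 0.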
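The LSA step the abstract adds on top of TF-IDF, placing terms and documents in a low-rank semantic space via Singular Value Decomposition, can be sketched with NumPy. This is a hedged illustration, not the paper's code: the 5-term by 3-statute matrix (raw counts standing in for TF-IDF weights), the rank k = 2, and projecting both documents and the query with U_k^T are assumptions. For a term-document matrix A = U Σ V^T, projecting a document column a_j with U_k^T gives exactly its truncated coordinates Σ_k V_k^T e_j, so documents and queries land in the same k-dimensional space.

```python
import numpy as np

def lsa_basis(term_doc: np.ndarray, k: int) -> np.ndarray:
    """Return U_k, the first k left singular vectors of the term-document matrix."""
    u, _s, _vt = np.linalg.svd(term_doc, full_matrices=False)
    return u[:, :k]  # numpy orders singular values in descending order

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def lsa_scores(term_doc: np.ndarray, query: np.ndarray, k: int) -> list:
    """Cosine similarity between the query and every document in the k-dim LSA space."""
    u_k = lsa_basis(term_doc, k)
    q_hat = u_k.T @ query     # project the query: U_k^T q
    d_hat = u_k.T @ term_doc  # project each document column the same way
    return [cosine(q_hat, d_hat[:, j]) for j in range(term_doc.shape[1])]

# Hypothetical toy data: rows are terms, columns are three statutes.
terms = ["theft", "stealing", "property", "traffic", "injury"]
A = np.array([
    [1, 0, 0],  # theft
    [0, 1, 0],  # stealing
    [1, 1, 0],  # property
    [0, 0, 1],  # traffic
    [0, 0, 1],  # injury
], dtype=float)
query = np.array([0, 1, 0, 0, 0], dtype=float)  # the query mentions only "stealing"

raw = [cosine(query, A[:, j]) for j in range(A.shape[1])]
lsa = lsa_scores(A, query, k=2)
print("raw cosine:", raw)
print("LSA cosine:", lsa)
```

Under raw cosine similarity the first statute scores 0 against the query, because they share no term. In the 2-dimensional LSA space it scores about 1: "theft" and "stealing" both co-occur with "property" and collapse onto the same latent dimension, while the traffic statute stays near 0.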
Latest papers in this journal
Efficient Time Series Classification via Sparse Linear Combination
Checking the Statutes in Chinese Judgment Document Based on Editing Distance Algorithm
Information Extraction from Chinese Judgment Documents
Topic Classification Based on Improved Word Embedding
Keyword Extraction for Social Media Short Text