Statutes Recommendation Based on Text Similarity

2017 14th Web Information Systems and Applications Conference (WISA) Pub Date : 2017-11-01 DOI:10.1109/WISA.2017.52

Jin Zeng, Jidong Ge, Yemao Zhou, Yi Feng, Chuanyi Li, Zhongjin Li, B. Luo

Citations: 1

Abstract

The traditional approach to measuring text similarity uses the TF-IDF algorithm to build document vectors and then applies cosine similarity to compare them. However, this purely statistical method ignores the latent semantics of documents and words; it considers only the words themselves. Latent Semantic Analysis (LSA) adds a semantic space on top of the TF-IDF computation. Through Singular Value Decomposition, each word and each document is assigned a position in this semantic space, which allows semantic analysis, document clustering, and the mapping between semantic classes and document classes to be carried out at the same time. Here, we summarize text similarity measures and gradually extend them to Latent Semantic Analysis. The experiment shows that the statutes predicted by LSA are more accurate than those predicted by TF-IDF alone.
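The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the function names (`tfidf_matrix`, `cosine`, `lsa_doc_vectors`), the whitespace tokenizer, the particular TF-IDF variant (relative term frequency times log(N/df)) and the choice of k = 2 latent dimensions are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): three theft-related texts
# that overlap only pairwise in a chain, plus two contract-related texts.
docs = [
    "theft stolen",
    "stolen larceny",
    "larceny burglary",
    "contract breach",
    "contract damages",
]

def tfidf_matrix(texts):
    """Build a (terms x documents) TF-IDF matrix.

    Assumed variant: tf = relative frequency within the document,
    idf = log(N / df). Other weightings are equally common.
    """
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(texts)))
    for j, doc in enumerate(tokenized):
        for tok in doc:
            counts[index[tok], j] += 1
    tf = counts / counts.sum(axis=0, keepdims=True)
    df = (counts > 0).sum(axis=1)
    idf = np.log(len(texts) / df)
    return tf * idf[:, None], vocab

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def lsa_doc_vectors(X, k):
    """Project documents into a k-dimensional latent space via truncated SVD.

    X = U S V^T; each document is represented by a row of V_k S_k.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T * s[:k]

X, vocab = tfidf_matrix(docs)   # columns of X are TF-IDF document vectors
Z = lsa_doc_vectors(X, k=2)     # rows of Z are latent document vectors

print("TF-IDF cosine(doc0, doc2):", cosine(X[:, 0], X[:, 2]))
print("LSA    cosine(doc0, doc2):", cosine(Z[0], Z[2]))
print("LSA    cosine(doc0, doc3):", cosine(Z[0], Z[3]))
```

In this toy corpus the first and third documents share no words, so their TF-IDF cosine similarity is zero. Both are linked through the second document, and in the two-dimensional latent space they should come out nearly parallel while staying close to orthogonal to the contract-related documents. Real statute recommendation would need a proper tokenizer (for Chinese text, a word segmenter) and a value of k tuned on data.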
