Comparative Analysis of Various Similarity Measures for Finding Similarity of Two Documents
Maedeh Afzali, Suresh Kumar
International Journal of Database Theory and Application, pp. 23-30. Published 2017-02-28. DOI: https://doi.org/10.14257/IJDTA.2017.10.2.02
Similarity measures are fundamental concepts in text mining and information retrieval. They quantify the similarity between documents, which helps improve the performance of search engines and browsing techniques. Many similarity measures are now available, but it is not clear which of them is most effective for finding the similarity of text documents. The aim of this paper is to provide a comparative analysis of term-based similarity measures, namely cosine similarity, the Jaccard coefficient, and the Dice coefficient, in order to evaluate how well these measures perform in finding the similarity of two text documents.
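
To make the three measures named above concrete, the following is a minimal Python sketch of their standard textbook definitions. It is not the authors' implementation: the tokenizer, the use of raw term frequencies for cosine similarity, the use of term sets for the Jaccard and Dice coefficients, and the two example sentences are all assumptions made for illustration, since the abstract does not specify preprocessing or term weighting.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed preprocessing)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine of the angle between raw term-frequency vectors (assumed weighting)."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard_coefficient(a, b):
    """Shared terms divided by all distinct terms: |A n B| / |A u B|."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def dice_coefficient(a, b):
    """Twice the shared terms divided by the sum of set sizes: 2|A n B| / (|A| + |B|)."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    total = len(sa) + len(sb)
    return 2 * len(sa & sb) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical example documents, not taken from the paper.
    doc1 = "the cat sat on the mat"
    doc2 = "the cat lay on the rug"
    print(f"cosine:  {cosine_similarity(doc1, doc2):.4f}")
    print(f"jaccard: {jaccard_coefficient(doc1, doc2):.4f}")
    print(f"dice:    {dice_coefficient(doc1, doc2):.4f}")
```

For these two sentences the documents share three of seven distinct terms, so the Jaccard coefficient is 3/7 (about 0.43) and the Dice coefficient is 6/10 = 0.6, while cosine similarity on term counts is 0.75. When both are computed on the same term sets, Dice and Jaccard are related by D = 2J / (1 + J), so they always rank document pairs in the same order; cosine similarity can rank pairs differently because it takes term frequency into account.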