Detection of Fuzzy Duplicate Texts in News Feeds
E. Sharapova, R. Sharapov
2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), published 2019-07-01
DOI: 10.1109/SYNCHROINFO.2019.8814112 (https://doi.org/10.1109/SYNCHROINFO.2019.8814112)
This paper addresses the detection of fuzzy duplicate texts in news feeds. It considers signature methods for detecting fuzzy duplicate news, in which a signature describes the content of a news item as a single number or a group of numbers. The paper proposes the "Description words big" signature, which consists of a set of flags marking the presence of description words and a vector of names. The vector includes the names of objects, countries, and politicians, which makes it possible to tie a news item precisely to a particular event. The paper reports the results of testing the signature methods. The proposed signature showed good results in both recall and precision of duplicate news detection.
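As an illustration of the idea, the following is a minimal sketch of a signature built from presence flags and a vector of names. It is not the authors' implementation. The vocabularies `DESCRIPTION_WORDS` and `KNOWN_NAMES`, the function names, the comparison rule (Hamming distance on flags plus Jaccard overlap on names), and both thresholds are assumptions chosen for the example; the abstract does not specify any of them.

```python
import re

# Hypothetical vocabularies: the paper's actual word and name lists are not
# given in the abstract, so these are placeholders for illustration only.
DESCRIPTION_WORDS = ["election", "vote", "summit", "earthquake", "treaty", "budget"]
KNOWN_NAMES = ["france", "germany", "macron", "merkel", "nato"]


def tokenize(text):
    """Return the set of lowercase alphabetic tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def signature(text):
    """Build (flags, names): a bitmask of description words present in the
    text and the set of known names that occur in it."""
    tokens = tokenize(text)
    flags = 0
    for i, word in enumerate(DESCRIPTION_WORDS):
        if word in tokens:
            flags |= 1 << i  # bit i is set when description word i is present
    names = frozenset(name for name in KNOWN_NAMES if name in tokens)
    return flags, names


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_fuzzy_duplicate(text_a, text_b, max_flag_diff=1, min_name_overlap=0.5):
    """Assumed comparison rule: signatures match when the flag sets differ in
    at most max_flag_diff bits and the name sets overlap sufficiently."""
    flags_a, names_a = signature(text_a)
    flags_b, names_b = signature(text_b)
    # An empty signature carries no information, so never call it a duplicate.
    if (flags_a == 0 and not names_a) or (flags_b == 0 and not names_b):
        return False
    flag_diff = bin(flags_a ^ flags_b).count("1")  # Hamming distance
    return flag_diff <= max_flag_diff and jaccard(names_a, names_b) >= min_name_overlap


if __name__ == "__main__":
    a = "Macron and Merkel met at the summit in Germany to discuss the treaty."
    b = "At the Germany summit, Merkel and Macron discussed a new treaty."
    c = "An earthquake struck France, and the budget vote was postponed."
    print(is_fuzzy_duplicate(a, b))  # reworded report of the same event
    print(is_fuzzy_duplicate(a, c))  # unrelated story
```

In this sketch the flags capture what kind of event a story describes, while the name set narrows it to specific participants, so two stories about different summits would share flags but differ in names. A production system would also need stemming or lemmatization and multi-word name matching, both of which are omitted here.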