{"title":"本地文本重用检测的指纹方法概述","authors":"Leena Lulu, B. Belkhouche, S. Harous","doi":"10.1109/INNOVATIONS.2016.7880050","DOIUrl":null,"url":null,"abstract":"We overview several local text reuse detection methods based on fingerprinting techniques. We first define the context of local text reuse and situate it within the general spectrum of information retrieval in order to pinpoint its particular applicability and challenges. After a brief description of the major text reuse detection approaches, we introduce the general principles of fingerprinting algorithms from an information retrieval perspective. Three classes of fingerprinting methods (overlap, non-overlap, and randomized) are surveyed. Specific algorithms, such as k-gram, winnowing, hailstorm, DCT and hash-breaking, are described. The performance and characteristics of these algorithms are summarized based on data from the literature.","PeriodicalId":412653,"journal":{"name":"2016 12th International Conference on Innovations in Information Technology (IIT)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Overview of fingerprinting methods for local text reuse detection\",\"authors\":\"Leena Lulu, B. Belkhouche, S. Harous\",\"doi\":\"10.1109/INNOVATIONS.2016.7880050\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We overview several local text reuse detection methods based on fingerprinting techniques. We first define the context of local text reuse and situate it within the general spectrum of information retrieval in order to pinpoint its particular applicability and challenges. After a brief description of the major text reuse detection approaches, we introduce the general principles of fingerprinting algorithms from an information retrieval perspective. Three classes of fingerprinting methods (overlap, non-overlap, and randomized) are surveyed. Specific algorithms, such as k-gram, winnowing, hailstorm, DCT and hash-breaking, are described. The performance and characteristics of these algorithms are summarized based on data from the literature.\",\"PeriodicalId\":412653,\"journal\":{\"name\":\"2016 12th International Conference on Innovations in Information Technology (IIT)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2016-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2016 12th International Conference on Innovations in Information Technology (IIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/INNOVATIONS.2016.7880050\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 12th International Conference on Innovations in Information Technology (IIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INNOVATIONS.2016.7880050","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Overview of fingerprinting methods for local text reuse detection
We overview several local text reuse detection methods based on fingerprinting techniques. We first define the context of local text reuse and situate it within the general spectrum of information retrieval in order to pinpoint its particular applicability and challenges. After a brief description of the major text reuse detection approaches, we introduce the general principles of fingerprinting algorithms from an information retrieval perspective. Three classes of fingerprinting methods (overlap, non-overlap, and randomized) are surveyed. Specific algorithms, such as k-gram, winnowing, hailstorm, DCT and hash-breaking, are described. The performance and characteristics of these algorithms are summarized based on data from the literature.