A novel approach to capture the similarity in summarized text using embedded model
Authors: Asha Rani Mishra, V. K. Panchal
Journal: International Journal on Smart Sensing and Intelligent Systems
Publication date: 2022-01-01
Publication type: Journal Article
DOI: 10.2478/ijssis-2022-0002 (https://doi.org/10.2478/ijssis-2022-0002)
Impact factor: 0.5
JCR: Q4 (Engineering, Electrical & Electronic)
Citations: 0
Abstract
The presence of near-duplicate textual content makes information extraction considerably harder, so detecting near duplicates is a primary research concern. Existing research mostly uses text clustering, classification, and retrieval algorithms for this task. Text summarization, an important text-mining tool, has not yet been explored for near-duplicate detection. Instead of the whole document, the proposed method uses its summary, which saves both time and storage. Experimental results show that traditional similarity algorithms captured similarity to a great extent even on the summarized text, with a similarity score of 44.685%. Moreover, the degree of similarity captured was greater (0.52%) when embedding models with better text representation were used instead of traditional methods. The paper also reviews the research status of various similarity measures in terms of the concepts involved and their merits and demerits.
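To make the idea of scoring similarity on summaries concrete, here is a minimal sketch of two traditional similarity measures applied to a pair of short summaries. The abstract does not name the specific algorithms or the summarizer the authors used, so the choice of Jaccard and term-frequency cosine similarity, the function names, and the two example summaries are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def jaccard_similarity(a, b):
    """Share of distinct tokens that the two texts have in common."""
    set_a, set_b = set(tokenize(a)), set(tokenize(b))
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def cosine_similarity(a, b):
    """Cosine of the angle between the term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical summaries of two near-duplicate documents (invented for this sketch).
summary_1 = "The storm closed several roads in the city on Monday."
summary_2 = "Several city roads were closed by the storm on Monday."

print(f"Jaccard: {jaccard_similarity(summary_1, summary_2):.3f}")
print(f"Cosine:  {cosine_similarity(summary_1, summary_2):.3f}")
```

Both measures rely only on surface word overlap, so paraphrases that use different vocabulary score low. That limitation is the usual motivation for comparing embedding vectors instead, which is the contrast the abstract draws; an embedding-based variant would swap the term-frequency vectors for vectors produced by a trained model.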
About the journal:
International Journal on Smart Sensing and Intelligent Systems (S2IS) is a rapid, high-quality international forum in which academics, researchers, and practitioners may publish original, state-of-the-art papers describing theoretical aspects, system architectures, analysis and design techniques, and implementation experiences in intelligent sensing technologies. The journal publishes articles reporting substantive results on a wide range of smart sensing approaches applied to a variety of domain problems, including but not limited to: Ambient Intelligence and Smart Environment; Analysis, Evaluation, and Test of Smart Sensors; Intelligent Management of Sensors; Fundamentals of Smart Sensing Principles and Mechanisms; Materials and its Applications for Smart Sensors; Smart Sensing Applications, Hardware, Software, Systems, and Technologies; Smart Sensors in Multidisciplinary Domains and Problems; Smart Sensors in Science and Engineering; Smart Sensors in Social Science and Humanity.