{"title":"Crossing Linguistic Barriers: Authorship Attribution in Sinhala Texts","authors":"Raheem Sarwar, Maneesha Perera, Pin Shen Teh, Raheel Nawaz, Muhammad Umair Hassan","doi":"10.1145/3655620","DOIUrl":null,"url":null,"abstract":"Authorship attribution involves determining the original author of an anonymous text from a pool of potential authors. Author attribution task has applications in several domains, such as plagiarism detection, digital text forensics, and information retrieval. While these applications extend beyond any single language, existing research has predominantly centered on English, posing challenges for application in languages like Sinhala due to linguistic disparities and a lack of language processing tools. We present the first comprehensive study on cross-topic authorship attribution for Sinhala texts and propose a solution that can effectively perform the authorship attribution task even if the topics within the test and training samples differ. Our solution consists of three main parts: (i) extraction of topic-independent stylometric features, (ii) generation of the small candidate author set with the help of similarity search, and (iii) identification of the true author. Several experimental studies were carried out to demonstrate that the proposed solution can effectively handle real-world scenarios involving a large number of candidate authors and a limited number of text samples for each candidate author.","PeriodicalId":1,"journal":{"name":"Accounts of Chemical Research","volume":"6 6","pages":""},"PeriodicalIF":17.7000,"publicationDate":"2024-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Accounts of Chemical Research","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3655620","RegionNum":1,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Authorship attribution involves determining the original author of an anonymous text from a pool of potential authors. Author attribution task has applications in several domains, such as plagiarism detection, digital text forensics, and information retrieval. While these applications extend beyond any single language, existing research has predominantly centered on English, posing challenges for application in languages like Sinhala due to linguistic disparities and a lack of language processing tools. We present the first comprehensive study on cross-topic authorship attribution for Sinhala texts and propose a solution that can effectively perform the authorship attribution task even if the topics within the test and training samples differ. Our solution consists of three main parts: (i) extraction of topic-independent stylometric features, (ii) generation of the small candidate author set with the help of similarity search, and (iii) identification of the true author. Several experimental studies were carried out to demonstrate that the proposed solution can effectively handle real-world scenarios involving a large number of candidate authors and a limited number of text samples for each candidate author.
期刊介绍:
Accounts of Chemical Research presents short, concise and critical articles offering easy-to-read overviews of basic research and applications in all areas of chemistry and biochemistry. These short reviews focus on research from the author’s own laboratory and are designed to teach the reader about a research project. In addition, Accounts of Chemical Research publishes commentaries that give an informed opinion on a current research problem. Special Issues online are devoted to a single topic of unusual activity and significance.
Accounts of Chemical Research replaces the traditional article abstract with an article "Conspectus." These entries synopsize the research affording the reader a closer look at the content and significance of an article. Through this provision of a more detailed description of the article contents, the Conspectus enhances the article's discoverability by search engines and the exposure for the research.