Similarity ranking using handcrafted stylometric traits in a Swedish context
Johan Fernquist, Björn Pelzer, Lukas Lundmark, Lisa Kaati, F. Johansson
Proceedings of the 2021 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, published 2021-11-08
DOI: 10.1145/3487351.3492719 (https://doi.org/10.1145/3487351.3492719)
Abstract
In this paper, we introduce a new type of handcrafted textual feature, called stylometric traits, which is used to create a writeprint that captures an author's writing style. The traits fall into four categories: (i) word variations, (ii) abbreviations, (iii) internet jargon, and (iv) numbers. We develop a similarity ranking method that ranks users' social media accounts by how similar their writeprints are. To measure similarity, we experiment with both vector distance metrics and class probabilities from machine learning classifiers. The best performance comes from stylometric traits combined with the Jensen-Shannon distance metric; this combination outperforms the traditional stylometric features used in previous research.
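The ranking idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the trait lexicon is hypothetical (the abstract names the four trait categories but not the traits themselves), a writeprint is assumed to be a smoothed relative-frequency distribution over trait occurrences, and the smoothing constant `alpha` and the helper names `writeprint`, `js_distance`, and `rank_accounts` are illustrative choices. Candidate accounts are ranked by ascending Jensen-Shannon distance to a query account's writeprint.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL trait lexicon: illustrative Swedish examples for three of the
# four categories in the abstract; the authors' actual traits are not listed there.
TRAIT_LEXICON = {
    "mej": "word_variation",    # informal spelling of "mig"
    "dej": "word_variation",    # informal spelling of "dig"
    "osv": "abbreviation",      # "och så vidare"
    "mvh": "abbreviation",      # "med vänliga hälsningar"
    "lol": "internet_jargon",
    "asså": "internet_jargon",  # colloquial "alltså"
}
# Fourth category (numbers): every all-digit token counts as one shared feature.
NUMBER_FEATURE = "<number>"
FEATURES = sorted(TRAIT_LEXICON) + [NUMBER_FEATURE]


def writeprint(texts, alpha=0.5):
    """Smoothed relative frequencies of trait features over one account's posts.

    alpha is an ASSUMED additive-smoothing constant so that accounts with
    no trait hits still yield a valid probability distribution.
    """
    counts = Counter()
    for text in texts:
        for token in re.findall(r"\w+", text.lower()):
            if token.isdigit():
                counts[NUMBER_FEATURE] += 1
            elif token in TRAIT_LEXICON:
                counts[token] += 1
    total = sum(counts[f] + alpha for f in FEATURES)
    return [(counts[f] + alpha) / total for f in FEATURES]


def js_distance(p, q):
    """Jensen-Shannon distance: sqrt of the JS divergence (log base 2), in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 wherever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    # max() guards against a tiny negative value from floating-point rounding.
    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def rank_accounts(query_texts, candidates):
    """Return (account, distance) pairs, most similar (smallest distance) first."""
    query = writeprint(query_texts)
    scored = [(name, js_distance(query, writeprint(texts)))
              for name, texts in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1])


# Usage with made-up posts.
query_posts = ["lol asså jag ser dej sen", "mvh 123"]
candidates = {
    "account_a": ["asså lol", "ses dej 2021 mvh"],
    "account_b": ["Med vänliga hälsningar, osv osv"],
}
ranking = rank_accounts(query_posts, candidates)
print(ranking[0][0])  # prints "account_a"
```

Using the base-2 logarithm bounds the distance to [0, 1], which makes scores comparable across queries; swapping in a different distance metric or classifier probabilities would only require replacing `js_distance` in `rank_accounts`.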