Authorship attribution for Chinese text based on sentence rhythm features
Shaokang Wang, Baoping Yan
2010 IEEE Youth Conference on Information, Computing and Telecommunications, 2010-11-01
DOI: 10.1109/YCICT.2010.5713152 (https://doi.org/10.1109/YCICT.2010.5713152)
Authorship attribution, i.e., identifying the author of a disputed piece of text, is an important problem given growing concerns about copyright violations. Although various authorship attribution algorithms have been proposed for articles, they fail in several situations. This paper proposes a new authorship attribution algorithm for Chinese text that uses the sentence rhythm features of articles. The algorithm introduces a rhythm feature matrix to describe the sentence rhythm of Chinese text. To measure the similarity of two rhythm feature matrices, we compare two definitions of similarity, one based on Euclidean distance and the other on an improved Kullback-Leibler divergence. Experimental results show that the algorithm achieves a success rate of 80%.
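The abstract does not define the rhythm feature matrix or the "improved" Kullback-Leibler divergence, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a hypothetical feature (counts of transitions between the lengths of consecutive punctuation-delimited clauses) and substitutes a smoothed, symmetrized KL divergence for the paper's improved variant. All names (`rhythm_matrix`, `normalize`, `symmetric_kl`, `attribute`, `MAX_LEN`) are illustrative.

```python
import math
import re

# Assumption: clauses end at common Chinese or ASCII punctuation or whitespace.
CLAUSE_DELIMS = r"[，。！？；：、,.!?;:\s]+"
MAX_LEN = 10  # hypothetical cap; longer clauses share the last bucket


def rhythm_matrix(text, max_len=MAX_LEN):
    """Count transitions between consecutive clause lengths (hypothetical feature)."""
    lengths = [min(len(c), max_len) for c in re.split(CLAUSE_DELIMS, text) if c]
    matrix = [[0.0] * max_len for _ in range(max_len)]
    for a, b in zip(lengths, lengths[1:]):
        matrix[a - 1][b - 1] += 1.0
    return matrix


def normalize(matrix, eps=1e-6):
    """Flatten to a probability vector; additive smoothing keeps every entry positive."""
    flat = [v + eps for row in matrix for v in row]
    total = sum(flat)
    return [v / total for v in flat]


def euclidean_distance(m1, m2):
    """Euclidean distance between the two normalized matrices."""
    p, q = normalize(m1), normalize(m2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def symmetric_kl(m1, m2):
    """Smoothed symmetric KL: KL(p||q) + KL(q||p) = sum((p - q) * log(p / q))."""
    p, q = normalize(m1), normalize(m2)
    return sum((a - b) * math.log(a / b) for a, b in zip(p, q))


def attribute(disputed, candidates, distance=symmetric_kl):
    """Return the candidate author whose sample text is closest to the disputed text."""
    target = rhythm_matrix(disputed)
    return min(
        candidates,
        key=lambda name: distance(target, rhythm_matrix(candidates[name])),
    )
```

Passing `distance=euclidean_distance` to `attribute` switches to the other similarity definition, which mirrors the comparison the abstract describes. Smoothing is used here because plain KL divergence is undefined when one matrix has a zero entry where the other does not.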