Zhonghui Shao, Jing Zhang, Haoyang Li, Xinmei Huang, Chao Zhou, Yuanchun Wang, Jibing Gong, Cuiping Li, Hong Chen
Title: Authorship style transfer with inverse transfer data augmentation
Journal: AI Open, Volume 5, Pages 94-103
DOI: 10.1016/j.aiopen.2024.08.003
Publication date: 2024-01-01
URL: https://www.sciencedirect.com/science/article/pii/S2666651024000135
Citations: 0
Abstract
Authorship style transfer aims to rewrite neutral text so that it matches the distinctive speaking or writing style of a particular individual. While Large Language Models (LLMs) present promising solutions, their effectiveness is limited by the small number of in-context learning demonstrations, particularly for authorship styles seldom seen during pre-training. In response, this paper proposes an inverse transfer data augmentation (ITDA) method that leverages LLMs to create (neutral text, stylized text) pairs. The method removes the existing style from stylized texts, a task made more feasible by the prevalence of neutral text in pre-training. We use the augmented dataset to train a compact model that is efficient to deploy and adept at replicating the targeted style. Experiments on four datasets with distinct authorship styles demonstrate that ITDA is more effective than traditional style transfer methods and forward transfer using GPT-3.5. For further research and application, our dataset and code are openly accessible at https://github.com/Vicky-Shao/ITDA.
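The inverse-transfer idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `build_inverse_pairs` and `toy_neutralize` are hypothetical, the word-replacement function only stands in for an LLM call that strips style, and the filter that drops empty or unchanged outputs is an assumption added for illustration.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (neutral text, stylized text)


def build_inverse_pairs(
    stylized_texts: List[str],
    neutralize: Callable[[str], str],
) -> List[Pair]:
    """Create (neutral, stylized) pairs by removing style from stylized text."""
    pairs: List[Pair] = []
    for stylized in stylized_texts:
        neutral = neutralize(stylized).strip()
        # Assumed filter: empty or unchanged outputs carry no training signal.
        if neutral and neutral != stylized:
            pairs.append((neutral, stylized))
    return pairs


def toy_neutralize(text: str) -> str:
    """Hypothetical stand-in for an LLM; it only rewrites a few archaic words."""
    replacements = {"thou": "you", "art": "are", "hath": "has"}
    words = [replacements.get(word.lower(), word) for word in text.split()]
    return " ".join(words)


# Each pair maps a neutral input to its stylized target, the direction
# in which a compact model would then be trained.
pairs = build_inverse_pairs(["Thou art late", "He hath gone"], toy_neutralize)
for neutral, stylized in pairs:
    print(f"{neutral!r} -> {stylized!r}")
```

In a real pipeline, `neutralize` would wrap a prompt to an LLM, and the resulting pairs would serve as supervised data for fine-tuning a smaller sequence-to-sequence model with the neutral text as input and the stylized text as target.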