Prompt-based Learning for Text Readability Assessment

Findings (Sydney (N.S.W.)), published 2023-02-25. DOI: 10.48550/arXiv.2302.13139

Bruce W. Lee, J. Lee

Volume 82 (1), pages 1774-1779.

Citations: 2

Abstract

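We propose a novel adaptation of a pre-trained seq2seq model for readability assessment. We demonstrate that a seq2seq model (T5 or BART) can be adapted to discern which of two given texts is more difficult (pairwise). As an exploratory study of prompt-learning a neural network for text readability in a text-to-text manner, we report useful tips for future work on seq2seq training and on ranking-based approaches to readability assessment. Specifically, we test nine input-output formats/prefixes and show that they can significantly influence final model performance. We also argue that combining text-to-text training with a pairwise ranking setup 1) enables leveraging multiple parallel text simplification datasets for teaching readability and 2) trains a neural model on the general concept of readability, and therefore generalizes better across domains. Finally, through a joint training approach, we report pairwise classification accuracies of 99.6% on Newsela and 98.7% on OneStopEnglish. Our code is available at github.com/brucewlee/prompt-learning-readability.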
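The pairwise, text-to-text setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the prompt template `PREFIX`, the `text1`/`text2` target labels, and the helper names `make_pairwise_examples` and `pairwise_accuracy` are hypothetical, and difficulty levels are assumed to be integers where a larger value means a harder text. The sketch shows how aligned versions of one article from a parallel simplification corpus turn into labeled (source, target) string pairs, and how pairwise classification accuracy is computed from the labels a fine-tuned seq2seq model would generate. Fine-tuning T5 or BART itself is not shown.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical prompt template. The paper compares nine input-output
# formats/prefixes; this one is NOT claimed to be any of them.
PREFIX = "which text is more difficult?"


def make_pairwise_examples(versions: List[Tuple[str, int]]) -> List[Tuple[str, str]]:
    """Turn one article's aligned versions into text-to-text pairwise examples.

    `versions` holds (text, level) tuples; a larger level is assumed to mean
    a harder text. Each unordered pair with different levels yields two
    examples, one per presentation order, so the label is not tied to position.
    """
    examples = []
    for (text_a, level_a), (text_b, level_b) in combinations(versions, 2):
        if level_a == level_b:
            continue  # equal difficulty gives no pairwise label
        for first, second, first_harder in (
            (text_a, text_b, level_a > level_b),
            (text_b, text_a, level_b > level_a),
        ):
            source = f"{PREFIX} text1: {first} text2: {second}"
            target = "text1" if first_harder else "text2"
            examples.append((source, target))
    return examples


def pairwise_accuracy(predictions: List[str], targets: List[str]) -> float:
    """Fraction of pairs whose generated label matches the gold label."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        raise ValueError("cannot compute accuracy on zero pairs")
    correct = sum(p.strip() == t for p, t in zip(predictions, targets))
    return correct / len(targets)


if __name__ == "__main__":
    # Toy aligned versions of one article (invented for illustration).
    article = [
        ("The committee postponed the vote indefinitely.", 2),
        ("The group delayed the vote.", 1),
        ("They waited to vote.", 0),
    ]
    for source, target in make_pairwise_examples(article):
        print(target, "<-", source)
```

Because every example is just a (source, target) string pair, pairs drawn from several parallel simplification corpora can simply be concatenated into one training set, which is what makes joint training straightforward in this format. Emitting both presentation orders for each pair is a design choice meant to keep a model from learning a positional shortcut.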


