How do control tokens affect natural language generation tasks like text simplification

IF 2.3 3区 计算机科学 Q3 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE Natural Language Engineering Pub Date : 2024-01-23 DOI:10.1017/s1351324923000566
Zihao Li, Matthew Shardlow
{"title":"How do control tokens affect natural language generation tasks like text simplification","authors":"Zihao Li, Matthew Shardlow","doi":"10.1017/s1351324923000566","DOIUrl":null,"url":null,"abstract":"Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influences of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence performance and give some suggestions for designing control tokens. We show the newly proposed method with higher performance in both SARI (a common scoring metric in text simplificaiton) and BERTScore (a score derived from the BERT language model) and potential in real applications.","PeriodicalId":49143,"journal":{"name":"Natural Language Engineering","volume":null,"pages":null},"PeriodicalIF":2.3000,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Natural Language Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1017/s1351324923000566","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0

Abstract

Recent work on text simplification has focused on the use of control tokens to further the state-of-the-art. However, it is not easy to further improve without an in-depth comprehension of the mechanisms underlying control tokens. One unexplored factor is the tokenization strategy, which we also explore. In this paper, we (1) reimplemented AudienCe-CEntric Sentence Simplification, (2) explored the effects and interactions of varying control tokens, (3) tested the influences of different tokenization strategies, (4) demonstrated how separate control tokens affect performance and (5) proposed new methods to predict the value of control tokens. We show variations of performance in the four control tokens separately. We also uncover how the design of control tokens could influence performance and give some suggestions for designing control tokens. We show the newly proposed method with higher performance in both SARI (a common scoring metric in text simplificaiton) and BERTScore (a score derived from the BERT language model) and potential in real applications.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
控制标记如何影响文本简化等自然语言生成任务
最近的文本简化工作主要集中在控制标记的使用上,以进一步提高技术水平。但是,如果不深入了解控制标记的基本机制,要想进一步提高水平并非易事。其中一个尚未探索的因素就是标记化策略,我们也对此进行了探讨。在本文中,我们(1)重新实现了 AudienCe-CEntric Sentence Simplification,(2)探索了不同控制标记的效果和相互作用,(3)测试了不同标记化策略的影响,(4)展示了单独控制标记对性能的影响,(5)提出了预测控制标记价值的新方法。我们分别展示了四个控制标记的性能变化。我们还揭示了控制令牌的设计如何影响性能,并给出了一些设计控制令牌的建议。我们展示了新提出的方法在 SARI(文本简化中常用的评分标准)和 BERTScore(由 BERT 语言模型得出的分数)方面都有较高的性能,并在实际应用中具有潜力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Natural Language Engineering
Natural Language Engineering COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE-
CiteScore
5.90
自引率
12.00%
发文量
60
审稿时长
>12 weeks
期刊介绍: Natural Language Engineering meets the needs of professionals and researchers working in all areas of computerised language processing, whether from the perspective of theoretical or descriptive linguistics, lexicology, computer science or engineering. Its aim is to bridge the gap between traditional computational linguistics research and the implementation of practical applications with potential real-world use. As well as publishing research articles on a broad range of topics - from text analysis, machine translation, information retrieval and speech analysis and generation to integrated systems and multi modal interfaces - it also publishes special issues on specific areas and technologies within these topics, an industry watch column and book reviews.
期刊最新文献
Start-up activity in the LLM ecosystem Anisotropic span embeddings and the negative impact of higher-order inference for coreference resolution: An empirical analysis Automated annotation of parallel bible corpora with cross-lingual semantic concordance How do control tokens affect natural language generation tasks like text simplification Emerging trends: When can users trust GPT, and when should they intervene?
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1