Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation: Latest Papers

How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
Simeng Sun, Ori Shapira, Ido Dagan, A. Nenkova
We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce summaries of different lengths. This is due to a non-linear relationship between ROUGE F1 and summary length. To alleviate the effect of length during evaluation, we propose a new method that normalizes the ROUGE F1 score of a system by that of a random system with the same average output length. A pilot human evaluation shows that humans prefer short summaries in terms of verbosity but overall consider longer summaries to be of higher quality. While human evaluations are more expensive in time and resources, it is clear that normalization, such as the one we propose for automatic evaluation, will make human evaluations more meaningful.
DOI: https://doi.org/10.18653/v1/W19-2303
Citations: 43
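The length normalization described in the abstract above can be illustrated with a minimal sketch. It rests on several assumptions not confirmed by this listing: ROUGE is approximated by a simple unigram-overlap F1 (`rouge1_f1`), the "random system" is modeled as sampling source tokens at random (`random_baseline_f1`), normalization is taken to be a plain ratio, and it is applied per document rather than at the system level. All function names are hypothetical.

```python
import random
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between two whitespace-tokenized strings."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def random_baseline_f1(source, reference, length, trials=200, seed=0):
    """Mean F1 of summaries built by sampling `length` source tokens at random.

    Assumption: this stands in for the random system in the abstract.
    """
    rng = random.Random(seed)
    tokens = source.split()
    k = min(length, len(tokens))
    scores = []
    for _ in range(trials):
        sample = " ".join(rng.sample(tokens, k))
        scores.append(rouge1_f1(sample, reference))
    return sum(scores) / len(scores)


def normalized_f1(system_summary, source, reference):
    """System F1 divided by the F1 of a length-matched random baseline."""
    system = rouge1_f1(system_summary, reference)
    baseline = random_baseline_f1(source, reference, len(system_summary.split()))
    # Assumption: ratio normalization; undefined when the baseline scores zero.
    return system / baseline if baseline > 0 else float("nan")
```

A value above 1 means the summary scores better than random token selection at the same length, which makes systems with different output lengths easier to compare than raw F1 alone.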
Neural Text Style Transfer via Denoising and Reranking
Joseph Lee, Ziang Xie, Cindy Wang, M. Drach, Dan Jurafsky, A. Ng
We introduce a simple method for text style transfer that frames style transfer as denoising: we synthesize a noisy corpus and treat the source style as a noisy version of the target style. To control for aspects such as preserving meaning while modifying style, we propose a reranking approach in the data synthesis phase. We evaluate our method on three novel style transfer tasks: transferring between British and American varieties, text genres (formal vs. casual), and lyrics from different musical genres. By measuring style transfer quality, meaning preservation, and the fluency of generated outputs, we demonstrate that our method is able to produce high-quality output while maintaining the flexibility to suggest syntactically rich stylistic edits.
DOI: https://doi.org/10.18653/v1/W19-2309
Citations: 11
Paraphrase Generation for Semi-Supervised Learning in NLU
Eunah Cho, He Xie, W. Campbell
Semi-supervised learning is an efficient way to improve performance for natural language processing systems. In this work, we propose Para-SSL, a scheme to generate candidate utterances using paraphrasing and methods from semi-supervised learning. In order to perform paraphrase generation in the context of a dialog system, we automatically extract paraphrase pairs to create a paraphrase corpus. Using this data, we build a paraphrase generation system and perform one-to-many generation, followed by a validation step to select only the utterances of good quality. The paraphrase-based semi-supervised learning is applied to five functionalities in a natural language understanding system. Our proposed method for semi-supervised learning using paraphrase generation does not require user utterances and can be applied prior to releasing a new functionality to a system. Experiments show that we can achieve up to 19% relative slot error reduction without access to user utterances, and up to 35% when leveraging live traffic utterances.
DOI: https://doi.org/10.18653/v1/W19-2306
Citations: 26
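The "relative slot error reduction" figures in the abstract above are relative, not absolute, improvements. A minimal sketch of that arithmetic follows; the helper name `relative_error_reduction` and the error rates in the usage note are hypothetical illustrations, not results from the paper.

```python
def relative_error_reduction(baseline_error, new_error):
    """Fraction of the baseline error removed by the new system.

    Both arguments are error rates on the same scale (e.g. slot error rate).
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical numbers: a slot error rate falling from 10.0% to 8.1%
# is a 1.9-point absolute drop but roughly a 19% relative reduction.
reduction = relative_error_reduction(0.10, 0.081)
print(f"{reduction:.0%}")
```

The same relative reduction can correspond to very different absolute gains depending on the baseline error, so the figure is best read alongside the baseline whenever it is available.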
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Alessio Palmero Aprosio, Sara Tonelli, M. Turchi, Matteo Negri, Mattia Antonino Di Gangi
Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages depend either on intensive manual work to create training data or on the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which only a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
DOI: https://doi.org/10.18653/v1/W19-2305
Citations: 26