How to Compare Summarizers without Target Length? Pitfalls, Solutions and Re-Examination of the Neural Summarization Literature
Simeng Sun, Ori Shapira, Ido Dagan, A. Nenkova
DOI: 10.18653/v1/W19-2303

We show that plain ROUGE F1 scores are not ideal for comparing current neural systems, which on average produce summaries of different lengths. This is due to a non-linear relationship between ROUGE F1 and summary length. To mitigate the effect of length during evaluation, we propose a new method that normalizes the ROUGE F1 scores of a system by those of a random system with the same average output length. A pilot human evaluation shows that humans prefer short summaries in terms of verbosity, but overall consider longer summaries to be of higher quality. While human evaluations are more expensive in time and resources, it is clear that normalization, such as the one we propose for automatic evaluation, will make human evaluations more meaningful.
Neural Text Style Transfer via Denoising and Reranking
Joseph Lee, Ziang Xie, Cindy Wang, M. Drach, Dan Jurafsky, A. Ng
DOI: 10.18653/v1/W19-2309
We introduce a simple method for text style transfer that frames style transfer as denoising: we synthesize a noisy corpus and treat the source style as a noisy version of the target style. To control for aspects such as preserving meaning while modifying style, we propose a reranking approach in the data synthesis phase. We evaluate our method on three novel style transfer tasks: transferring between British and American English, between text genres (formal vs. casual), and between lyrics from different musical genres. By measuring style transfer quality, meaning preservation, and the fluency of generated outputs, we demonstrate that our method produces high-quality output while maintaining the flexibility to suggest syntactically rich stylistic edits.
{"title":"Neural Text Style Transfer via Denoising and Reranking","authors":"Joseph Lee, Ziang Xie, Cindy Wang, M. Drach, Dan Jurafsky, A. Ng","doi":"10.18653/v1/W19-2309","DOIUrl":"https://doi.org/10.18653/v1/W19-2309","url":null,"abstract":"We introduce a simple method for text style transfer that frames style transfer as denoising: we synthesize a noisy corpus and treat the source style as a noisy version of the target style. To control for aspects such as preserving meaning while modifying style, we propose a reranking approach in the data synthesis phase. We evaluate our method on three novel style transfer tasks: transferring between British and American varieties, text genres (formal vs. casual), and lyrics from different musical genres. By measuring style transfer quality, meaning preservation, and the fluency of generated outputs, we demonstrate that our method is able both to produce high-quality output while maintaining the flexibility to suggest syntactically rich stylistic edits.","PeriodicalId":223584,"journal":{"name":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","volume":"318 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133781603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Paraphrase Generation for Semi-Supervised Learning in NLU
Eunah Cho, He Xie, W. Campbell
DOI: 10.18653/v1/W19-2306

Semi-supervised learning is an efficient way to improve the performance of natural language processing systems. In this work, we propose Para-SSL, a scheme that generates candidate utterances using paraphrasing and methods from semi-supervised learning. To perform paraphrase generation in the context of a dialog system, we automatically extract paraphrase pairs to create a paraphrase corpus. Using this data, we build a paraphrase generation system and perform one-to-many generation, followed by a validation step that retains only utterances of good quality. The paraphrase-based semi-supervised learning is applied to five functionalities in a natural language understanding system. Our proposed method does not require user utterances and can be applied before a new functionality is released. Experiments show that we achieve up to 19% relative slot error reduction without access to user utterances, and up to 35% when leveraging live traffic utterances.
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision
Alessio Palmero Aprosio, Sara Tonelli, M. Turchi, Matteo Negri, Mattia Antonino Di Gangi
DOI: 10.18653/v1/W19-2305
Neural text simplification has gained increasing attention in the NLP community thanks to recent advances in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with English, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages depend either on intensive manual work to create training data or on effective automatic generation techniques that bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and feeding external word embeddings to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which only a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
{"title":"Neural Text Simplification in Low-Resource Conditions Using Weak Supervision","authors":"Alessio Palmero Aprosio, Sara Tonelli, M. Turchi, Matteo Negri, Mattia Antonino Di Gangi","doi":"10.18653/v1/W19-2305","DOIUrl":"https://doi.org/10.18653/v1/W19-2305","url":null,"abstract":"Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages are conditioned either to intensive manual work to create training data, or to the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.","PeriodicalId":223584,"journal":{"name":"Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131053359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}