Reading Times Predict the Quality of Generated Text Above and Beyond Human Ratings

European Workshop on Natural Language Generation. Pub Date: 2015-09-01. DOI: 10.18653/v1/w15-4705

Sina Zarrieß, Sebastian Loth, David Schlangen


Citations: 4

Abstract


Reading Times Predict the Quality of Generated Text Above and Beyond Human Ratings

Typically, human evaluation of NLG output is based on user ratings. We collected ratings and reading time data in a simple, low-cost experimental paradigm for text generation. Participants were presented with corpus texts, automatically linearised texts, and texts containing both predicted referring expressions and automatic linearisation. We demonstrate that the reading time metrics outperform the ratings in classifying texts according to their quality. Regression analyses showed that self-reported ratings discriminated poorly between the kinds of manipulation, especially between defects in word order and defects in text coherence. In contrast, a combination of objective measures from the low-cost mouse-contingent reading paradigm provided very high classification accuracy and thus greater insight into the actual quality of an automatically generated text.
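The comparison described in the abstract, classifying texts by condition from ratings alone versus from reading-time measures, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the trial values are invented toy numbers (not data from the paper), the feature names are hypothetical, and the nearest-centroid classifier with leave-one-out evaluation is a stand-in, since the abstract does not specify the models the authors used.

```python
from statistics import mean, pstdev

# Invented toy trials, NOT data from the paper. Each row is:
# (condition, rating on a 1-5 scale, reading time per word in ms,
#  number of regressions to earlier text). Feature names are hypothetical.
TOY_TRIALS = [
    ("corpus", 4, 310, 1),
    ("corpus", 4, 295, 0),
    ("corpus", 3, 320, 1),
    ("linearised", 3, 410, 3),
    ("linearised", 4, 430, 4),
    ("linearised", 3, 405, 3),
    ("linearised+refex", 3, 400, 7),
    ("linearised+refex", 3, 420, 8),
    ("linearised+refex", 4, 415, 6),
]


def standardise(rows):
    """Z-score each column so features with large units do not dominate."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier."""
    feats = standardise(features)
    hits = 0
    for i, (label, x) in enumerate(zip(labels, feats)):
        # Build class centroids from every trial except the held-out one.
        groups = {}
        for j, (other_label, f) in enumerate(zip(labels, feats)):
            if j != i:
                groups.setdefault(other_label, []).append(f)
        centroids = {
            lab: [mean(col) for col in zip(*fs)] for lab, fs in groups.items()
        }
        predicted = min(centroids, key=lambda lab: squared_distance(x, centroids[lab]))
        hits += predicted == label
    return hits / len(labels)


labels = [row[0] for row in TOY_TRIALS]
rating_features = [[row[1]] for row in TOY_TRIALS]
timing_features = [[row[2], row[3]] for row in TOY_TRIALS]

rating_acc = loo_accuracy(labels, rating_features)
timing_acc = loo_accuracy(labels, timing_features)

print(f"ratings only:        {rating_acc:.2f}")
print(f"reading-time metrics: {timing_acc:.2f}")
```

Because the toy values were chosen so that ratings overlap across conditions while the timing measures separate them, the sketch only shows the shape of the comparison; it is not evidence for the paper's finding.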

