Investigating Content Selection for Language Generation using Machine Learning

European Workshop on Natural Language Generation | Pub Date: 2009-03-30 | DOI: 10.3115/1610195.1610218

Colin Kelly, Ann A. Copestake, Nikiforos Karamanis


Citations: 23


Abstract

The content selection component of a natural language generation system decides which information should be communicated in its output. We use information from reports on the game of cricket. We first describe a simple factoid-to-text alignment algorithm then treat content selection as a collective classification problem and demonstrate that simple 'grouping' of statistics at various levels of granularity yields substantially improved results over a probabilistic baseline. We additionally show that holding back of specific types of input data, and linking database structures with commonality further increase performance.

