Investigating Content Selection for Language Generation using Machine Learning

European Workshop on Natural Language Generation. Pub Date: 2009-03-30. DOI: 10.3115/1610195.1610218

Colin Kelly, Ann A. Copestake, Nikiforos Karamanis

Citations: 23

Abstract

The content selection component of a natural language generation system decides which information should be communicated in its output. We use information from reports on the game of cricket. We first describe a simple factoid-to-text alignment algorithm, then treat content selection as a collective classification problem and demonstrate that simple 'grouping' of statistics at various levels of granularity yields substantially improved results over a probabilistic baseline. We additionally show that holding back specific types of input data, and linking database structures that have commonality, further increase performance.



