Clustering and Matching Headlines for Automatic Paraphrase Acquisition

European Workshop on Natural Language Generation. Pub Date: 2009-03-30. DOI: 10.3115/1610195.1610216

S. Wubben, Antal van den Bosch, E. Krahmer, E. Marsi


Citations: 33

Abstract

For developing a data-driven text rewriting algorithm for paraphrasing, it is essential to have a monolingual corpus of aligned paraphrased sentences. News article headlines are a rich source of paraphrases; they tend to describe the same event in various different ways, and can easily be obtained from the web. We compare two methods of aligning headlines to construct such an aligned corpus of paraphrases, one based on clustering, and the other on pairwise similarity-based matching. We show that the latter performs best on the task of aligning paraphrastic headlines.
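To make the second approach concrete, here is a minimal sketch of pairwise similarity-based matching of headlines. It is an illustration only, not the authors' implementation: the bag-of-words cosine measure, the regex tokenizer, the `match_headlines` function name, and the `threshold` value of 0.5 are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(headline):
    """Lowercase a headline and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", headline.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_headlines(headlines, threshold=0.5):
    """Return (i, j, score) for every headline pair scoring at or above threshold.

    The threshold is a hypothetical value for illustration, not one from the paper.
    """
    bags = [Counter(tokenize(h)) for h in headlines]
    pairs = []
    for i, j in combinations(range(len(bags)), 2):
        score = cosine(bags[i], bags[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


if __name__ == "__main__":
    sample = [
        "Obama wins election",
        "Obama wins the election",
        "Stocks fall sharply",
    ]
    for i, j, score in match_headlines(sample):
        print(sample[i], "<->", sample[j], round(score, 3))
```

Unlike clustering, which assigns each headline to a group, this kind of matching scores every pair independently, so a headline can be aligned with several others or with none.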
