A Concept-Based ILP Approach for Multi-document Summarization Exploring Centrality and Position

2018 7th Brazilian Conference on Intelligent Systems (BRACIS). Pub Date: 2018-10-01. DOI: 10.1109/BRACIS.2018.00015

Hilário Oliveira, R. Lins, Rinaldo Lima, F. Freitas, S. Simske


Citations: 3

Abstract

Multi-document summarization systems aim to generate a brief text containing the most relevant information from a collection of related documents. The fast and continually growing volume of text data has increasingly drawn the attention of users and researchers to such systems. Aspects such as sentence centrality and position have been extensively studied in multi-document summarization as indicators of content relevance. However, very few works have investigated their efficient integration using global optimization approaches. This paper proposes a concept-based integer linear programming (ILP) approach for multi-document summarization of news articles that integrates centrality and position features to filter out the less relevant sentences and to measure the importance of concepts (textual fragments) in composing the output summary. The presented approach relies on a centrality-based strategy to perform the sentence clustering process and also to support the sentence ordering step. Benchmarks conducted on four datasets of the Document Understanding Conferences from 2001 to 2004 show that the proposed approach performs competitively with other state-of-the-art methods.
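To make the idea of concept-based selection concrete, the sketch below illustrates the general shape of such an objective: choose sentences so that the total weight of the distinct concepts they cover is maximal, subject to a summary length budget. It is not the authors' model. The abstract does not give the formulation, so the function name `summarize`, the toy sentences, and the concept weights (which stand in for scores the paper derives from centrality and position) are all assumptions, and exhaustive search over subsets replaces a real ILP solver to keep the example self-contained.

```python
from itertools import combinations


def summarize(sentences, concept_weights, max_words):
    """Return (indices, score) of the best sentence subset.

    sentences: list of (text, set_of_concepts) pairs.
    concept_weights: dict mapping concept -> importance weight
        (hypothetical values; the paper's weighting is not given here).
    max_words: length budget for the summary, in words.
    """
    best_score, best_subset = 0.0, ()
    indices = range(len(sentences))
    # Enumerate every non-empty subset; feasible only for tiny inputs.
    for size in range(1, len(sentences) + 1):
        for subset in combinations(indices, size):
            length = sum(len(sentences[i][0].split()) for i in subset)
            if length > max_words:
                continue  # violates the length constraint
            # Each concept counts once, however many sentences contain it.
            covered = set().union(*(sentences[i][1] for i in subset))
            score = sum(concept_weights.get(c, 0.0) for c in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset), best_score


# Toy input, invented for illustration only.
sentences = [
    ("The storm hit the coast on Monday", {"storm", "coast"}),
    ("Officials said the storm hit the coast", {"storm", "coast", "officials"}),
    ("Thousands lost power after the storm", {"power", "storm"}),
    ("Schools will reopen next week", {"schools"}),
]
weights = {"storm": 3.0, "coast": 2.0, "officials": 1.0, "power": 2.0, "schools": 0.5}

selected, score = summarize(sentences, weights, max_words=13)
print(selected, score)
```

Because a concept contributes its weight only once, the objective discourages picking two sentences that repeat the same content, which is the main appeal of concept-level rather than sentence-level scoring. For realistic input sizes the same objective and constraint would be handed to an ILP solver instead of being enumerated.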
