Uma Comparação Sistemática de Diferentes Abordagens para a Sumarização Automática Extrativa de Textos em Português

IF 0.3 Q4 LINGUISTICS Linguamatica Pub Date : 2015-07-31 DOI:10.21814/LM.7.1.203
M. Costa, Bruno Martins
{"title":"Uma Comparação Sistemática de Diferentes Abordagens para a Sumarização Automática Extrativa de Textos em Português","authors":"M. Costa, Bruno Martins","doi":"10.21814/LM.7.1.203","DOIUrl":null,"url":null,"abstract":"Automatic document summarization is the task of automatically generating condensed versions of source texts, presenting itself as one of the fundamental problems in the areas of Information Retrieval and Natural Language Processing. In this paper, different extractive approaches are compared in the task of summarizing individual documents corresponding to journalistic texts written in Portuguese. Through the use of the ROUGE package for measuring the quality of the produced summaries, we report on results for two different experimental domains, involving (i) the generation of headlines for news articles written in European Portuguese, and (ii) the generation of summaries for news articles written in Brazilian Portuguese. The results demonstrate that methods based on the selection of the first sentences have the best results  when building extractive news headlines in terms of several ROUGE metrics. Regarding the generation of summaries with more than one sentence, the method that achieved the best results was the LSA Squared algorithm, for the various ROUGE metrics.","PeriodicalId":41819,"journal":{"name":"Linguamatica","volume":"7 1","pages":"23-40"},"PeriodicalIF":0.3000,"publicationDate":"2015-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Linguamatica","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.21814/LM.7.1.203","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"LINGUISTICS","Score":null,"Total":0}
引用次数: 3

Abstract

Automatic document summarization is the task of automatically generating condensed versions of source texts, presenting itself as one of the fundamental problems in the areas of Information Retrieval and Natural Language Processing. In this paper, different extractive approaches are compared in the task of summarizing individual documents corresponding to journalistic texts written in Portuguese. Through the use of the ROUGE package for measuring the quality of the produced summaries, we report on results for two different experimental domains, involving (i) the generation of headlines for news articles written in European Portuguese, and (ii) the generation of summaries for news articles written in Brazilian Portuguese. The results demonstrate that methods based on the selection of the first sentences have the best results  when building extractive news headlines in terms of several ROUGE metrics. Regarding the generation of summaries with more than one sentence, the method that achieved the best results was the LSA Squared algorithm, for the various ROUGE metrics.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
葡萄牙语文本自动提取摘要不同方法的系统比较
自动文档摘要是自动生成源文本的压缩版本的任务,是信息检索和自然语言处理领域的基本问题之一。在本文中,不同的提取方法在总结与葡萄牙语新闻文本对应的单个文件的任务中进行了比较。通过使用ROUGE软件包来衡量生成摘要的质量,我们报告了两个不同实验领域的结果,包括(i)用欧洲葡萄牙语撰写的新闻文章的标题生成,以及(ii)用巴西葡萄牙语撰写的新闻文章的摘要生成。结果表明,基于首句选择的方法在构建提取新闻标题时,在几个ROUGE指标方面具有最佳效果。对于多句摘要的生成,对于各种ROUGE指标,取得最佳效果的方法是LSA Squared算法。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Linguamatica
Linguamatica LINGUISTICS-
CiteScore
1.40
自引率
0.00%
发文量
4
审稿时长
6 weeks
期刊最新文献
A compilação e a análise de métricas textuais de um corpus de redações Classificação da qualidade da argumentação em tweets no domínio da política brasileira Extracção de Relações de Apoio e Oposição em Títulos de Notícias de Política em Português Pais, filhos e outras relações familiares no DIP DIP - Desafio de Identificação de Personagens: objectivo, organização, recursos e resultados
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1