Incorporating word embeddings in the hierarchical Dirichlet process for query-oriented text summarization

H. V. Lierde, T. Chow
{"title":"Incorporating word embeddings in the hierarchical dirichlet process for query-oriented text summarization","authors":"H. V. Lierde, T. Chow","doi":"10.1109/INDIN.2017.8104916","DOIUrl":null,"url":null,"abstract":"The ever-growing amount of textual data available online creates the need for automatic text summarization tools. Probabilistic topic models are able to infer semantic relationships between sentences which is a key step of extractive summarization methods. However, they strongly rely on word co-occurrence patterns and fail to capture the actual semantic relationships between words such as synonymy, antonymy, etc. We propose a novel algorithm which incorporates pre-trained word embeddings in the probabilistic topic model in order to capture semantic similarities between sentences. These similarities provide the basis for a sentence ranking algorithm for query-oriented summarization. The summary is then produced by extracting highly ranked sentences from the original corpus. Our method is shown to outperform state-of-the-art algorithms on a benchmark dataset.","PeriodicalId":6595,"journal":{"name":"2017 IEEE 15th International Conference on Industrial Informatics (INDIN)","volume":"4 1","pages":"1037-1042"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE 15th International Conference on Industrial Informatics (INDIN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/INDIN.2017.8104916","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The ever-growing amount of textual data available online creates the need for automatic text summarization tools. Probabilistic topic models are able to infer semantic relationships between sentences, which is a key step in extractive summarization methods. However, they rely heavily on word co-occurrence patterns and fail to capture the actual semantic relationships between words, such as synonymy and antonymy. We propose a novel algorithm which incorporates pre-trained word embeddings in the probabilistic topic model in order to capture semantic similarities between sentences. These similarities provide the basis for a sentence ranking algorithm for query-oriented summarization. The summary is then produced by extracting highly ranked sentences from the original corpus. Our method is shown to outperform state-of-the-art algorithms on a benchmark dataset.
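The abstract describes a pipeline in which sentence similarities derived from pre-trained word embeddings drive a query-oriented sentence ranking, and the top-ranked sentences are extracted as the summary. The sketch below is a minimal, self-contained illustration of that generic idea only; it is not the authors' HDP-based topic model. The tiny embedding table, dimensions, and function names are illustrative assumptions; in practice pre-trained vectors such as word2vec or GloVe would be loaded instead.

```python
# Minimal sketch: rank sentences by embedding similarity to a query and
# extract the top-ranked ones. NOT the paper's HDP-based model.
import numpy as np

# Hypothetical pre-trained word embeddings (4 dimensions, for illustration only).
EMBEDDINGS = {
    "text":    np.array([0.9, 0.1, 0.0, 0.2]),
    "summary": np.array([0.8, 0.2, 0.1, 0.1]),
    "topic":   np.array([0.1, 0.9, 0.2, 0.0]),
    "model":   np.array([0.2, 0.8, 0.1, 0.1]),
    "water":   np.array([0.0, 0.1, 0.9, 0.7]),
}

def sentence_vector(sentence: str) -> np.ndarray:
    """Average the embeddings of known words; zero vector if none are known."""
    vectors = [EMBEDDINGS[w] for w in sentence.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return np.zeros(4)
    return np.mean(vectors, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def rank_sentences(sentences: list[str], query: str) -> list[tuple[float, str]]:
    """Score each sentence by similarity to the query and sort best-first."""
    q = sentence_vector(query)
    scored = [(cosine(sentence_vector(s), q), s) for s in sentences]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    corpus = [
        "topic model for text",
        "water management system",
        "summary of the text",
    ]
    # The highest-ranked sentences would form the extractive summary.
    for score, sent in rank_sentences(corpus, "text summary"):
        print(f"{score:.3f}  {sent}")
```

The actual method replaces this bag-of-vectors similarity with sentence relationships inferred by a hierarchical Dirichlet process topic model into which the embeddings are incorporated.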