Selecting Article Segment Titles Based on Keyphrase Features and Semantic Relatedness

Yuming Guo, M. Iwaihara
2018 7th International Congress on Advanced Applied Informatics (IIAI-AAI), published 2018-07-01
DOI: 10.1109/IIAI-AAI.2018.00034 (https://doi.org/10.1109/IIAI-AAI.2018.00034)

Abstract

Nowadays people can find almost any kind of information on the Internet. In most cases, however, users are unwilling to spend much time browsing text to find the target segment among long paragraphs. Existing work on topic labeling performs well on document categorization but is inadequate at the granularity of detailed contents. We therefore propose a method for selecting titles for segments in long documents. We analyze the characteristics of high-quality titles for article segments in terms of the semantic relatedness between the target segment and related articles as well as other segments. We then revise three previously proposed features. We improve the phraseness feature so that it gives appropriate scores to long titles, and we combine the SimPF and Embedding-vector features to enhance efficiency and rationality. For experimental evaluation we use Wikipedia articles, in which a large number of article segments are titled manually while a great number of articles lack detailed segment titles. We evaluate scoring functions by where the hidden original segment titles are ranked, measured by precision@K. Through rigorous evaluations, we show an optimum combination of the features.
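The evaluation idea in the abstract (hide the original segment title, rank candidate titles with a scoring function, and check where the original lands) can be illustrated with a minimal sketch. The abstract does not give the exact definition used, so this assumes precision@K is the fraction of segments whose hidden original title appears among the top K ranked candidates. The function names `rank_candidates` and `precision_at_k` and the example scores are hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence


def rank_candidates(candidates: Sequence[str],
                    score: Callable[[str], float]) -> list[str]:
    """Order candidate titles from highest to lowest score.

    `score` stands in for any scoring function; the paper's actual
    features (phraseness, SimPF, Embedding-vector) are not implemented here.
    """
    return sorted(candidates, key=score, reverse=True)


def precision_at_k(ranked_lists: Sequence[Sequence[str]],
                   hidden_titles: Sequence[str],
                   k: int) -> float:
    """Assumed definition: share of segments whose hidden original
    title is found within the top-k ranked candidates."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(ranked_lists) != len(hidden_titles):
        raise ValueError("one ranked list is needed per hidden title")
    if not hidden_titles:
        return 0.0
    hits = sum(1 for ranked, gold in zip(ranked_lists, hidden_titles)
               if gold in ranked[:k])
    return hits / len(hidden_titles)


# Made-up scores for two segments, purely for illustration.
segment_scores = [
    {"History": 0.9, "Early life": 0.7, "Career": 0.2},
    {"Reception": 0.8, "Plot": 0.6, "Cast": 0.5},
]
hidden = ["Early life", "Cast"]
ranked = [rank_candidates(list(s), s.get) for s in segment_scores]

for k in (1, 2, 3):
    print(k, precision_at_k(ranked, hidden, k))
```

Comparing scoring functions then amounts to computing this value for each function on the same set of hidden titles and several values of K.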