Selecting Article Segment Titles Based on Keyphrase Features and Semantic Relatedness

2018 7th International Congress on Advanced Applied Informatics (IIAI-AAI). Pub Date: 2018-07-01. DOI: 10.1109/IIAI-AAI.2018.00034

Yuming Guo, M. Iwaihara


Citations: 0

Abstract

Nowadays people can find almost any kind of information they want on the Internet. In most cases, however, users are unwilling to spend much time browsing text to find their target segment among long paragraphs. Existing work on topic labeling performs well on document categorization, but is inadequate at the granularity of detailed contents. We therefore propose a method for selecting titles for segments in long documents. We analyze the characteristics of high-quality titles for article segments in terms of the semantic relatedness between the target segment and related articles, as well as other segments. We then revise three previously proposed features. We improve the phraseness feature so that it gives appropriate scores to long titles, and we combine the SimPF and Embedding-vector features to enhance efficiency and rationality. For experimental evaluation we use Wikipedia articles, in which a large number of article segments are titled manually while a great number of articles lack detailed segment titles. We evaluate scoring functions by where the hidden original segment titles are ranked, measured by precision@K. Through rigorous evaluations, we show an optimum combination of the features.
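The evaluation idea in the abstract (hide the original segment title, rank candidate titles with a scoring function, and check where the hidden title lands) can be illustrated with a minimal sketch. The abstract does not give the exact definition of precision@K used in the paper, so the code below assumes one common reading: the fraction of segments whose hidden title appears among the top K ranked candidates. The function names, the candidate titles, and the scores are hypothetical and serve only as an illustration; they are not the authors' implementation or data.

```python
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Order candidate titles by descending score (ties broken alphabetically)."""
    return sorted(scores, key=lambda title: (-scores[title], title))


def hit_at_k(ranked: List[str], hidden_title: str, k: int) -> bool:
    """True if the hidden original title is among the top-k ranked candidates."""
    if k <= 0:
        raise ValueError("k must be positive")
    return hidden_title in ranked[:k]


def precision_at_k(examples: List[Tuple[Dict[str, float], str]], k: int) -> float:
    """Assumed reading of precision@K: share of segments with a top-k hit."""
    if not examples:
        raise ValueError("need at least one evaluated segment")
    hits = sum(hit_at_k(rank_candidates(scores), hidden, k)
               for scores, hidden in examples)
    return hits / len(examples)


# Hypothetical candidate scores for two segments, paired with the hidden title.
examples = [
    ({"History": 0.9, "Early life": 0.4, "Reception": 0.2}, "History"),
    ({"Plot": 0.3, "Cast": 0.7, "Production": 0.5}, "Production"),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"precision@{k} = {precision_at_k(examples, k):.2f}")
```

Under this reading, comparing two scoring functions amounts to producing a `scores` dictionary per segment with each function and comparing the resulting precision@K values for the same K.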


