Leading Sentence News TextRank

Phua Yeong Tsann, Yew Kwang Hooi, Mohd Fadzil bin Hassan, Matthew Teow Yok Wooi
Published in: 2021 International Conference on Intelligent Cybernetics Technology & Applications (ICICyTA), 2021-12-01
DOI: 10.1109/ICICyTA53712.2021.9689186 (https://doi.org/10.1109/ICICyTA53712.2021.9689186)
Citations: 1

Abstract

Automatic text summarization is a popular Natural Language Processing task, often used to condense lengthy content into a short summary. Doing this manually is tedious and time-consuming. This study focuses on Malay news articles, with the aim of selecting representative sentences for Malay news headline generation. The dataset used in the experiment is a collection of multi-genre Malay news articles published on Bernama.com between 2017 and 2019. A leading sentence approach is applied in TextRank, with TF-IDF and Word2Vec as language models, to perform salient sentence extraction. In the experiment, the top-ranking sentences are extracted at 15%, 20%, 25% and 30% of the original news content. The extracted content is evaluated against the original news headline using the ROUGE evaluation metric. The results show that including the first sentence, or the first two sentences, of the news article yields a significant improvement. This leading sentence approach achieves an improvement of the F1 score from 1.36 to 7.98. The experiment also shows that the ROUGE scores decrease as the extraction percentage increases. The proposed method is thus fast and resource-efficient compared to other state-of-the-art Natural Language Processing approaches.
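The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: the extraction percentage is taken over the number of sentences, the leading sentences are force-included before the remaining budget is filled by TextRank rank, only the TF-IDF variant is shown (Word2Vec is omitted), and ROUGE is reduced to a unigram (ROUGE-1) F1. All function and parameter names (`leading_sentence_summary`, `ratio`, `n_lead`, and so on) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-word characters (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())


def tfidf_vectors(sentences):
    """Build one sparse TF-IDF vector (a dict) per sentence."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    # Smoothed IDF so that terms present in every sentence keep a nonzero weight.
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in doc.items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    vecs = tfidf_vectors(sentences)
    n = len(sentences)
    sim = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)]
           for i in range(n)]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(sim[j][i] * scores[j] / out_weight[j]
                            for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


def leading_sentence_summary(sentences, ratio=0.15, n_lead=1):
    """Always keep the first n_lead sentences, then fill the rest by rank.

    ASSUMPTION: ratio is a fraction of the sentence count, and the leading
    sentences count toward that budget.
    """
    if not sentences:
        return []
    lead = list(range(min(n_lead, len(sentences))))
    budget = min(len(sentences), max(len(lead), math.ceil(ratio * len(sentences))))
    scores = textrank_scores(sentences)
    ranked = sorted((i for i in range(len(sentences)) if i not in lead),
                    key=lambda i: scores[i], reverse=True)
    chosen = sorted(lead + ranked[: budget - len(lead)])  # restore document order
    return [sentences[i] for i in chosen]


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between an extracted summary and a headline."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

Calling `leading_sentence_summary(sentences, ratio=0.15, n_lead=2)` and scoring the joined result against the headline with `rouge1_f1` mirrors the kind of comparison the abstract reports. A longer extract adds words that do not appear in the short headline, which lowers precision, and this is consistent with the reported drop in ROUGE scores as the extraction percentage grows.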