Maximum Marginal Relevance and Vector Space Model for Summarizing Students' Final Project Abstracts

Gunawan Gunawan, Fitria Fitria, Esther Irawati Setiawan, Kimiya Fujisawa
Journal: Knowledge Engineering and Data Science. Published: 2023-08-01. DOI: 10.17977/um018v6i12023p57-68 (https://doi.org/10.17977/um018v6i12023p57-68)

Abstract

Automatic summarization uses a computer program to condense a text document into a summary that retains the essential parts of the original. It is necessary for coping with information overload as the amount of data keeps increasing. A summary lets readers grasp the contents of an article quickly: it presents extended information concisely, conveying the main content and the essence of the central idea. In its simplest form, a summary takes the essential parts of an article and presents them again in condensed form. In this research, the process starts with the user selecting or searching for the text documents to be summarized, with the keywords in the abstract serving as the query. The proposed approach preprocesses each document through sentence breaking, case folding, word tokenizing, filtering, and stemming. The preprocessed text is weighted by term frequency-inverse document frequency (tf-idf); query relevance is then computed with the vector space model, and sentence similarity with cosine similarity. The next stage applies maximum marginal relevance (MMR) for sentence extraction. The proposed approach provides comprehensive summarization compared with another approach. Compared with manual summaries, the test results show an average precision of 88%, recall of 61%, and F-measure of 70%.
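The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the idf formula (log(N/df) + 1), the choice to treat each sentence as a "document" when computing idf, and the trade-off parameter `lam` are all assumptions made for illustration, and stop-word filtering and stemming are left out. The evaluation helper assumes that system and manual summaries are compared as sets of sentences, which the abstract does not specify.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Case folding plus word tokenizing; filtering and stemming are omitted in this sketch.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(token_lists):
    # Assumption: each sentence counts as one "document" for the idf statistics.
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
               for tokens in token_lists]
    return vectors, idf


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_summary(sentences, query, k=2, lam=0.7):
    # Greedy MMR: reward similarity to the query, penalize similarity
    # to sentences that have already been selected.
    vectors, idf = tfidf_vectors([tokenize(s) for s in sentences])
    query_vec = {t: tf * idf.get(t, 0.0)
                 for t, tf in Counter(tokenize(query)).items()}
    relevance = [cosine(v, query_vec) for v in vectors]
    selected, candidates = [], list(range(len(sentences)))

    def score(i):
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected),
                         default=0.0)
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    # Return the chosen sentences in their original document order.
    return [sentences[i] for i in sorted(selected)]


def precision_recall_f(system, reference):
    # Assumption: summaries are compared as sets of extracted sentences.
    system, reference = set(system), set(reference)
    overlap = len(system & reference)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(reference) if reference else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure
```

With `lam` close to 1 the selection reduces to ranking by query relevance alone; lower values increasingly penalize sentences that repeat what the summary already contains.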