基于抽取技术的马拉地语文本摘要

Mrs. Kirti Pankaj Kakde, Dr. H. M. Padalikar
{"title":"基于抽取技术的马拉地语文本摘要","authors":"Mrs. Kirti Pankaj Kakde, Dr. H. M. Padalikar","doi":"10.35940/ijeat.e4200.0612523","DOIUrl":null,"url":null,"abstract":"Multilingualism has played a key role in India, where people speak and understand more than one language. Marathi, as one of the official languages inMaharashtra state, is often used in sources such as newspapers or blogs. However, manually summarizing bulky Marathi paragraphs or texts for easy comprehension can be challenging. To address this, text summarization becomes essential to make large documents easily readable and understandable. This research article focuses on single document text summarization using the Natural Language Processing (NLP) approach, a subfield of Artificial Intelligence. Automatic text summarization is employed to extract relevant information in a concise manner. Information Extraction is particularly useful when summarizing documents consisting of multiple sentences into three or four sentences. While extensive research has been conducted on English Text Summarization, the field of Marathi document summarization remains largely unexplored. This research paper explores extractive text summarization techniques specifically for Marathi documents, utilizing the LexRank algorithm along with Genism, a graph-based technique, to generate informative summaries within word limit constraints. The experiment was conducted on the IndicNLP Marathi news article dataset, resulting in 78% precision, 72% recall, and 75% F-measure using the frequency-based method, and 78% precision, 78% recall, and 78% F-measure using the Lex Rank algorithm.","PeriodicalId":13981,"journal":{"name":"International Journal of Engineering and Advanced Technology","volume":"26 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2023-06-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Marathi Text Summarization using Extractive Technique\",\"authors\":\"Mrs. Kirti Pankaj Kakde, Dr. H. M. Padalikar\",\"doi\":\"10.35940/ijeat.e4200.0612523\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Multilingualism has played a key role in India, where people speak and understand more than one language. Marathi, as one of the official languages inMaharashtra state, is often used in sources such as newspapers or blogs. However, manually summarizing bulky Marathi paragraphs or texts for easy comprehension can be challenging. To address this, text summarization becomes essential to make large documents easily readable and understandable. This research article focuses on single document text summarization using the Natural Language Processing (NLP) approach, a subfield of Artificial Intelligence. Automatic text summarization is employed to extract relevant information in a concise manner. Information Extraction is particularly useful when summarizing documents consisting of multiple sentences into three or four sentences. While extensive research has been conducted on English Text Summarization, the field of Marathi document summarization remains largely unexplored. This research paper explores extractive text summarization techniques specifically for Marathi documents, utilizing the LexRank algorithm along with Genism, a graph-based technique, to generate informative summaries within word limit constraints. The experiment was conducted on the IndicNLP Marathi news article dataset, resulting in 78% precision, 72% recall, and 75% F-measure using the frequency-based method, and 78% precision, 78% recall, and 78% F-measure using the Lex Rank algorithm.\",\"PeriodicalId\":13981,\"journal\":{\"name\":\"International Journal of Engineering and Advanced Technology\",\"volume\":\"26 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Engineering and Advanced Technology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.35940/ijeat.e4200.0612523\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Engineering and Advanced Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.35940/ijeat.e4200.0612523","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

多语制在印度发挥了关键作用,人们会说并理解一种以上的语言。马拉地语作为马哈拉施特拉邦的官方语言之一,经常被用于报纸或博客等来源。然而,手动总结冗长的马拉地语段落或文本以方便理解可能具有挑战性。为了解决这个问题,文本摘要对于使大型文档易于阅读和理解变得至关重要。本文主要研究人工智能的一个分支——自然语言处理(NLP)方法在单文档文本摘要中的应用。采用自动文本摘要,以简洁的方式提取相关信息。在将由多个句子组成的文档总结为三句或四句时,信息提取特别有用。虽然对英语文本摘要进行了广泛的研究,但马拉地语文档摘要领域仍未得到广泛的探索。本研究论文探索了专门针对马拉地语文档的提取文本摘要技术,利用LexRank算法和Genism(一种基于图的技术)在字数限制约束下生成信息丰富的摘要。在IndicNLP马拉地语新闻文章数据集上进行了实验,使用基于频率的方法获得了78%的精度、72%的召回率和75%的F-measure,使用Lex Rank算法获得了78%的精度、78%的召回率和78%的F-measure。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Marathi Text Summarization using Extractive Technique
Multilingualism has played a key role in India, where people speak and understand more than one language. Marathi, as one of the official languages inMaharashtra state, is often used in sources such as newspapers or blogs. However, manually summarizing bulky Marathi paragraphs or texts for easy comprehension can be challenging. To address this, text summarization becomes essential to make large documents easily readable and understandable. This research article focuses on single document text summarization using the Natural Language Processing (NLP) approach, a subfield of Artificial Intelligence. Automatic text summarization is employed to extract relevant information in a concise manner. Information Extraction is particularly useful when summarizing documents consisting of multiple sentences into three or four sentences. While extensive research has been conducted on English Text Summarization, the field of Marathi document summarization remains largely unexplored. This research paper explores extractive text summarization techniques specifically for Marathi documents, utilizing the LexRank algorithm along with Genism, a graph-based technique, to generate informative summaries within word limit constraints. The experiment was conducted on the IndicNLP Marathi news article dataset, resulting in 78% precision, 72% recall, and 75% F-measure using the frequency-based method, and 78% precision, 78% recall, and 78% F-measure using the Lex Rank algorithm.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Car Door Sound Quality Assessment - A Review for NVH Performance Research Airport Runway Crack Detection to Classify and Densify Surface Crack Type Computer-Aided Diagnosis System for Automated Detection of Mri Brain Tumors Smart Artificial Intelligence System for Heart Disease Prediction A Comprehensive Study on Failure Modes and Mechanisms of Thin Film Chip Resistors
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1