Categorized Text Document Summarization in the Kannada Language by sentence ranking

R. Jayashree, K. S. Murthy, B. Anami
{"title":"Categorized Text Document Summarization in the Kannada Language by sentence ranking","authors":"R. Jayashree, K. S. Murthy, B. Anami","doi":"10.1109/ISDA.2012.6416635","DOIUrl":null,"url":null,"abstract":"The growth of internet has given rise to the need for better Information Retrieval (IR) techniques which help in obtaining relevant information at a faster rate. Text Summarization is one such technique which aims at producing a quick and concise summary of the Text. Of late, Key word based summary has drawn wide attention of researchers in Natural Language Processing community. The algorithm we have developed extracts key words from Kannada text documents, for which we combine GSS (Galavotti, Sebastiani, Simi)[13] coefficients and IDF(Inverse Document Frequency) methods along with TF(Term Frequency) for extracting key words and later uses these for summarization. The important objective our work is to assign a weight to each word in a sentence, the weight of a sentence is the sum of weights of all words, based on the scoring of sentences; we choose top `m' sentences. A document from a given category is selected from our database custom built for this purpose. The files are obtained from Kannada Webdunia. Kannada Webdunia is a Kannada Portal which offers Political News, Cinema News, Sports news, Shopping and Jokes. Depending on the number of sentences given by the user, a summary is generated. Finally we make comparison of machine generated summary with that of human summary. Yet another objective of this work is to perform feature extraction through removal of stop words. 
For removing stop words we have presented a novel technique which finds structurally similar words in a document.","PeriodicalId":370150,"journal":{"name":"2012 12th International Conference on Intelligent Systems Design and Applications (ISDA)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"23","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 12th International Conference on Intelligent Systems Design and Applications (ISDA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISDA.2012.6416635","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 23

Abstract

The growth of the internet has created a need for better Information Retrieval (IR) techniques that help in obtaining relevant information faster. Text summarization is one such technique; it aims to produce a quick and concise summary of a text. Of late, keyword-based summarization has drawn wide attention from researchers in the Natural Language Processing community. The algorithm we have developed extracts keywords from Kannada text documents by combining GSS (Galavotti, Sebastiani, Simi) [13] coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency), and then uses these keywords for summarization. An important objective of our work is to assign a weight to each word in a sentence. The weight of a sentence is the sum of the weights of all its words, and based on these sentence scores we choose the top `m' sentences. A document from a given category is selected from our database, which was custom built for this purpose. The files are obtained from Kannada Webdunia, a Kannada portal that offers political news, cinema news, sports news, shopping and jokes. A summary is generated with the number of sentences specified by the user. Finally, we compare the machine-generated summary with a human-written summary. Another objective of this work is to perform feature extraction through the removal of stop words. For removing stop words, we present a novel technique that finds structurally similar words in a document.
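The scoring scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how TF, IDF and the GSS coefficient are combined, so the product used in `word_weight` is an assumption, as are the function names, whitespace tokenization, the `labeled_docs` structure (a list of word-set and category pairs), and the toy English tokens standing in for Kannada words. The GSS coefficient follows its usual definition, P(t,c)P(¬t,¬c) − P(t,¬c)P(¬t,c), estimated from document counts.

```python
import math
from collections import Counter

def gss(term, category, labeled_docs):
    """GSS coefficient P(t,c)P(~t,~c) - P(t,~c)P(~t,c), estimated from document counts."""
    n = len(labeled_docs)
    t_c = sum(1 for words, cat in labeled_docs if term in words and cat == category)
    t_nc = sum(1 for words, cat in labeled_docs if term in words and cat != category)
    nt_c = sum(1 for words, cat in labeled_docs if term not in words and cat == category)
    nt_nc = n - t_c - t_nc - nt_c
    return (t_c * nt_nc - t_nc * nt_c) / (n * n)

def idf(term, labeled_docs):
    """Inverse document frequency; 0.0 for terms that never occur in the corpus."""
    df = sum(1 for words, _ in labeled_docs if term in words)
    return math.log(len(labeled_docs) / df) if df else 0.0

def summarize(sentences, category, labeled_docs, m, stop_words=frozenset()):
    """Return the m highest-scoring sentences, kept in their original order."""
    tokens = [[w for w in s.split() if w not in stop_words] for s in sentences]
    tf = Counter(w for sent in tokens for w in sent)  # term frequency in this document

    def word_weight(w):
        # Assumption: weight = TF * IDF * max(GSS, 0); the paper may combine them differently.
        return tf[w] * idf(w, labeled_docs) * max(gss(w, category, labeled_docs), 0.0)

    # Sentence weight is the sum of the weights of its words.
    scores = [sum(word_weight(w) for w in sent) for sent in tokens]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:m]
    return [sentences[i] for i in sorted(top)]

# Hypothetical toy corpus: (set of words in a document, category label).
corpus = [
    ({"cricket", "match", "team"}, "sports"),
    ({"cricket", "score"}, "sports"),
    ({"election", "vote"}, "politics"),
    ({"minister", "vote", "team"}, "politics"),
]
document = ["cricket match today", "weather is nice", "cricket team won match"]
summary = summarize(document, "sports", corpus, m=2)
```

In this toy run the sentences that mention category-specific words ("cricket", "match") outscore the sentence with no such words. Clamping negative GSS values to zero is a design choice made here so that words characteristic of other categories do not lower a sentence's score.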