Categorized Text Document Summarization in the Kannada Language by Sentence Ranking

2012 12th International Conference on Intelligent Systems Design and Applications (ISDA); Pub Date: 2012-11-01; DOI: 10.1109/ISDA.2012.6416635

R. Jayashree, K. S. Murthy, B. Anami


Citations: 23

Abstract


The growth of the internet has created a need for better Information Retrieval (IR) techniques that help users obtain relevant information quickly. Text summarization is one such technique; it aims to produce a quick, concise summary of a text. Of late, keyword-based summarization has drawn wide attention from researchers in the Natural Language Processing community. The algorithm we have developed extracts keywords from Kannada text documents by combining GSS (Galavotti, Sebastiani, Simi) [13] coefficients and IDF (Inverse Document Frequency) with TF (Term Frequency), and then uses these keywords for summarization. The main objective of our work is to assign a weight to each word in a sentence; the weight of a sentence is the sum of the weights of its words, and based on these sentence scores we choose the top 'm' sentences. A document from a given category is selected from our database, which was custom-built for this purpose. The files are obtained from Kannada Webdunia, a Kannada portal that offers political news, cinema news, sports news, shopping and jokes. A summary is generated with the number of sentences specified by the user. Finally, we compare the machine-generated summary with a human-written summary. Another objective of this work is to perform feature extraction through the removal of stop words. To remove stop words, we present a novel technique that finds structurally similar words in a document.
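The sentence-ranking idea described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state how TF, IDF and the GSS coefficient are combined, so the sketch assumes a simple product with negative GSS values clamped to zero; that combination, the function names (`gss`, `idf`, `summarize`) and the tiny English placeholder corpus are all hypothetical and are not taken from the paper. The GSS coefficient itself follows the standard definition, P(t,c)P(¬t,¬c) − P(t,¬c)P(¬t,c), estimated from document counts.

```python
import math
from collections import Counter

def gss(term, category, docs):
    """GSS coefficient of `term` for `category`.

    docs: list of (category_label, set_of_terms) pairs.
    Computes P(t,c)*P(~t,~c) - P(t,~c)*P(~t,c) from document counts.
    """
    n = len(docs)
    t_c = sum(1 for c, terms in docs if c == category and term in terms)
    t_notc = sum(1 for c, terms in docs if c != category and term in terms)
    nott_c = sum(1 for c, terms in docs if c == category and term not in terms)
    nott_notc = n - t_c - t_notc - nott_c
    return (t_c * nott_notc - t_notc * nott_c) / (n * n)

def idf(term, docs):
    """Inverse document frequency; 0.0 for terms unseen in the corpus."""
    df = sum(1 for _, terms in docs if term in terms)
    return math.log(len(docs) / df) if df else 0.0

def summarize(sentences, category, docs, m):
    """Return the top-m sentences of one document, in original order.

    sentences: list of token lists (already tokenized, stop words removed).
    ASSUMPTION: word weight = TF * IDF * max(GSS, 0); the paper's exact
    combination is not given in the abstract.
    """
    tf = Counter(tok for sent in sentences for tok in sent)

    def weight(tok):
        return tf[tok] * idf(tok, docs) * max(gss(tok, category, docs), 0.0)

    # Sentence score = sum of the weights of its words.
    scores = [sum(weight(tok) for tok in sent) for sent in sentences]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:m]
    return [sentences[i] for i in sorted(top)]

# Hypothetical toy corpus (English placeholders, not Kannada data).
docs = [
    ("sports", {"match", "team", "win"}),
    ("sports", {"team", "score", "match"}),
    ("politics", {"vote", "party", "win"}),
    ("politics", {"party", "election", "vote"}),
]
sentences = [["team", "win", "match"], ["party", "vote"], ["score"]]

print(summarize(sentences, "sports", docs, m=2))
```

Here `m` plays the role of the user-supplied number of summary sentences. Clamping negative GSS values is one possible design choice: it keeps words that are characteristic of other categories from lowering a sentence's score, but other combinations (for example a weighted sum) would fit the abstract's description equally well.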
