An Extractive Malayalam Document Summarization Based on Graph Theoretic Approach
E. B. Ajmal, Rosna P. Haroon
2015 Fifth International Conference on e-Learning (econf), 2015-10-01. DOI: 10.1109/ECONF.2015.41
Text summarization condenses a large amount of information into a concise form by selecting the important information and discarding unimportant and redundant information. The need for text summarization has grown considerably with the abundance of documents on the internet. Although many text summarization systems have been developed for documents in various languages, no comparably well-performing system exists for Malayalam. In this paper, we propose a graph-theoretic approach to summarizing Malayalam documents, motivated by methods for identifying themes. After the common preprocessing steps of stop-word removal and stemming, the sentences of a document are represented as nodes in an undirected graph, one node per sentence. Two sentences are connected by an edge if they share common words, that is, if their similarity (for example, cosine similarity) exceeds a threshold. This representation yields two results. First, the partitions of the graph (the subgraphs that are not connected to any other subgraph) form the distinct topics covered in the documents. Second, the graph-theoretic method identifies the important sentences in the document. We apply this approach to the Malayalam text summarization task and achieve results comparable to the state of the art.
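The pipeline described in the abstract can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the tokenizer, the Malayalam stemmer, the stop-word list, the threshold value, or how sentence importance is scored. In this sketch, `stem` is an identity placeholder for a real Malayalam stemmer, stop words are supplied by the caller, the default threshold of 0.1 is arbitrary, and importance is assumed to be node degree (one common centrality measure). All function names (`preprocess`, `cosine`, `build_graph`, `connected_components`, `summarize`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Set

PUNCTUATION = ".,;:!?\"'()[]"


def preprocess(sentence: str, stop_words: FrozenSet[str], stem: Callable[[str], str]) -> List[str]:
    """Whitespace tokenization, stop-word removal and stemming."""
    # Split on whitespace instead of regex \w: Python's \w does not match
    # Malayalam combining vowel signs and would break words apart.
    tokens = (t.strip(PUNCTUATION).lower() for t in sentence.split())
    return [stem(t) for t in tokens if t and t not in stop_words]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_graph(vectors: List[Counter], threshold: float) -> Dict[int, Set[int]]:
    """Undirected graph: one node per sentence, an edge when similarity exceeds threshold."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(vectors))}
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj: Dict[int, Set[int]]) -> List[List[int]]:
    """Each connected component (partition) is treated as one topic."""
    seen: Set[int] = set()
    components: List[List[int]] = []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(comp))
    return components


def summarize(
    sentences: List[str],
    stop_words: FrozenSet[str] = frozenset(),
    stem: Callable[[str], str] = lambda w: w,  # placeholder: plug in a Malayalam stemmer
    threshold: float = 0.1,
    per_topic: int = 1,
    min_topic_size: int = 1,
) -> List[str]:
    vectors = [Counter(preprocess(s, stop_words, stem)) for s in sentences]
    adj = build_graph(vectors, threshold)
    chosen: List[int] = []
    for comp in connected_components(adj):
        if len(comp) < min_topic_size:
            continue  # optionally skip tiny topics such as isolated sentences
        # Assumed importance score: node degree; ties go to the earlier sentence.
        ranked = sorted(comp, key=lambda i: (-len(adj[i]), i))
        chosen.extend(ranked[:per_topic])
    # Return the selected sentences in original document order.
    return [sentences[i] for i in sorted(chosen)]
```

For example, with the toy (English, for readability) sentences `["rain falls today.", "dogs bark loudly.", "cats chase mice.", "mice fear cats.", "dogs chase cats.", "heavy rain today."]`, the graph has two partitions, {0, 5} and {1, 2, 3, 4}, and `summarize` returns `["rain falls today.", "dogs chase cats."]`: one sentence per topic, with "dogs chase cats." chosen because it has the highest degree in its partition. Note that every isolated sentence forms its own singleton topic; `min_topic_size` is a hypothetical knob for dropping such partitions when they would otherwise inflate the summary.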