{"title":"基于随机森林分类器的多文档摘要系统","authors":"Ansamma John, M. Wilscy","doi":"10.1109/RAICS.2013.6745442","DOIUrl":null,"url":null,"abstract":"In the recent times, the requirement for generation of multi-document summary has gained a lot of attention among the researchers due to the information explosion in the web media. Mostly, the text summarization technique uses the sentence extraction technique where the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that differentiates the sentences in the multiple documents as one belonging to the summary or not belonging to the summary. For this each sentence in the documents is represented by a set of feature scores. Classifier is trained using feature scores and summary information of each sentence in the document set. Feature scores of sentences of multiple documents to be summarized are given as the test document for the classifier. From the output of the classifier, sentences that belonging to the summary class, a required size summary is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summary. Experimental results show the quality of the summary generated by this method is good in terms of relevance and novelty.","PeriodicalId":184155,"journal":{"name":"2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Random forest classifier based multi-document summarization system\",\"authors\":\"Ansamma John, M. Wilscy\",\"doi\":\"10.1109/RAICS.2013.6745442\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the recent times, the requirement for generation of multi-document summary has gained a lot of attention among the researchers due to the information explosion in the web media. Mostly, the text summarization technique uses the sentence extraction technique where the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that differentiates the sentences in the multiple documents as one belonging to the summary or not belonging to the summary. For this each sentence in the documents is represented by a set of feature scores. Classifier is trained using feature scores and summary information of each sentence in the document set. Feature scores of sentences of multiple documents to be summarized are given as the test document for the classifier. From the output of the classifier, sentences that belonging to the summary class, a required size summary is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summary. Experimental results show the quality of the summary generated by this method is good in terms of relevance and novelty.\",\"PeriodicalId\":184155,\"journal\":{\"name\":\"2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/RAICS.2013.6745442\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/RAICS.2013.6745442","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Random forest classifier based multi-document summarization system
In the recent times, the requirement for generation of multi-document summary has gained a lot of attention among the researchers due to the information explosion in the web media. Mostly, the text summarization technique uses the sentence extraction technique where the salient sentences in the multiple documents are extracted and presented as a summary. In our proposed system, we have developed a random forest classifier based multi-document summarization system that differentiates the sentences in the multiple documents as one belonging to the summary or not belonging to the summary. For this each sentence in the documents is represented by a set of feature scores. Classifier is trained using feature scores and summary information of each sentence in the document set. Feature scores of sentences of multiple documents to be summarized are given as the test document for the classifier. From the output of the classifier, sentences that belonging to the summary class, a required size summary is generated using Maximal Marginal Relevance. The experiments are conducted using the DUC 2002 dataset and its corresponding summary. Experimental results show the quality of the summary generated by this method is good in terms of relevance and novelty.