Zengmin Geng, Jujian Zhang, Xuefei Li, Jianxi Du, Zhengdong Liu
{"title":"Research on Web Document Summarization","authors":"Zengmin Geng, Jujian Zhang, Xuefei Li, Jianxi Du, Zhengdong Liu","doi":"10.1109/ITAPP.2010.5566074","DOIUrl":null,"url":null,"abstract":"Web document summarization (WDS) is becoming one of the hot subjects in the text summarization field due to the rapidly increasing number of documents on Web. WDS is different from traditional text summarization because it must process hyperlinked texts. This paper first analyses the features of Web documents, then gives a definition for WDS, and finally presents an algorithm for WDS based on sentences extraction. Each sentence's weight is a weighted sum of words' weight and its sentence-structure's weight. The former weight is adjusted by document class graph and latter weight considers both the Web formats and hyperlink attributes. The weight proportion of words and structures is learned by machine learning approach. Experiments on 2,000 Web documents show that our algorithm is feasible.","PeriodicalId":116013,"journal":{"name":"2010 International Conference on Internet Technology and Applications","volume":"15 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 International Conference on Internet Technology and Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ITAPP.2010.5566074","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3
Abstract
Web document summarization (WDS) is becoming one of the hot subjects in the text summarization field due to the rapidly increasing number of documents on Web. WDS is different from traditional text summarization because it must process hyperlinked texts. This paper first analyses the features of Web documents, then gives a definition for WDS, and finally presents an algorithm for WDS based on sentences extraction. Each sentence's weight is a weighted sum of words' weight and its sentence-structure's weight. The former weight is adjusted by document class graph and latter weight considers both the Web formats and hyperlink attributes. The weight proportion of words and structures is learned by machine learning approach. Experiments on 2,000 Web documents show that our algorithm is feasible.