A Novel POS-Based Approach to Chinese News Topic Extraction from Internet
Authors: Xujian Zhao, Peiquan Jin, Lihua Yue
Published in: 2008 Second International Conference on Future Generation Communication and Networking Symposia, 2008-12-13
DOI: 10.1109/FGCNS.2008.71
News topic extraction is essential for news search engines. Traditional methods rely on pattern matching and linguistic analysis, which depend mainly on measuring feature similarity. These methods are inefficient for extracting Chinese news topics from the Internet for two reasons: natural language processing (NLP) for Chinese is difficult, and Internet news is diverse and updated rapidly. Some recent work exploits the special structure of news (e.g., the title) to extract Chinese news topics. However, two problems remain unsolved: (1) some news topics are missed, and (2) irregular topic words are produced. To address these problems, we propose a part-of-speech (POS)-based approach to news topic extraction. We first segment the news title into words and tag each word with its POS, then eliminate segmentation errors using POS information and positional relations. Next, topic words are associated and combined into larger units, and different topic weights are assigned to those larger units. We evaluate the approach in an experiment on 600 Chinese news Web pages. The results show that our approach achieves higher recall and precision in news topic extraction and markedly reduces irregular topic words.
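The combination and weighting step described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the actual rules, so the tag set (loosely modeled on PKU-style noun tags), the merge rule (join adjacent noun-like tokens), and the weighting (share of merged tokens) are all assumptions made for illustration. Segmentation and POS tagging are assumed to have been done upstream, and the segmentation-error elimination step is omitted.

```python
from typing import List, Tuple

# (word, POS tag) pairs, assumed to come from an upstream segmenter and tagger.
Token = Tuple[str, str]

# Assumption: noun-like tags mark candidate topic words (not taken from the paper).
TOPIC_TAGS = {"n", "nr", "ns", "nt", "nz", "vn"}


def combine_topic_words(tokens: List[Token]) -> List[Tuple[str, float]]:
    """Merge runs of adjacent topic-tagged tokens and weight each merged unit.

    The weight is a hypothetical heuristic: the number of tokens merged into
    the unit, normalised so that all weights sum to 1.
    """
    units: List[Tuple[str, int]] = []  # (merged text, number of tokens merged)
    run: List[str] = []
    # The sentinel token carries a non-topic tag, so it flushes the final run.
    for word, tag in tokens + [("", "END")]:
        if tag in TOPIC_TAGS:
            run.append(word)
        elif run:
            units.append(("".join(run), len(run)))
            run = []
    total = sum(count for _, count in units)
    return [(text, count / total) for text, count in units]


# Hypothetical pre-tagged title: "中国 股市 大幅 上涨 专家 分析"
title = [("中国", "ns"), ("股市", "n"), ("大幅", "d"),
         ("上涨", "v"), ("专家", "n"), ("分析", "v")]
for text, weight in combine_topic_words(title):
    print(text, round(weight, 2))
```

In this sketch the two-token unit "中国股市" receives twice the weight of the single-token unit "专家". A real system would replace this heuristic with weights derived from POS and position in the title.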