Authors: Lyonel Serradura, M. Slimane, N. Vincent
Published in: Proceedings of Sixth International Conference on Document Analysis and Recognition
Publication date: 2001-09-10
DOI: 10.1109/ICDAR.2001.953955 (https://doi.org/10.1109/ICDAR.2001.953955)
Web sites thematic classification using hidden Markov models
The amount of information available on the Internet keeps growing, and we need tools that help us extract the right piece of information. We have developed a classification algorithm that tackles this problem for French-language text. It classifies web pages into themes according to their text content. The method, named STCoL (Supervised Thematic Corpus Learning), is built on Hidden Markov Models (HMMs). Once the themes are modeled with HMMs, STCoL can classify documents from different sources. The method is not only efficient but also robust.
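The abstract does not give the details of STCoL, so the following is only a minimal sketch of the general idea of classifying with one HMM per theme: score a document under each theme's model with the forward algorithm and pick the theme with the highest likelihood. Everything here is an assumption made for illustration: the function names `log_likelihood` and `classify`, the four-word vocabulary, the two themes, and all probability values are hypothetical and are not taken from the paper. In a real system the model parameters would be learned from a labeled corpus rather than written by hand.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | HMM) for a discrete HMM.

    obs   -- list of symbol indices (e.g. word ids)
    start -- start[i] = P(first state is i)
    trans -- trans[i][j] = P(next state is j | current state is i)
    emit  -- emit[i][k] = P(symbol k | state i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    total = sum(alpha)
    log_p = math.log(total)
    alpha = [a / total for a in alpha]  # rescale to avoid underflow
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
            for j in range(n)
        ]
        total = sum(alpha)
        log_p += math.log(total)
        alpha = [a / total for a in alpha]
    return log_p

def classify(obs, theme_models):
    """Return (best_theme, scores), scoring obs under every theme's HMM."""
    scores = {
        theme: log_likelihood(obs, *params)
        for theme, params in theme_models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy vocabulary and hand-written two-state models (assumptions).
VOCAB = {"match": 0, "goal": 1, "market": 2, "bank": 3}
THEME_MODELS = {
    "sport": (
        [0.6, 0.4],
        [[0.7, 0.3], [0.4, 0.6]],
        [[0.5, 0.3, 0.1, 0.1], [0.3, 0.5, 0.1, 0.1]],
    ),
    "finance": (
        [0.5, 0.5],
        [[0.6, 0.4], [0.5, 0.5]],
        [[0.1, 0.1, 0.5, 0.3], [0.1, 0.1, 0.3, 0.5]],
    ),
}

if __name__ == "__main__":
    words = ["match", "goal", "goal", "bank"]
    theme, scores = classify([VOCAB[w] for w in words], THEME_MODELS)
    print(theme, scores)
```

Working in log space with per-step rescaling keeps the score numerically stable for long documents, where the raw product of probabilities would underflow.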