A Multi-label and Adaptive Genre Classification of Web Pages
Chaker Jebari
2012 11th International Conference on Machine Learning and Applications, published 2012-12-12
DOI: 10.1109/ICMLA.2012.106
This paper proposes a new centroid-based approach to classifying web pages by genre using character n-grams extracted from several information sources: the URL, title, headings, and anchors. To cope with the complexity of web pages and the rapid evolution of web genres, the approach implements a multi-label, adaptive classification scheme in which web pages are classified one at a time and each page can be assigned to more than one genre. Depending on the similarity between a new page and each genre centroid, the approach either adapts the genre centroid under consideration or treats the new page as a noise page and discards it. The experimental results show that the approach is very fast and achieves better results than existing multi-label classifiers.
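The scheme described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not state the similarity measure, the threshold, or the centroid update rule, so cosine similarity, the `threshold` and `rate` parameters, the trigram size, and all function names below are assumptions chosen for illustration. For brevity the sketch also treats a page as a single string rather than separate URL, title, heading, and anchor fields.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams (n=3 is an assumed default)."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram vectors (assumed measure)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_and_adapt(page_text, centroids, threshold=0.3, rate=0.1):
    """Assign every sufficiently similar genre and adapt its centroid in place.

    `threshold` and `rate` are hypothetical parameters. An empty result
    means the page matched no genre and is discarded as noise.
    """
    vec = char_ngrams(page_text)
    labels = []
    for genre, centroid in centroids.items():
        if cosine(vec, centroid) >= threshold:
            labels.append(genre)
            # Assumed update rule: move the centroid toward the new page.
            for key in set(centroid) | set(vec):
                centroid[key] = ((1 - rate) * centroid.get(key, 0.0)
                                 + rate * vec.get(key, 0.0))
    return labels


# Toy centroids built from single example strings (illustrative data only).
centroids = {
    "news": dict(char_ngrams("news headline today")),
    "shop": dict(char_ngrams("buy cart price")),
}
labels = classify_and_adapt("news headline today", centroids)
noise_labels = classify_and_adapt("zzzzqqqq", centroids)
```

Because every genre is tested independently against the threshold, a page can receive several labels at once, and pages are processed one at a time, so the centroids drift with the incoming stream. A page that matches no centroid leaves all centroids untouched.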