Web page classification based on Schema.org collection
Jonas Krutil, M. Kudelka, V. Snášel
2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN), 2012-11-01
DOI: 10.1109/CASoN.2012.6412428 (https://doi.org/10.1109/CASoN.2012.6412428)
Citations: 15
Abstract
The internet is a library holding a huge amount of information, and its content needs to be categorized through web page classification. Classifying web page content can improve the quality and accuracy of web search. Unfortunately, the high dimensionality of web page datasets makes classification difficult. An automatic method for web page classification can simplify the whole process and help search engines return more relevant results. Today, information on the web is generally structured and formatted informally. This absence of semantics has motivated formal methods that add semantic information to web pages. Search engines including Bing, Google, Yahoo! and Yandex created Schema.org, a collection of schemas, to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of web content. It focuses on using the Schema.org schema collection to classify web pages and categorize them unambiguously.
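As a rough illustration of classifying a page from its Schema.org markup, the sketch below reads microdata `itemtype` attributes from the HTML source and uses the first declared Schema.org type as the page label. This is not the authors' method: the names `ItemTypeCollector` and `classify_page` are hypothetical, and the "first declared type wins" rule is an assumption made only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ItemTypeCollector(HTMLParser):
    """Collects Schema.org type names found in microdata `itemtype` attributes."""

    def __init__(self):
        super().__init__()
        self.types = []  # type names in document order, e.g. ["Recipe", "Person"]

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name != "itemtype" or not value:
                continue
            # An itemtype attribute may hold several space-separated URLs.
            for url in value.split():
                parsed = urlparse(url)
                if parsed.netloc.lower() in ("schema.org", "www.schema.org"):
                    self.types.append(parsed.path.strip("/"))


def classify_page(html):
    """Return the first Schema.org type declared in the page, or None.

    Assumption for this sketch: the outermost item is declared first,
    so its type is taken as the label for the whole page.
    """
    collector = ItemTypeCollector()
    collector.feed(html)
    return collector.types[0] if collector.types else None


if __name__ == "__main__":
    sample = (
        '<div itemscope itemtype="http://schema.org/Recipe">'
        '<span itemprop="name">Apple pie</span>'
        '<div itemprop="author" itemscope itemtype="http://schema.org/Person"></div>'
        "</div>"
    )
    print(classify_page(sample))
```

A page with no Schema.org markup yields `None`, so a real pipeline would need a fallback classifier for such pages.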