Web page classification based on Schema.org collection

2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN); Pub Date: 2012-11-01; DOI: 10.1109/CASoN.2012.6412428

Jonas Krutil, M. Kudelka, V. Snášel


Citations: 15

Abstract

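The internet is a library holding a huge amount of information, and its content needs to be categorized through web page classification. Classifying web page content can improve the quality and accuracy of web search. Unfortunately, the high dimensionality of web page datasets makes classification difficult. An automatic method for web page classification can simplify the whole process and help search engines return more relevant results. Today, information on the web is generally structured and formatted in an informal way. This lack of semantics has motivated formal methods that add more semantic information to web pages. Search engines including Bing, Google, Yahoo! and Yandex created the Schema.org collection of schemas to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of web content. It focuses on using the Schema.org schema collection to classify web pages and categorize them unambiguously.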
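
The abstract does not spell out the classification procedure, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pages carry Schema.org microdata (`itemtype` attributes) and that the first declared type in document order can stand in for the page's category. The names `ItemTypeCollector`, `schema_types`, `classify_page` and `SAMPLE_HTML` are hypothetical and introduced here for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ItemTypeCollector(HTMLParser):
    """Collects Schema.org type names from microdata `itemtype` attributes."""

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Valueless attributes such as `itemscope` arrive with value None.
            if name != "itemtype" or not value:
                continue
            # An itemtype attribute may list several space-separated URLs.
            for url in value.split():
                parsed = urlparse(url)
                if parsed.netloc.lower() in ("schema.org", "www.schema.org"):
                    self.types.append(parsed.path.strip("/"))


def schema_types(html):
    """Return all Schema.org type names found in the page, in document order."""
    collector = ItemTypeCollector()
    collector.feed(html)
    return collector.types


def classify_page(html):
    """Assumption: the first declared type describes the page's main subject.

    Returns None when the page has no Schema.org microdata.
    """
    types = schema_types(html)
    return types[0] if types else None


# Hypothetical sample page: a movie item with a nested person item.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">James Cameron</span>
  </div>
</div>
"""

if __name__ == "__main__":
    print(classify_page(SAMPLE_HTML))
```

Taking the first declared type is a deliberate simplification. Pages without microdata return `None` and would need a conventional content-based classifier instead.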


