基于Schema.org集合的网页分类

Jonas Krutil, M. Kudelka, V. Snás̃el
{"title":"基于Schema.org集合的网页分类","authors":"Jonas Krutil, M. Kudelka, V. Snás̃el","doi":"10.1109/CASoN.2012.6412428","DOIUrl":null,"url":null,"abstract":"The internet is a library of a huge amount of information and there is a need for categorize its content based on web page classification. Classification of web page content can improve the quality of web search and its accuracy. Unfortunately the high dimensionality of the web pages dataset has made the process of classification difficult. The use of an automatic method for web page classification can simplify the whole process and assist the search engine in getting more relevant results. Nowadays information on the web is generally structured and formatted in a not formal way. This absence of semantics leads to create formal methods to provide more semantics information into web page. Search engines including Bing, Google, Yahoo! and Yandex formed collection of schemas Schema.org to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of the web content. Is focused on use of schemas collection Schema.org to classify web pages and categorize them unambiguously.","PeriodicalId":431370,"journal":{"name":"2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"15","resultStr":"{\"title\":\"Web page classification based on Schema.org collection\",\"authors\":\"Jonas Krutil, M. Kudelka, V. Snás̃el\",\"doi\":\"10.1109/CASoN.2012.6412428\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The internet is a library of a huge amount of information and there is a need for categorize its content based on web page classification. Classification of web page content can improve the quality of web search and its accuracy. Unfortunately the high dimensionality of the web pages dataset has made the process of classification difficult. The use of an automatic method for web page classification can simplify the whole process and assist the search engine in getting more relevant results. Nowadays information on the web is generally structured and formatted in a not formal way. This absence of semantics leads to create formal methods to provide more semantics information into web page. Search engines including Bing, Google, Yahoo! and Yandex formed collection of schemas Schema.org to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of the web content. Is focused on use of schemas collection Schema.org to classify web pages and categorize them unambiguously.\",\"PeriodicalId\":431370,\"journal\":{\"name\":\"2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN)\",\"volume\":\"19 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"15\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CASoN.2012.6412428\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 Fourth International Conference on Computational Aspects of Social Networks (CASoN)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CASoN.2012.6412428","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 15

摘要

互联网是一个海量信息的图书馆,需要基于网页分类对其内容进行分类。对网页内容进行分类可以提高网页搜索的质量和准确性。不幸的是,网页数据集的高维使分类过程变得困难。采用网页自动分类的方法可以简化整个分类过程,帮助搜索引擎获得更相关的结果。如今,网络上的信息通常以一种不正式的方式结构化和格式化。语义的缺失导致创建形式化方法来向网页提供更多的语义信息。搜索引擎包括Bing, Google, Yahoo!和Yandex形成了Schema.org模式集合,以支持网页语义和改进他们的搜索结果。本文探讨了使用形式化源代码结构对大量web内容进行分类。重点是使用模式集合Schema.org对网页进行分类,并对它们进行明确的分类。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
Web page classification based on Schema.org collection
The internet is a library of a huge amount of information and there is a need for categorize its content based on web page classification. Classification of web page content can improve the quality of web search and its accuracy. Unfortunately the high dimensionality of the web pages dataset has made the process of classification difficult. The use of an automatic method for web page classification can simplify the whole process and assist the search engine in getting more relevant results. Nowadays information on the web is generally structured and formatted in a not formal way. This absence of semantics leads to create formal methods to provide more semantics information into web page. Search engines including Bing, Google, Yahoo! and Yandex formed collection of schemas Schema.org to support web page semantics and improve their search results. This paper explores the use of formal source code structure for classifying a large collection of the web content. Is focused on use of schemas collection Schema.org to classify web pages and categorize them unambiguously.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
期刊最新文献
Boosting Optimum-Path Forest clustering through harmony Search and its applications for intrusion detection in computer networks Graph-based cross-validated committees ensembles Automatic sentiment analysis of Twitter messages Identifying focal patterns in social networks Ontology-based Negotiation of security requirements in cloud
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1