Information extraction from HTML product catalogues: from source code and images to RDF

The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05). Pub Date: 2005-09-19. DOI: 10.1109/WI.2005.78

M. Labský, V. Svátek, Ondrej Sváb-Zamazal, P. Praks, M. Krátký, V. Snášel


Citations: 29

Abstract

We describe an application of information extraction from company Web sites focusing on product offers. A statistical approach to text analysis is used in conjunction with different ways of image classification. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
