M. Labský, V. Svátek, Ondrej Sváb-Zamazal, P. Praks, M. Krátký, V. Snášel
The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005-09-19. DOI: 10.1109/WI.2005.78
Information extraction from HTML product catalogues: from source code and images to RDF
We describe an application of information extraction to company Web sites, focusing on product offers. A statistical approach to text analysis is combined with several image classification methods. Ontological knowledge is used to group the extracted items into structured objects. The results are stored in an RDF repository and made available for structured search.
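The abstract does not say how extracted items are grouped or serialized. As a rough illustration of those last two steps, grouping items into structured objects and storing them as RDF, here is a minimal sketch. It rests on assumptions that do not come from the paper: a toy ontology rule (each product has exactly one name, so every extracted name opens a new product, and each attribute is single-valued), a hypothetical vocabulary under `http://example.org/product#`, and N-Triples as the output format. The names `Extracted`, `Product`, `group_items` and `to_ntriples` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical namespace for illustration only; not the paper's ontology.
NS = "http://example.org/product#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass
class Extracted:
    """One item found by the extractor: an attribute name and its text value."""
    attr: str   # e.g. "name", "price", "picture"
    value: str


@dataclass
class Product:
    attrs: dict[str, str] = field(default_factory=dict)


def group_items(items: list[Extracted]) -> list[Product]:
    """Group items in document order under an assumed toy ontology rule:
    every "name" starts a new product, and later attributes attach to it.
    Items seen before the first name are dropped; a repeated attribute
    keeps its first value (attributes are assumed single-valued)."""
    products: list[Product] = []
    for item in items:
        if item.attr == "name":
            products.append(Product())
        if products and item.attr not in products[-1].attrs:
            products[-1].attrs[item.attr] = item.value
    return products


def escape_literal(text: str) -> str:
    """Escape characters that are not allowed raw in an N-Triples string literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(products: list[Product]) -> str:
    """Serialize each product as an RDF resource with one triple per attribute."""
    lines = []
    for i, product in enumerate(products, start=1):
        subject = f"<{NS}product{i}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{NS}Product> .")
        for attr, value in product.attrs.items():
            lines.append(f'{subject} <{NS}{attr}> "{escape_literal(value)}" .')
    return "\n".join(lines)


if __name__ == "__main__":
    items = [
        Extracted("name", "Garden chair"),
        Extracted("price", "49 EUR"),
        Extracted("name", "Table"),
        Extracted("price", "120 EUR"),
        Extracted("picture", "table.jpg"),
    ]
    print(to_ntriples(group_items(items)))
```

The grouping rule is deliberately simplistic. A real ontology would supply richer constraints (cardinalities, value types, which attributes may be shared), and a production system would more likely use an RDF library and a triple store than hand-written N-Triples.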