Data extraction and annotation for dynamic Web pages
Hui Song, Suraj Giri, Fanyuan Ma
IEEE International Conference on e-Technology, e-Commerce and e-Service, 2004 (EEE '04), published 2004-03-28
DOI: 10.1109/EEE.2004.1287353 (https://doi.org/10.1109/EEE.2004.1287353)
Citation count: 14
Abstract
Many Web sites contain large sets of pages generated dynamically from a common template. Structured data extracted from these pages, together with semantic annotations, is valuable for information systems. We propose a system, ADeaD, that automatically extracts data values from such Web pages and annotates the data schema. Experimental evaluation on many real-world Web page collections indicates that our algorithm correctly extracts the data and annotates the data schema.
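To illustrate what extracting data values from template-generated pages can involve, here is a minimal sketch of one generic approach. The abstract does not describe ADeaD's actual algorithm, so this is not the authors' method. The sketch assumes two pages built from the same template, tokenizes each into tags and text runs, and aligns the token sequences with Python's `difflib.SequenceMatcher`. Tokens shared by both pages are treated as template, and runs that differ are treated as data values. The function names `tokenize` and `extract_values` are hypothetical.

```python
import re
from difflib import SequenceMatcher


def tokenize(html: str) -> list[str]:
    # Split markup into tag tokens and text runs; drop whitespace-only runs.
    return [t.strip() for t in re.findall(r"<[^>]+>|[^<]+", html) if t.strip()]


def extract_values(page_a: str, page_b: str) -> tuple[list[str], list[str]]:
    # Align two pages assumed to share one template. Matching token runs are
    # taken as template; every non-matching run is taken as a data value.
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(a=a, b=b, autojunk=False)
    values_a: list[str] = []
    values_b: list[str] = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            values_a.append(" ".join(a[i1:i2]))
            values_b.append(" ".join(b[j1:j2]))
    return values_a, values_b


page1 = "<html><b>Title:</b><i>Dune</i><b>Price:</b><i>$9</i></html>"
page2 = "<html><b>Title:</b><i>Emma</i><b>Price:</b><i>$7</i></html>"
print(extract_values(page1, page2))  # (['Dune', '$9'], ['Emma', '$7'])
```

This sketch covers only value extraction. It does not attempt the schema annotation step described in the abstract, such as labeling the first value column as a title and the second as a price. It also assumes the template is fixed across pages, with no optional or repeated sections.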