Data extraction and annotation for dynamic Web pages
Hui Song, Suraj Giri, Fanyuan Ma
IEEE International Conference on e-Technology, e-Commerce and e-Service, 2004 (EEE '04). Published 2004-03-28.
DOI: 10.1109/EEE.2004.1287353 (https://doi.org/10.1109/EEE.2004.1287353)
Many Web sites contain large sets of pages that are generated dynamically from a common template. The structured data extracted from these pages, together with semantic annotation, are valuable to information systems. We propose a system, ADeaD, that automatically extracts data values from such Web pages and annotates the data schema. Experimental evaluation on many real Web page collections indicates that our algorithm correctly extracts the data and annotates the data schema.
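To make the idea of template-based extraction concrete, the following is a minimal sketch and not the ADeaD algorithm itself, which the abstract does not describe. It assumes that two pages generated from the same template can be aligned token by token: tokens shared by both pages are treated as template, tokens that differ are treated as data values, and the nearest preceding template text is used as a naive schema label. The function names `tokenize` and `extract_and_annotate` and the two sample pages are hypothetical.

```python
import re
from difflib import SequenceMatcher

# A token is either one HTML tag or one run of text between tags.
TOKEN_RE = re.compile(r"<[^>]+>|[^<]+")


def tokenize(html):
    """Split HTML into tag tokens and stripped, non-empty text tokens."""
    return [t.strip() for t in TOKEN_RE.findall(html) if t.strip()]


def extract_and_annotate(page_a, page_b):
    """Align two same-template pages and return (label, values_a, values_b) records.

    Assumption of this sketch: aligned tokens that are equal belong to the
    template; tokens that differ are data values. The label is the last
    text token seen in the template before the differing span.
    """
    a, b = tokenize(page_a), tokenize(page_b)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    records, label = [], None
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            # Template span: remember its last non-tag token as a candidate label.
            texts = [t for t in a[i1:i2] if not t.startswith("<")]
            if texts:
                label = texts[-1]
        else:
            # Differing span: treat the tokens on each side as data values.
            records.append((label, a[i1:i2], b[j1:j2]))
    return records


# Hypothetical pages produced by the same template.
page1 = "<li><b>Title:</b><i>Dune</i></li><li><b>Price:</b><i>$9</i></li>"
page2 = "<li><b>Title:</b><i>Emma</i></li><li><b>Price:</b><i>$7</i></li>"

for record in extract_and_annotate(page1, page2):
    print(record)
```

On these two sample pages the sketch pairs the label `Title:` with the values `Dune` and `Emma`, and `Price:` with `$9` and `$7`. A pairwise diff of this kind fails when two pages happen to share a data value or when a field has no visible label, which is one reason a real system would need more than two pages and a stronger annotation step.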