Automatically generating labeled examples for Web wrapper maintenance
J. Raposo, A. Pan, M. Álvarez, Justo Hidalgo
The 2005 IEEE/WIC/ACM International Conference on Web Intelligence (WI'05), 2005-09-19
DOI: 10.1109/WI.2005.40
To let software programs gain full benefit from semi-structured Web sources, wrapper programs must be built to provide a "machine readable" view over them. A significant problem with this approach is that, since Web sources are autonomous, they may undergo changes that invalidate the current wrapper. In this paper, we address this problem by introducing novel heuristics and algorithms for automatically maintaining wrappers. In our approach, the system collects some query results during normal wrapper operation and, when the source changes, uses them as input to generate a set of labeled examples for the source, which can then be used to induce a new wrapper. Our experiments show that the proposed techniques achieve high accuracy for a wide range of real-world Web data extraction problems.
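
The general idea of reusing stored query results to label a changed page can be illustrated with a minimal Python sketch. This is not the heuristics or algorithms of the paper, which the abstract does not detail; it assumes naive exact substring matching, and the names `LabeledExample`, `label_examples`, `stored`, and `new_page` are hypothetical, as is the sample data.

```python
from dataclasses import dataclass


@dataclass
class LabeledExample:
    field: str  # field name taken from the stored record
    value: str  # field value located in the changed page
    start: int  # character offset where the value begins
    end: int    # character offset just past the value


def label_examples(stored_records, new_page):
    """Label occurrences of previously extracted values in a changed page.

    Assumption: plain exact substring matching (first occurrence only).
    A record is kept only if every one of its fields is found.
    """
    examples = []
    for record in stored_records:
        matches = []
        for field, value in record.items():
            pos = new_page.find(value)
            if pos == -1:
                break  # record is not fully present in the new page; skip it
            matches.append(LabeledExample(field, value, pos, pos + len(value)))
        else:
            # Loop finished without break: all fields were located.
            examples.extend(matches)
    return examples


# Hypothetical results collected while the old wrapper still worked.
stored = [
    {"title": "Dune", "price": "9.99"},
    {"title": "Emma", "price": "4.50"},
    {"title": "Ulysses", "price": "12.00"},
]

# Hypothetical page after the source changed its layout.
new_page = (
    "<ul><li><b>Dune</b> <i>9.99</i></li>"
    "<li><b>Emma</b> <i>4.50</i></li></ul>"
)

examples = label_examples(stored, new_page)
```

Here the third stored record no longer appears in the page and is discarded, while the other two yield labeled spans that a wrapper induction step could consume. A realistic system would need to handle approximate matches, repeated values, and reformatted fields, which this sketch ignores.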