A Robust Approach to Schema Matching over Web Query Interfaces
Jin Pei, Jun Hong, D. Bell
22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006-04-03
DOI: 10.1109/ICDEW.2006.18
In recent years, more and more data sources have become available on the Web, where data can be accessed only via web interfaces. To make these data sources sharable, holistic schema matching, which benefits from a large number of schemas, has attracted much attention. However, current holistic approaches still suffer from two limitations: (a) they are sensitive to noisy data; and (b) although different types of information about attributes (e.g., attribute names, text descriptions, domain values) have been used, effectively merging these types of information remains an open issue. In this paper, we propose a re-sampling method for handling noisy data across multiple web interfaces in the same domain. Furthermore, we propose a novel method for utilizing different types of information about attributes, which is robust in resolving homonyms. Our experimental results show that our proposed methods are highly effective.
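The abstract does not spell out the re-sampling procedure, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a toy matcher (`cooccurrence_matches`, hypothetical) that treats two attributes as candidate synonyms when they never appear in the same interface, and a hypothetical wrapper (`resampled_matches`) that runs this matcher on random subsets of the schemas and keeps only the pairs that win enough votes. The parameter names, the sample size, and the vote threshold are all assumptions chosen for illustration.

```python
import random
from collections import Counter
from itertools import combinations


def cooccurrence_matches(schemas):
    """Toy matcher (an assumption, not the paper's method): two attributes
    are candidate synonyms if both occur in the given schemas but never
    together in the same schema."""
    attrs = sorted({a for s in schemas for a in s})
    matches = set()
    for a, b in combinations(attrs, 2):
        if not any(a in s and b in s for s in schemas):
            matches.add((a, b))
    return matches


def resampled_matches(schemas, rounds=500, sample_size=None,
                      threshold=0.3, seed=0):
    """Run the toy matcher on random subsets of the schemas and keep the
    pairs proposed in at least `threshold` of the rounds. A single noisy
    schema is absent from many subsets, so it cannot veto a match alone."""
    rng = random.Random(seed)
    k = sample_size or max(1, len(schemas) // 2)
    votes = Counter()
    for _ in range(rounds):
        sample = rng.sample(schemas, k)  # sample without replacement
        votes.update(cooccurrence_matches(sample))
    return {pair for pair, n in votes.items() if n / rounds >= threshold}


# Hypothetical book-search interfaces: "author" and "writer" are synonyms,
# but one noisy interface lists both attributes.
schemas = (
    [frozenset({"title", "author"})] * 5
    + [frozenset({"title", "writer"})] * 4
    + [frozenset({"title", "author", "writer"})]  # noisy schema
)

direct = cooccurrence_matches(schemas)   # matcher on all schemas at once
robust = resampled_matches(schemas)      # matcher with re-sampling
print("direct:", direct)
print("robust:", robust)
```

In this toy setting the single noisy interface is enough to block the author/writer match when all schemas are processed together, whereas voting over subsets lets the match through because many subsets exclude the noisy interface. The threshold trades off tolerance to noise against the risk of accepting spurious matches.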