Schema Matching with Frequent Changes on Semi-Structured Input Files: A Machine Learning Approach on Biological Product Data
Oliver Schmidts, B. Kraft, Ines Siebigteroth, Albert Zündorf
International Conference on Enterprise Information Systems, published 2019-05-03
DOI: 10.5220/0007723602080215 (https://doi.org/10.5220/0007723602080215)
Citations: 5
Abstract
For small to medium-sized enterprises, matching schemas is still a time-consuming manual task. Even expensive commercial solutions perform poorly if the context is not suitable for the product. In this paper, we provide an approach based on concept name learning from known transformations to discover correspondences between two schemas. We solve schema matching as a classification task. Additionally, we provide a named entity recognition approach to analyze how the classification task relates to named entity recognition. Benchmarking against other machine learning models shows that, when choosing a good learning model, schema matching based on concept name similarity can outperform other approaches and complex algorithms in terms of precision and F1-measure. Hence, our approach is able to build the foundation for improved automation of complex data integration applications for small to medium-sized enterprises.
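To make the idea of treating schema matching as a classification task more concrete, the following is a minimal sketch and not the model evaluated in the paper. It assumes that known transformations are available as pairs of source column name and target concept, and it classifies a new column name by cosine similarity between character trigram profiles. The class name `ConceptNameClassifier`, the trigram features, and the example column names and concepts are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=3):
    """Return a bag of character n-grams for a lowercased, padded column name."""
    text = "#" + name.strip().lower() + "#"
    if len(text) < n:
        return Counter([text])
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram bags (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ConceptNameClassifier:
    """Hypothetical classifier: one aggregated n-gram profile per target concept."""

    def __init__(self, n=3):
        self.n = n
        self.profiles = defaultdict(Counter)

    def fit(self, known_mappings):
        # known_mappings: iterable of (source_column_name, target_concept) pairs
        for source_name, concept in known_mappings:
            self.profiles[concept].update(char_ngrams(source_name, self.n))
        return self

    def predict(self, source_name):
        # Return the best-scoring concept together with its similarity score.
        grams = char_ngrams(source_name, self.n)
        scores = {c: cosine(grams, p) for c, p in self.profiles.items()}
        best = max(scores, key=scores.get)
        return best, scores[best]


# Invented example data standing in for previously seen transformations.
KNOWN_MAPPINGS = [
    ("Product Name", "name"),
    ("prod_name", "name"),
    ("Item name", "name"),
    ("Catalog No.", "catalog_number"),
    ("cat_number", "catalog_number"),
    ("Catalogue #", "catalog_number"),
    ("Host Species", "host"),
    ("host organism", "host"),
    ("Price (EUR)", "price"),
    ("unit_price", "price"),
]

if __name__ == "__main__":
    clf = ConceptNameClassifier().fit(KNOWN_MAPPINGS)
    for column in ["Product-Name", "Catalog Number", "host"]:
        print(column, "->", clf.predict(column))
```

A similarity-based profile like this is only a baseline. A learned model, as benchmarked in the paper, would replace the cosine scoring step while keeping the same framing: column names in, target concepts out.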