Heterogeneous data integration: Challenges and opportunities
Journal: Data in Brief (JCR Q3, Multidisciplinary Sciences; impact factor 1.0)
Publication date: 2024-08-29
DOI: 10.1016/j.dib.2024.110853
URL: https://www.sciencedirect.com/science/article/pii/S2352340924008175
Citations: 0
Abstract
Integrating multiple data source technologies is essential for organizations to respond to highly dynamic market needs. Although physical data integration systems are considered to offer better query processing, they incur higher implementation and maintenance costs. Meanwhile, virtual data integration has emerged as an alternative that is increasingly attracting the attention of researchers in the current era of big data. Data integration methodologies have been developed and applied in many domains, processing heterogeneous data using a range of approaches. This review article provides an overview of heterogeneous data integration research, focusing on methodologies and approaches. It surveys existing publications, highlighting key trends, challenges, and open research topics. The main findings are: (i) research has been conducted in various domains, but most of it focuses on big data in general rather than on a specific study domain; (ii) researchers primarily focus on semantic challenges; and (iii) gaps remain in integration issues involving semantics and unstructured data formats, and these still need to be thoroughly addressed. Furthermore, considering cutting-edge technologies such as machine learning, as well as data integration in relation to privacy concerns, offers opportunities for further investigation. Finally, we provide insight into the potential for a broader review of integration challenges based on case studies.
About the journal:
Data in Brief provides a way for researchers to easily share and reuse each other's datasets by publishing data articles that:
- Thoroughly describe your data, facilitating reproducibility.
- Make your data, which is often buried in supplementary material, easier to find.
- Increase traffic towards associated research articles and data, leading to more citations.
- Open up doors for new collaborations.
Because you never know what data will be useful to someone else, Data in Brief welcomes submissions that describe data from all research areas.