Lena Baum, Marco Johns, Armin Müller, Hammam Abu Attieh, Fabian Prasser
International Journal of Medical Informatics, published 2024-10-05. DOI: 10.1016/j.ijmedinf.2024.105646. URL: https://www.sciencedirect.com/science/article/pii/S1386505624003095
HERALD: A domain-specific query language for longitudinal health data analytics
Background
Large-scale health data has significant potential for research and innovation, especially longitudinal data, which offers insights into prevention, disease progression, and treatment effects. Yet analyzing this type of data is complex, as data points are repeatedly documented along the timeline. As a consequence, extracting cross-sectional tabular data suitable for statistical analysis and machine learning can be challenging for medical researchers and data scientists alike, and existing tools do not strike a balance between ease of use and comprehensiveness.
Objective
This paper introduces HERALD, a novel domain-specific query language designed to support the transformation of longitudinal health data into cross-sectional tables. We describe the basic concepts, the query syntax, a graphical user interface for constructing and executing HERALD queries, as well as an integration into Informatics for Integrating Biology and the Bedside (i2b2).
Methods
The syntax of HERALD mimics natural language and supports different query types for selection, aggregation, analysis of relationships, and searching for data points based on filter expressions and temporal constraints. Using a hierarchical concept model, queries are executed individually for the data of each patient, while constructing tabular output. HERALD is closed, meaning that queries process data points and generate data points. Queries can refer to data points that have been produced by previous queries, providing a simple but powerful nesting mechanism.
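The per-patient execution and the closure property described above can be illustrated with a minimal Python sketch. This is not HERALD syntax and not the authors' implementation. The names DataPoint, first, average_after, and to_row, as well as the concept paths and sample values, are hypothetical and chosen only for illustration. The sketch assumes each patient's record is a list of dated data points, and it shows a selection query whose result feeds an aggregation query with a temporal constraint.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass(frozen=True)
class DataPoint:
    concept: str   # hypothetical concept path, e.g. "lab/hba1c"
    when: date     # date the observation was documented
    value: float


def first(points: list[DataPoint], concept: str) -> list[DataPoint]:
    """Selection: the earliest data point of a concept (empty list if none)."""
    hits = sorted((p for p in points if p.concept == concept), key=lambda p: p.when)
    return hits[:1]


def average_after(points: list[DataPoint], concept: str, anchor: DataPoint) -> list[DataPoint]:
    """Aggregation with a temporal constraint: mean of values documented after the anchor."""
    hits = [p for p in points if p.concept == concept and p.when > anchor.when]
    if not hits:
        return []
    # The result is again a data point, so a later query could consume it.
    return [DataPoint(f"avg({concept})", anchor.when, mean(p.value for p in hits))]


def to_row(patient_id: str, points: list[DataPoint]) -> dict[str, object]:
    """Run both queries on one patient's data and emit one cross-sectional row."""
    dx = first(points, "diagnosis/diabetes")
    # Nesting: the second query refers to the data point produced by the first.
    avg = average_after(points, "lab/hba1c", dx[0]) if dx else []
    return {
        "patient": patient_id,
        "first_dx": dx[0].when if dx else None,
        "avg_hba1c_after_dx": avg[0].value if avg else None,
    }


# Made-up longitudinal records for two patients.
patients = {
    "p1": [
        DataPoint("lab/hba1c", date(2020, 1, 10), 6.0),
        DataPoint("diagnosis/diabetes", date(2020, 3, 1), 1.0),
        DataPoint("lab/hba1c", date(2020, 6, 1), 8.0),
        DataPoint("lab/hba1c", date(2021, 1, 5), 7.0),
    ],
    "p2": [DataPoint("lab/hba1c", date(2019, 5, 5), 5.5)],
}

# Queries are executed per patient; each patient contributes one table row.
table = [to_row(pid, pts) for pid, pts in patients.items()]
```

In this sketch every query takes a list of data points and returns a list of data points, which is one simple way to obtain the closure that makes nesting possible. Patients without a matching data point yield empty cells rather than being dropped from the table.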
Results
The open-source implementation consists of a HERALD query parser, an execution engine, and a web-based user interface for query construction and statistical analysis. The implementation can be deployed as a standalone component or integrated as a plugin into self-service data analytics environments like i2b2. HERALD can be a valuable tool for data scientists and machine learning experts, as it simplifies the process of transforming longitudinal health data into tables and data matrices.
Conclusion
The construction of cross-sectional tables from longitudinal data can be supported through dedicated query languages that strike a reasonable balance between language complexity and transformation capabilities.
About the journal:
International Journal of Medical Informatics provides an international medium for dissemination of original results and interpretative reviews concerning the field of medical informatics. The Journal emphasizes the evaluation of systems in healthcare settings.
The scope of the journal covers:
Information systems, including national or international registration systems, hospital information systems, departmental and/or physician's office systems, document handling systems, electronic medical record systems, standardization, systems integration, etc.;
Computer-aided medical decision support systems using heuristic, algorithmic and/or statistical methods as exemplified in decision theory, protocol development, artificial intelligence, etc.;
Educational computer-based programs pertaining to medical informatics or medicine in general;
Organizational, economic, social, clinical impact, ethical and cost-benefit aspects of IT applications in health care.