[Data-driven intensive care: a lack of comprehensive datasets].
Author: Jan-Hendrik B Hardenberg
Journal: Medizinische Klinik-Intensivmedizin Und Notfallmedizin, pages 352-357 (impact factor 1.3; JCR Q2, MEDICINE, GENERAL & INTERNAL)
Publication date: 2024-06-01 (Epub 2024-04-26)
Publication type: Journal Article
DOI: 10.1007/s00063-024-01141-z (https://doi.org/10.1007/s00063-024-01141-z)
Citation count: 0
Abstract
Intensive care units provide a data-rich environment with the potential to generate datasets in the realm of big data, which could be utilized to train powerful machine learning (ML) models. However, the currently available datasets are too small and exhibit too little diversity due to their limitation to individual hospitals. This lack of extensive and varied datasets is a primary reason for the limited generalizability and resulting low clinical utility of current ML models. Often, these models are based on data from single centers and suffer from poor external validity. There is an urgent need for the development of large-scale, multicentric, and multinational datasets. Ensuring data protection and minimizing re-identification risks pose central challenges in this process. The "Amsterdam University Medical Center database (AmsterdamUMCdb)" and the "Salzburg Intensive Care database (SICdb)" demonstrate that open access datasets are possible in Europe while complying with the data protection regulations of the General Data Protection Regulation (GDPR). Another challenge in building intensive care datasets is the absence of semantic definitions in the source data and the heterogeneity of data formats. Establishing binding industry standards for the semantic definition is crucial to ensure seamless semantic interoperability between datasets.
About the journal:
Medizinische Klinik – Intensivmedizin und Notfallmedizin is an internationally respected interdisciplinary journal. It is intended for physicians, nurses, respiratory and physical therapists active in intensive care and accident/emergency units, but also for internists, anesthesiologists, surgeons, neurologists, and pediatricians with special interest in intensive care medicine.
Comprehensive reviews describe the most recent advances in the field of internal medicine with special focus on intensive care problems. Freely submitted original articles present important studies in this discipline and promote scientific exchange, while articles in the category "Photo essay" feature interesting cases and aim at optimizing diagnostic and therapeutic strategies. In the rubric "Journal club", well-respected experts comment on outstanding international publications. Review articles under the rubric "Continuing Medical Education" present verified results of scientific research and their integration into daily practice. The rubrics "Nursing practice" and "Physical therapy" round out the information.