Introduction: The Medical Informatics Initiative (MII) aims to enable cross-site secondary use of clinical data in Germany using a FHIR-based Core Data Set (CDS). However, current FHIR Implementation Guides (IGs) often lack actor-specific guidance, leading to inconsistent interpretations and implementations.
Methods: This technical case report explores the use of FHIR Implementation Obligations to clarify responsibilities and expected system behavior within the MII infrastructure. Obligations were modeled using the FHIR obligation extension and ActorDefinition resources, applied to the Patient profile from the CDS Person module. A prototype IG was generated using the HL7 FHIR IG publisher tooling.
Results: Obligations were defined and rendered for multiple actors - such as Data Integration Centers (DIC) and the Health Research Data Portal (FDPG) - across selected Patient profile elements. Obligations were also linked to specific operations, enabling precise workflow targeting. The implementation improved the explicitness of responsibilities that were previously only implied.
Discussion: The study demonstrates that obligations enhance the clarity of FHIR IGs. However, limitations remain: the MII's current IG tooling does not yet support obligations, and conformance testing was not addressed. Further work is needed to standardize ActorDefinition resources, align obligations across modules, and develop validation tooling to realize the full potential of obligation-driven specifications.
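As a rough illustration of the obligation modeling described in the Methods, an obligation can be attached to a profile element as an extension carrying a code and an actor reference. This is a hedged sketch only: the element path, actor canonical URL, and the simplified sub-extension layout are illustrative, not the MII's actual profile content.

```python
# Simplified sketch of FHIR's obligation extension on an ElementDefinition-like
# dict; the real extension nests further sub-extensions (e.g. documentation).
OBLIGATION_EXT = "http://hl7.org/fhir/StructureDefinition/obligation"

def add_obligation(element: dict, code: str, actor: str) -> dict:
    """Attach an obligation code (e.g. 'SHALL:populate') for a named actor."""
    ext = {
        "url": OBLIGATION_EXT,
        "extension": [
            {"url": "code", "valueCode": code},
            {"url": "actor", "valueCanonical": actor},
        ],
    }
    element.setdefault("extension", []).append(ext)
    return element

# Hypothetical actor canonical; the MII would publish its own ActorDefinition.
dic_actor = "https://example.org/fhir/ActorDefinition/DIC"
elem = {"path": "Patient.birthDate"}
add_obligation(elem, "SHALL:populate", dic_actor)
```

The point of the pattern is that each obligation names both a behavior and the actor it binds, which is what makes the rendered IG actor-specific.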
Julian Saß, Sylvia Thun. Improving Requirements Documentation in the Medical Informatics Initiative Core Data Set Using FHIR Obligations - Lessons Learned. Studies in Health Technology and Informatics. 2025;331:235-244. doi:10.3233/SHTI251401.
Introduction: Medical entity linking is an important task in biomedical natural language processing, aiming to align textual mentions of medical concepts with standardized concepts in ontologies. Most existing approaches rely on supervised models or domain-specific embeddings, which require large datasets and significant computational resources.
Objective: The objective of this work is (1) to investigate the effectiveness of large language models (LLMs) in improving both candidate generation and disambiguation for medical entity linking through synonym expansion and in-context learning, and (2) to evaluate this approach against traditional string-matching and supervised methods.
Methods: We propose a simple yet effective approach that combines string matching with an LLM through in-context learning. Our method avoids fine-tuning and minimizes annotation requirements, making it suitable for low-resource settings. Our system enhances fuzzy string matching by expanding mention spans with LLM-generated synonyms during candidate generation. UMLS entity names, aliases, and synonyms are indexed in Elasticsearch, and candidates are retrieved using both the original span and generated variants. Disambiguation is performed using an LLM with few-shot prompting to select the correct entity from the candidate list.
Results: Evaluated on the MedMentions dataset, our approach achieves 56% linking accuracy, outperforming baseline string matching but falling behind supervised learning methods. The candidate generation component reaches 70% recall@5, while the disambiguation step achieves 80% accuracy when the correct entity is among the top five. We also observe that LLM-generated descriptions do not always improve accuracy.
Conclusion: Our results demonstrate that LLMs have the potential to support medical entity linking in low-resource settings. Although our method is still outperformed by supervised models, it remains a lightweight alternative, requiring no fine-tuning or a large amount of annotated data. The approach is also adaptable to other domains and ontologies beyond biomedicine due to its flexible and domain-agnostic design.
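The candidate-generation step described in the Methods can be sketched without any infrastructure: fuzzy-match the mention and its synonym expansions against an indexed ontology and keep the top-k concepts. This is a minimal stand-in, not the authors' system: the concept IDs and names are toy data, `difflib` replaces Elasticsearch, and the `synonyms` argument stands in for LLM output.

```python
import difflib

# Toy UMLS-like index: concept ID -> names and aliases (invented data).
INDEX = {
    "C0020538": ["hypertension", "high blood pressure", "HTN"],
    "C0011849": ["diabetes mellitus", "diabetes"],
}

def generate_candidates(mention: str, synonyms: list[str], k: int = 5) -> list[str]:
    """Fuzzy-match the mention and its synonym expansions against the index,
    returning up to k candidate concept IDs ranked by best similarity."""
    scored: dict[str, float] = {}
    for span in [mention, *synonyms]:
        for cid, names in INDEX.items():
            for name in names:
                s = difflib.SequenceMatcher(None, span.lower(), name.lower()).ratio()
                scored[cid] = max(scored.get(cid, 0.0), s)
    return sorted(scored, key=lambda c: scored[c], reverse=True)[:k]

# The synonym list mimics what an LLM expansion step would return.
cands = generate_candidates("high BP", synonyms=["high blood pressure"])
```

A real pipeline would then pass `cands` with their names to the few-shot disambiguation prompt; the synonym expansion is what rescues surface forms ("high BP") that plain string matching misses.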
Suteera Seeha, Martin Boeker, Luise Modersohn. Medical Entity Linking in Low-Resource Settings with Fine-Tuning-Free LLMs. Studies in Health Technology and Informatics. 2025;331:245-254. doi:10.3233/SHTI251402.
Christian Otto, Jennifer Dörfler, Cord Spreckelsen, Jutta Hübner
Introduction: Assessing the ever-growing number of publications in evidence-based medicine by means of their risk of biases is as essential as it is challenging. This is especially true for the field of complementary and alternative medicine (CAM), a field that remains underrepresented in systematic review collections such as those by the Cochrane Review Groups.
Methods: In this work, we present CAMIH, a semantic wiki platform that offers clinicians a collaborative space to find, summarize, and discuss CAM evidence. CAMIH is built on semantic web technologies and structures information as semantic triples. This structure allows CAMIH to go beyond simple data collection. Our goal is to enable a deeper understanding and organization of evidence, thereby acting as a CAM-specific supplement to existing evidence-synthesis frameworks inspired by the Cochrane methodology.
Results: We anticipate that the implemented platform will not only make evidence synthesis and risk-of-bias assessment more efficient but also reduce the time required to derive treatment strategies. Given its foundation in semantic web technologies, it serves both as a practical tool for clinicians and as a methodological blueprint for other research domains seeking to systematically organize gathered evidence.
Discussion: Despite its advantages, the platform currently requires manual effort to be kept up to date. However, our goal is to semi-automate this process to keep CAMIH sustainably relevant.
Conclusion: This work adds to the evidence-database landscape for the CAM field. We hope it will enable clinicians to create, discuss, and synthesize evidence while also providing a blueprint for other research areas that want to organize evidence.
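The subject-predicate-object structure underlying a semantic wiki can be illustrated with a minimal in-memory triple store. This is a conceptual sketch only: the study names, predicates, and values are invented, and a platform like CAMIH would rely on proper RDF tooling rather than plain tuples.

```python
# Minimal triple store: each fact is a (subject, predicate, object) tuple.
triples = [
    ("Study42", "investigates", "mistletoe_therapy"),
    ("Study42", "hasRiskOfBias", "low"),
    ("mistletoe_therapy", "isCategorizedAs", "CAM"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Pattern queries are what let triple-structured evidence be reorganized
# on demand, e.g. "all studies with a low risk of bias":
low_bias = query(predicate="hasRiskOfBias", obj="low")
```

It is this queryability, rather than the storage itself, that distinguishes triple-structured evidence from a flat document collection.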
Christian Otto, Jennifer Dörfler, Cord Spreckelsen, Jutta Hübner. CAMIH - The Complementary and Alternative Medicine Insights Hub. Studies in Health Technology and Informatics. 2025;331:35-43. doi:10.3233/SHTI251377.
Sebastian Stäubert, Angela Merzweiler, Jörg Römhild, Stefan Lang, Martin Bialke
Introduction: The lawful processing of health data in medical research necessitates robust mechanisms for managing patient consent and objections, aligning with national and European regulations. While the initial version of the HL7 standard "Consent Management" primarily focused on opt-in scenarios, evolving legal landscapes and practical implementation challenges highlight the need for comprehensive solutions encompassing both opt-in and opt-out approaches, including withdrawals and objections. This paper details the systematic revision of the latest HL7 FHIR-based "Consent Management 2.0" standard to address these limitations.
Methods: Our methodology involved a critical assessment of the 2021 standard against three years of practical experience and emerging regulatory requirements.
Results: Key improvements include enhanced support for diverse document types (consent, withdrawal, refusal, objection), refined technical specifications for automated conversion of questionnaire responses into machine-readable Consent Resources, and the introduction of a novel "ResultType" category. This new category enables use-case-specific aggregation of consent information, simplifying downstream processing and reducing interpretation ambiguities. Additionally, uniform FHIR search parameters were defined, and comprehensive examples were integrated into the implementation guide. The revised standard successfully underwent the HL7 ballot process in April 2025, with early practical implementations already demonstrating its utility.
Conclusion: This extended standard significantly enhances the interoperability and legal robustness of consent management in complex research infrastructures, fostering improved patient autonomy and trust in digital health data reuse.
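The "ResultType" idea in the Results, collapsing a patient's consent history into one use-case-specific verdict, can be sketched as a chronological fold over the consent documents. This is an illustrative simplification: the document types, policy codes, and the permit/deny rule shown here are invented, not the standard's normative logic.

```python
from datetime import date

# Toy consent history: a broad consent in 2023, then a partial withdrawal.
documents = [
    {"type": "consent", "date": date(2023, 1, 10), "policies": {"research", "recontact"}},
    {"type": "withdrawal", "date": date(2024, 6, 1), "policies": {"recontact"}},
]

def aggregate(docs, use_case: str) -> str:
    """Collapse the history into a single verdict for one use case by
    applying documents in chronological order; later documents win."""
    verdict = "denied"  # default when no document covers the use case
    for doc in sorted(docs, key=lambda d: d["date"]):
        if use_case in doc["policies"]:
            verdict = "permitted" if doc["type"] == "consent" else "denied"
    return verdict

status_research = aggregate(documents, "research")    # consented, never withdrawn
status_recontact = aggregate(documents, "recontact")  # withdrawn in 2024
```

Pre-aggregating such verdicts is what spares downstream systems from re-interpreting the raw document sequence, which is the ambiguity the ResultType category targets.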
Sebastian Stäubert, Angela Merzweiler, Jörg Römhild, Stefan Lang, Martin Bialke. Consent Management 2.0: Empowering Patient Will in Medical Research and Care. Studies in Health Technology and Informatics. 2025;331:133-141. doi:10.3233/SHTI251389.
Kevin Kaufmes, Georg Mathes, Dilyana Vladimirova, Stephanie Berger, Christian Fegeler, Stefan Sigle
Introduction: In the context of precision oncology, patients often have complex conditions that require treatment based on specific and up-to-date knowledge of guidelines and research. This entails considerable effort when preparing such cases for molecular tumor boards (MTBs). Large language models (LLMs) could help to lower this burden if they could provide such information quickly and precisely on demand. Since out-of-the-box LLMs are not specialized for clinical contexts, this work aims to investigate their usefulness for answering questions arising during MTB preparation. As such questions can contain sensitive data, we evaluated medium-scale models suitable for running on-premise on consumer-grade hardware.
Methods: Three recent LLMs were selected for testing based on established benchmarks and unique characteristics such as reasoning capability. Exemplary questions related to MTBs were collected from domain experts, and six of these were selected for the LLMs to answer. Response quality and correctness were evaluated by experts using a questionnaire.
Results: Out of 60 contacted domain experts, five fully completed the survey and another five completed it partially. The evaluation revealed modest overall performance. Our findings identified significant issues: a large percentage of answers contained outdated or incomplete information as well as factual errors. Additionally, we observed high discordance between evaluators regarding correctness, along with varying rater confidence.
Conclusion: Our results indicate that medium-scale LLMs are currently insufficiently reliable for use in precision oncology. Common issues include outdated information and the confident presentation of misinformation, which points to a gap between benchmark and real-world performance. Future research should focus on mitigating these limitations with advanced techniques such as Retrieval-Augmented Generation (RAG), web search capabilities, or advanced prompting, while prioritizing patient safety.
Kevin Kaufmes, Georg Mathes, Dilyana Vladimirova, Stephanie Berger, Christian Fegeler, Stefan Sigle. Evaluating Medium Scale, Open-Source Large Language Models: Towards Decision Support in a Precision Oncology Care Delivery Context. Studies in Health Technology and Informatics. 2025;331:81-90. doi:10.3233/SHTI251382.
Joshua Wiedekopf, Tessa Ohlsen, Alan Koops, Ann-Kristin Kock-Schoppenhauer, Muhammad Adnan, Sarah Ballout, Nele Philipzik, Oya Beyan, Andreas Beyer, Michael Marschollek, Josef Ingenerf
Introduction: As part of the German Medical Informatics Initiative (MII) and Network University Medicine (NUM), a central research terminology service (TS) is provided by the Service Unit Terminology Services (SU-TermServ). This HL7 FHIR-based service depends on the timely and comprehensive availability of FHIR terminology resources to provide the necessary interactions for the distributed MII/NUM infrastructure. While German legislation has recently instituted a national terminology service for medical classifications and terminologies, the scope of the MII and NUM extends beyond routine patient care, encompassing the need for supplementary or specialized services and terminologies that are not commonly utilized elsewhere.
Methods: The SU-TermServ's processes are based on established FHIR principles and the recently proposed Canonical Resource Management Infrastructure Implementation Guide, which are outlined in this paper.
Results: The strategy and processes implemented within the project deliver the needed resources both to the central FHIR terminology service and to the local data integration centers in a transparent and consistent fashion. The service currently provides approximately 7000 resources to users via the standardized FHIR API.
Conclusion: The professionalized distribution and maintenance of these terminological resources, together with the provision of a powerful TS implementation, aids the development of the Core Data Set, supports the data integration centers, and ultimately benefits biomedical researchers requesting access to this rich data.
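The basic service a FHIR terminology server provides to data integration centers, resolving a (system, code) pair to a display and a validity verdict, can be sketched in a few lines. This is a toy stand-in, not SU-TermServ's implementation: the code systems and codes are illustrative, and a real TS answers such requests via FHIR operations like $lookup over HTTP.

```python
# Toy terminology store standing in for a FHIR terminology server.
CODE_SYSTEMS = {
    "http://loinc.org": {"8480-6": "Systolic blood pressure"},
    "http://snomed.info/sct": {"38341003": "Hypertensive disorder"},
}

def lookup(system: str, code: str) -> dict:
    """Mimic a terminology lookup: resolve a (system, code) pair to its
    display text, or report that the code is unknown."""
    display = CODE_SYSTEMS.get(system, {}).get(code)
    if display is None:
        return {"result": False}
    return {"result": True, "display": display}

out = lookup("http://loinc.org", "8480-6")
bad = lookup("http://loinc.org", "0000-0")
```

Centralizing this resolution is what keeps the distributed data integration centers consistent: every site validates against the same resource versions instead of local copies.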
Joshua Wiedekopf, Tessa Ohlsen, Alan Koops, Ann-Kristin Kock-Schoppenhauer, Muhammad Adnan, Sarah Ballout, Nele Philipzik, Oya Beyan, Andreas Beyer, Michael Marschollek, Josef Ingenerf. Implementation of HL7 FHIR-Based Terminology Services for a National Federated Health Research Infrastructure. Studies in Health Technology and Informatics. 2025;331:195-203. doi:10.3233/SHTI251396.
Mehmed Halilovic, Karen Otte, Thierry Meurers, Marco Alibone, Marion Ludwig, Nico Riedel, Steven Wolter, Lisa Kühnel, Steffen Hess, Fabian Prasser
Introduction: The re-use of health insurance claims data for research purposes can provide valuable insights to improve patient care. However, as health data is often highly sensitive and subject to strict regulatory frameworks, the privacy of individuals must be protected. Anonymization is a common approach to do so, but finding an effective strategy is challenging due to an inherent trade-off between privacy protection and data utility. A structured approach is needed to balance these objectives and guide the selection of appropriate anonymization strategies.
Methods: In this paper, we present a systematic evaluation of twelve anonymization strategies applied to German health insurance claims data that had previously been used in a drug safety study. The dataset consisted of 1727 records and 45 variables. Based on structured threat modeling, we compare a conservative approach and a threat-model-based approach, each with six different privacy models and risk thresholds, using the ARX Data Anonymization Tool. We assess general data utility and empirically evaluate residual privacy risks using both the Anonymeter framework and a membership inference attack.
Results: Our results show that conservative anonymization ensures strong privacy protection but reduces data utility. In contrast, threat modeling retains more utility while still providing acceptable privacy under moderate thresholds.
Conclusion: The proposed process enables a systematic comparison of privacy-utility trade-offs and can be adapted to other medical datasets. Our findings highlight the importance of context-specific anonymization strategies and empirical risk evaluation to guide anonymized data sharing in healthcare.
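A concrete example of the privacy models the paper compares is k-anonymity, which requires every combination of quasi-identifiers to be shared by at least k records. The check below is a didactic sketch with invented records; tools like ARX implement this alongside many stronger models and the generalization search needed to satisfy them.

```python
from collections import Counter

# Toy records reduced to their quasi-identifiers: (birth decade, sex, city).
records = [
    ("1960-1970", "F", "Berlin"),
    ("1960-1970", "F", "Berlin"),
    ("1960-1970", "F", "Berlin"),
    ("1970-1980", "M", "Dresden"),
]

def smallest_class(rows) -> int:
    """Size of the smallest equivalence class over the quasi-identifiers;
    the data is k-anonymous for any k up to this value."""
    return min(Counter(rows).values())

k = smallest_class(records)  # here 1: the last record is unique, so even k=2 fails
```

The privacy-utility trade-off appears directly in this view: generalizing values (e.g. city to region) merges classes and raises k, but each merge discards detail a researcher might have needed.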
Mehmed Halilovic, Karen Otte, Thierry Meurers, Marco Alibone, Marion Ludwig, Nico Riedel, Steven Wolter, Lisa Kühnel, Steffen Hess, Fabian Prasser. Anonymization of Health Insurance Claims Data for Medication Safety Assessments. Studies in Health Technology and Informatics. 2025;331:283-291. doi:10.3233/SHTI251407.
Anne Pelz, Philipp Heinrich, Gabriele Mueller, Anne Seim, Peter Penndorf, Martin Bialke, Martin Sedlmayr, Ines Reinecke, Markus Wolfien, Katja Hoffmann
Introduction: The German Medical Informatics Initiative (MII) promotes the use of routine clinical data for research, supported by the broad consent framework to ensure patient engagement. This work proposes a data management process and reference infrastructure to improve transparency by enabling patients to track their consent history and data use in research.
Methods: We analyzed the data provision process at the University Hospital Dresden (UKD) to identify roles and data flows relevant to secondary data use under broad consent. Established MII tools in use at UKD were evaluated for their suitability in enabling secure data access.
Results: We developed a structured data access process and implemented a reference infrastructure that lays the groundwork for a potential patient-facing application providing secure access to consent and study details.
Conclusion: The reference infrastructure demonstrates how existing MII tools can be repurposed to offer patient-centric transparency in secondary data use. Future work will address scalability, access control, and ethical considerations, such as patient expectations and the clarity of information.
{"title":"From Broad Consent to Patient Engagement: A Framework for Consent Management and Study Oversight.","authors":"Anne Pelz, Philipp Heinrich, Gabriele Mueller, Anne Seim, Peter Penndorf, Martin Bialke, Martin Sedlmayr, Ines Reinecke, Markus Wolfien, Katja Hoffmann","doi":"10.3233/SHTI251405","DOIUrl":"10.3233/SHTI251405","url":null,"abstract":"<p><strong>Introduction: </strong>The German Medical Informatics Initiative (MII) promotes the use of routine clinical data for research, supported by the broad consent framework to ensure patient engagement. This work proposes a data management process and reference infrastructure to improve transparency by enabling patients to track their consent history and data use in research.</p><p><strong>Methods: </strong>We analyzed the data provision process at the University Hospital Dresden (UKD) to identify roles and data flows relevant to secondary data use under broad consent. Established MII tools in use at UKD were evaluated for their suitability in enabling secure data access.</p><p><strong>Results: </strong>We developed a structured data access process and implemented a reference infrastructure that lays the groundwork for a potential patient-facing application providing secure access to consent and study details.</p><p><strong>Conclusion: </strong>The reference infrastructure demonstrates how existing MII tools can be repurposed to offer patient-centric transparency in secondary data use. 
Future work will address scalability, access control, and ethical considerations, such as patient expectations and the clarity of information.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"265-273"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144985102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Introduction: The heterogeneity of metadata continues to be a key challenge in the healthcare sector. The Data Dictionary Minimal Information Model (DDMIM) aims to meet the need for interoperability between different standards and data dictionaries to facilitate the exchange of metadata.
Objective: This paper presents the conception and development of a metadata search portal based on the DDMIM specification, designed to improve the discoverability and accessibility of health datasets and to enhance interoperability.
Methods: We conducted a literature review of existing metadata repositories (MDRs) to select those relevant for further work. A mapping was created to transform metadata from different MDRs into the DDMIM format. In parallel, the requirements for a prototype search portal that integrates metadata from various public repositories were evaluated.
Results: The results show that a DDMIM-based search portal can effectively integrate heterogeneous metadata sources and improve the discoverability of health datasets.
Discussion: Such a portal supports the integration of heterogeneous metadata sources and ensures compliance with FAIR principles to optimize the use of health data for research and clinical applications. It is therefore of great importance to address the existing challenges in the field of medical data integration and utilization.
{"title":"Conception and Development of a Metadata Search Portal Based on the Data Dictionary Minimal Information Model (DDMIM) Specification.","authors":"Leslie Diana Wamba Makem, Abishaa Vengadeswaran, Dupleix Achille Takoulegha, Dennis Kadioglu","doi":"10.3233/SHTI251400","DOIUrl":"10.3233/SHTI251400","url":null,"abstract":"<p><strong>Introduction: </strong>The heterogeneity of metadata continues to be a key challenge in the healthcare sector. The Data Dictionary Minimal Information Model (DDMIM) aims to meet the need for interoperability between different standards and data dictionaries to facilitate the exchange of metadata.</p><p><strong>Objective: </strong>This paper presents the conception, and the development of a metadata search portal based on the DDMIM specification, designed to improve the discoverability and accessibility of health datasets and enhance interoperability.</p><p><strong>Methods: </strong>We conducted a literature review of existing metadata repositories to select potentially relevant ones for further work. A mapping was created to transform metadata from different MDRs into the DDMIM format. In parallel, the requirements for a prototype search portal are being evaluated, which integrates metadata from various public repositories.</p><p><strong>Results: </strong>The results show that a DDMIM-based search portal can effectively integrate heterogeneous metadata sources and improve the finding of health datasets.</p><p><strong>Discussion: </strong>Such a portal supports the integration of heterogeneous metadata sources and ensures compliance with FAIR principles to optimize the use of health data for research and clinical applications. 
It is therefore of great importance to address the existing challenges in the field of medical data integration and utilization.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"228-234"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144985055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gorkem Yilmaz, Jonathan M Mang, Markus Metzler, Hans-Ulrich Prokosch, Manfred Rauh, Jakob Zierk
Introduction: Data-driven analysis of clinical databases is an efficient method for clinical knowledge generation, which is especially suitable when exceptional ethical and practical restrictions apply, such as in pediatrics. In the multi-center PEDREF 2.0 study, we are analyzing children's laboratory test results, diagnoses, and procedures from more than 20 German tertiary care centers to establish pediatric reference intervals. The PEDREF 2.0 study uses the framework of the German Medical Informatics Initiative, but its specific study needs required the development of a customized module for distributed pediatric analyses.
Methods: We developed the Pediatric Distributed Analysis, Anonymization, and Aggregation Module (PED-DATA), which is a containerized application that we deployed to all participating centers. PED-DATA transforms the input datasets to a harmonized internal representation and enables their decentralized analysis in compliance with data protection rules, resulting in an anonymous output dataset that is transferred for central analysis.
Results: In a preliminary analysis of data from 15 centers, we analyzed 52,807,236 laboratory test results from 753,774 different patients (323,943 to 4,338,317 test results per laboratory test), enabling us to establish pediatric reference intervals with previously unmatched precision.
Conclusion: PED-DATA facilitates the implementation of pediatric data-driven multicenter studies in a decentralized and privacy-respecting manner, and its use throughout German university hospitals in the PEDREF 2.0 study demonstrates its usefulness in a real-world use case.
{"title":"PED-DATA: A Privacy-Preserving Framework for Data-Driven, Pediatric Multi-Center Studies.","authors":"Gorkem Yilmaz, Jonathan M Mang, Markus Metzler, Hans-Ulrich Prokosch, Manfred Rauh, Jakob Zierk","doi":"10.3233/SHTI251409","DOIUrl":"10.3233/SHTI251409","url":null,"abstract":"<p><strong>Introduction: </strong>Data-driven analysis of clinical databases is an efficient method for clinical knowledge generation, which is especially suitable when exceptional ethical and practical restrictions apply, such as in pediatrics. In the multi-center PEDREF 2.0 study, we are analyzing children's laboratory test results, diagnoses, and procedures from more than 20 German tertiary care centers to establish pediatric reference intervals. The PEDREF 2.0 study uses the framework of the German Medical Informatics Initiative, but the specific study needs require the development of a customized module for distributed pediatric analyses.</p><p><strong>Methods: </strong>We developed the Pediatric Distributed Analysis, Anonymization, and Aggregation Module (PED-DATA), which is a containerized application that we deployed to all participating centers. 
PED-DATA transforms the input datasets to a harmonized internal representation and enables their decentralized analysis in compliance with data protection rules, resulting in an anonymous output dataset that is transferred for central analysis.</p><p><strong>Results: </strong>In a preliminary analysis of data from 15 centers, we analyzed 52,807,236 laboratory test results from 753,774 different patients (323,943 to 4,338,317 test results per laboratory test), enabling us to establish pediatric reference intervals with previously unmatched precision.</p><p><strong>Conclusion: </strong>PED-DATA facilitates the implementation of pediatric data-driven multicenter studies in a decentralized and privacy-respecting manner, and its use throughout German University Hospitals in the PEDREF 2.0 study demonstrates its usefulness in a real-world use case.</p>","PeriodicalId":94357,"journal":{"name":"Studies in health technology and informatics","volume":"331 ","pages":"307-317"},"PeriodicalIF":0.0,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144985129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
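At its core, establishing a reference interval means estimating the central 95% of results (2.5th to 97.5th percentile) within each age stratum. The sketch below shows only this simplified direct-percentile idea on invented data; the actual study applies more elaborate indirect estimation methods to clinical routine data, and the 5-year banding is an arbitrary choice for illustration.

```python
import statistics
from collections import defaultdict

def reference_interval(values):
    """Central 95% interval: the 2.5th and 97.5th percentile cut points."""
    qs = statistics.quantiles(values, n=40, method="inclusive")
    return qs[0], qs[-1]  # cut points at 1/40 = 2.5% and 39/40 = 97.5%

def intervals_by_age_group(results, width=5):
    """results: (age_years, value) pairs; returns {age_band: (low, high)}."""
    groups = defaultdict(list)
    for age, value in results:
        groups[(age // width) * width].append(value)
    return {band: reference_interval(vals)
            for band, vals in sorted(groups.items())}

# Toy example: 100 results for one age band.
bands = intervals_by_age_group([(3, v) for v in range(1, 101)])
```

The per-band grouping mirrors why the study needs such large, pooled multi-center datasets: each age stratum must still contain enough results for stable percentile estimates.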