An application-based ontological knowledge base of medications to support health literacy and adherence for the consumer population: an aging population use case.
Clifford Chen, Muhammad Amith, Kirk Roberts, Rebecca Mauldin, Renata Komalasari, Cui Tao
Pub Date: 2026-01-29 | DOI: 10.1186/s13326-026-00347-8
Ontology development and use for cholangiocarcinoma risk factors and predictions: a term enrichment data analysis and machine learning classification.
Anuwat Pengput, Alexander D Diehl
Pub Date: 2026-01-22 | DOI: 10.1186/s13326-025-00345-2
ECLed - a tool supporting the effective use of the SNOMED CT Expression Constraint Language.
Tessa Ohlsen, André Sander, Josef Ingenerf
Pub Date: 2026-01-06 | DOI: 10.1186/s13326-025-00344-3
Background: The Expression Constraint Language (ECL) is a powerful query language for SNOMED CT, enabling precise semantic queries across clinical concepts. However, its complex syntax and reliance on the SNOMED CT Concept Model make it difficult for non-experts to use, limiting its broader adoption in clinical research and healthcare analytics.
Objective: This work presents ECLed, a web-based tool designed to simplify access to ECL queries by abstracting the complexity of ECL syntax and the SNOMED CT Concept Model. ECLed is aimed at non-technical users, enabling the creation and modification of ECL queries and facilitating the querying of patient data coded with SNOMED CT.
Methods: ECLed was developed following a detailed requirements analysis, addressing both functional and non-functional needs. The tool supports the creation and editing of SNOMED CT ECL queries, integrates a processed Concept Model, and uses FHIR terminology services for semantic validation. Its modular architecture, with a frontend based on Angular and a backend on Spring Boot, ensures seamless communication through RESTful interfaces.
Results: ECLed demonstrated high usability in a user survey. Technical validation confirmed that it reliably generates and edits complex ECL queries. The tool was successfully integrated into the DaWiMed research platform, enhancing clinical analysis workflows. It also worked effectively with clinical data in FHIR format, although scalability with larger datasets remains to be tested.
Discussion: ECLed overcomes the limitations of existing ECL tools by abstracting the complexity of both the syntax and the SNOMED CT Concept Model. It provides a user-friendly solution that enables both technical and non-technical users to easily create and edit ECL queries.
Conclusion: ECLed offers a practical, user-friendly solution for creating SNOMED CT ECL queries, effectively hiding the underlying complexity while optimizing clinical research and data analysis workflows. It holds significant potential for further development and integration into additional research platforms.
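To make the abstraction concrete, here is a minimal sketch (not part of the paper, and not ECLed's internal API) of the kind of raw ECL usage the tool hides: expanding an implicit SNOMED CT value set through a FHIR terminology server's ValueSet/$expand operation. The server URL is hypothetical; the ECL expression is a standard example from the ECL guide.

```python
# Query a FHIR terminology server with a raw ECL expression.
import requests

# Hypothetical endpoint; any FHIR R4 terminology server hosting SNOMED CT works.
TS_BASE = "https://tx.example.org/fhir"

# ECL: clinical findings whose finding site is (a subtype of) the
# pulmonary valve structure.
ecl = (
    "<< 404684003 |Clinical finding| : "
    "363698007 |Finding site| = << 39057004 |Pulmonary valve structure|"
)

resp = requests.get(
    f"{TS_BASE}/ValueSet/$expand",
    params={"url": f"http://snomed.info/sct?fhir_vs=ecl/{ecl}", "count": 10},
    headers={"Accept": "application/fhir+json"},
    timeout=30,
)
resp.raise_for_status()

for concept in resp.json().get("expansion", {}).get("contains", []):
    print(concept["code"], concept["display"])
```

Writing such expressions by hand requires knowing both the ECL grammar and which attributes the Concept Model allows per hierarchy, which is exactly the knowledge ECLed encodes in its interface.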
Annotating and indexing scientific articles with rare diseases.
Hosein Azarbonyad, Zubair Afzal, Rik Iping, Max Dumoulin, Ilse Nederveen, Jiangtao Yu, Georgios Tsatsaronis
Pub Date: 2026-01-06 | DOI: 10.1186/s13326-025-00346-1
Background: Around 30 million people in Europe are affected by a rare (or orphan) disease, defined as a condition occurring in fewer than 1 in 2,000 individuals. The primary challenge is to automatically and efficiently identify scientific articles and guidelines that address a particular rare disease. We present a novel methodology to annotate and index scientific text with taxonomical concepts describing rare diseases from the OrphaNet taxonomy. This task is complicated by several technical challenges, including the lack of sufficiently large, human-annotated datasets for supervised training and the polysemy/synonymy and surface-form variation of rare disease names, which can hinder any annotation engine.
Results: We introduce a framework that operationalizes OrphaNet for large-scale literature annotation by integrating the TERMite engine with curated synonym expansion, label normalization (including deprecated/renamed concepts), and fuzzy matching. On benchmark datasets, the approach achieves precision = 92%, recall = 75%, and F1 = 83%, outperforming a string-matching baseline. Applying the pipeline to Scopus produces disease-specific corpora suitable for bibliometric and scientometric analyses (e.g., institution, country, and subject-area profiles). These outputs power the Rare Diseases Monitor dashboard for exploring national and global research activity.
Conclusion: To our knowledge, this is the first systematic, scalable semantic framework for annotating and indexing rare disease literature at scale. By operationalizing OrphaNet in an automated, reproducible pipeline and addressing data scarcity and lexical variability, the work advances biomedical semantics for rare diseases and enables disease-centric monitoring, evaluation, and discovery across the research landscape.
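As a rough illustration of the lexical side of the problem (the actual pipeline uses the TERMite engine; this stand-in, its toy lexicon, and its cutoff are invented), normalization plus fuzzy matching can recover annotations despite surface-form variation:

```python
# Simplified stand-in for dictionary-based annotation with fuzzy matching.
import difflib
import re

# Toy slice of an Orphanet-style lexicon: ORPHA code -> known surface forms.
LEXICON = {
    "ORPHA:586": ["cystic fibrosis", "mucoviscidosis"],
    "ORPHA:98896": ["Duchenne muscular dystrophy", "DMD"],
}

def normalize(term: str) -> str:
    """Lowercase and strip punctuation/extra whitespace before matching."""
    return re.sub(r"[^a-z0-9 ]", " ", term.lower()).strip()

def annotate(mention: str, cutoff: float = 0.85) -> list[tuple[str, float]]:
    """Return (ORPHA code, similarity) pairs scoring above the cutoff."""
    m = normalize(mention)
    hits = []
    for code, forms in LEXICON.items():
        best = max(
            difflib.SequenceMatcher(None, m, normalize(f)).ratio()
            for f in forms
        )
        if best >= cutoff:
            hits.append((code, round(best, 3)))
    return sorted(hits, key=lambda h: -h[1])

print(annotate("cystic fibrosys"))  # tolerates the spelling variant
```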
SimSUM - simulated benchmark with structured and unstructured medical records.
Paloma Rabaey, Stefan Heytens, Thomas Demeester
Pub Date: 2025-12-18 | DOI: 10.1186/s13326-025-00341-6
BabelFSH - a toolkit for an effective HL7 FHIR-based terminology provision.
Joshua Wiedekopf, Tessa Ohlsen, Ann-Kristin Kock-Schoppenhauer, Josef Ingenerf
Pub Date: 2025-11-29 | DOI: 10.1186/s13326-025-00343-4
Background: HL7 FHIR terminological services (TS) are a valuable tool towards better healthcare interoperability, but require representations of terminologies using FHIR resources to provide their services. As most terminologies are not natively distributed using FHIR resources, converters are needed. Large-scale FHIR projects, especially those with a national or even an international scope, define enormous numbers of value sets and reference many large and complex code systems, which must be regularly updated in TS and other systems. This necessitates a flexible, scalable and efficient provision of these artifacts. This work aims to develop a comprehensive, extensible and accessible toolkit for FHIR terminology conversion, making it possible for terminology authors, FHIR profilers and other actors to provide standardized TS for large-scale terminological artifacts.
Implementation: Based on the prevalent HL7 FHIR Shorthand (FSH) specification, a converter toolkit, called BabelFSH, was created that utilizes an adaptable plugin architecture to separate the definition of content from that of the needed declarative metadata. The development process was guided by formalized design goals.
Results: All eight design goals were addressed by BabelFSH. Validation of the system's performance and completeness was demonstrated using the example of Alpha-ID-SE, an important terminology used in Germany for coding diagnoses, especially of rare diseases. The tool is now used extensively within the content delivery pipeline for a central FHIR TS with a national scope within the German Medical Informatics Initiative and Network University Medicine, and demonstrates adequate usability for FHIR developers.
Discussion: Development initially focused on the requirements of the central research FHIR TS for the federated FHIR infrastructure in Germany, and the tool has proven very useful toward that goal. Opportunities for further improvement were identified especially in the validation process, as the validation messages are currently imprecise at times. The design of the application lends itself to the implementation of further use cases, such as direct connectivity to legacy systems for catalog conversion to FHIR.
Conclusions: The developed BabelFSH tool is a novel, powerful and open-source approach to making heterogeneous sources of terminological knowledge accessible as FHIR resources, thus aiding semantic interoperability in healthcare in general.
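The core idea, separating declarative metadata from source content and emitting FHIR-ready artifacts, can be sketched as follows (a hypothetical illustration, not BabelFSH's actual plugin interface): a converter reads tabular content and renders FHIR Shorthand, which a standard FSH compiler such as SUSHI can then turn into a FHIR CodeSystem.

```python
# Emit FHIR Shorthand (FSH) for a CodeSystem from tabular source content.
import csv
import io

# Declarative metadata, as a terminology author might supply it.
METADATA = {"name": "ExampleCS", "id": "example-cs", "title": "Example Code System"}

# Source content, e.g. a catalog export; an inline CSV here for brevity.
SOURCE = "code,display\nA01,Alpha\nB02,Beta\n"

def to_fsh(metadata: dict, source_csv: str) -> str:
    """Render an FSH CodeSystem: header from metadata, concepts from content."""
    lines = [
        f"CodeSystem: {metadata['name']}",
        f"Id: {metadata['id']}",
        f'Title: "{metadata["title"]}"',
    ]
    for row in csv.DictReader(io.StringIO(source_csv)):
        lines.append(f'* #{row["code"]} "{row["display"]}"')
    return "\n".join(lines)

print(to_fsh(METADATA, SOURCE))
```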
The CLEAR Principle: organizing data and metadata into semantically meaningful types of FAIR Digital Objects to increase their human explorability and cognitive interoperability.
Lars Vogt
Pub Date: 2025-10-28 | DOI: 10.1186/s13326-025-00340-7
Background: Ensuring the FAIRness (Findable, Accessible, Interoperable, Reusable) of data and metadata is an important goal in both research and industry. Knowledge graphs and ontologies have been central in achieving this goal, with interoperability of data and metadata receiving much attention. This paper argues that the emphasis on machine-actionability has overshadowed the essential need for human-actionability of data and metadata, and provides three examples that describe the lack of human-actionability within knowledge graphs.
Results: The paper advocates incorporating cognitive interoperability as another vital layer within the European Open Science Cloud Interoperability Framework and discusses the relation between the human explorability of data and metadata and their cognitive interoperability. It suggests adding the CLEAR Principle to support the cognitive interoperability and human contextual explorability of data and metadata. The subsequent sections present the concept of semantic units, elucidating their important role in attaining CLEAR. Semantic units structure a knowledge graph into identifiable and semantically meaningful subgraphs, each represented by its own resource that constitutes a FAIR Digital Object (FDO) and instantiates a corresponding FDO class. Various categories of FDOs are distinguished. Each semantic unit can be displayed in a user interface either as a mind-map-like graph or as natural language text.
Conclusions: Semantic units organize knowledge graphs into levels of representational granularity, distinct granularity trees, and diverse frames of reference. This organization supports the cognitive interoperability of data and metadata and facilitates their contextual explorability by humans. The development of innovative user interfaces enabled by FDOs that are based on semantic units would empower users to access, navigate, and explore information in CLEAR knowledge graphs with optimized efficiency.
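One way to picture a semantic unit in RDF terms (an illustrative reading, not the paper's reference implementation; all IRIs and class names are invented) is as a named graph that carries one meaningful statement set and is itself typed and addressable as a resource:

```python
# Represent a semantic unit as an identifiable, typed named graph.
from rdflib import Dataset, Literal, Namespace, RDF, URIRef

EX = Namespace("https://example.org/")  # hypothetical namespace
ds = Dataset()

# The semantic unit has its own IRI and holds the statement(s) it stands for.
unit_iri = URIRef(EX["unit/weight-measurement-42"])
unit = ds.graph(unit_iri)
unit.add((EX["patient/1"], EX.hasBodyWeight, Literal("72 kg")))

# In the default graph, the unit is typed, analogous to a FAIR Digital
# Object instantiating its FDO class; a UI could render this subgraph as a
# small mind-map or as a sentence ("patient 1 has a body weight of 72 kg").
ds.add((unit_iri, RDF.type, EX.WeightMeasurementUnit))

print(ds.serialize(format="trig"))
```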
Three-layered semantic framework for public health intelligence.
Sathvik Guru Rao, Pranitha Rokkam, Bide Zhang, Astghik Sargsyan, Abish Kaladharan, Priya Sethumadhavan, Marc Jacobs, Martin Hofmann-Apitius, Alpha Tom Kodamullil
Pub Date: 2025-09-15 | DOI: 10.1186/s13326-025-00338-1
Background: Disease surveillance systems play a crucial role in monitoring and preventing infectious diseases. However, the current landscape, primarily focused on fragmented health data, poses challenges to contextual understanding and decision-making. This paper addresses this issue by proposing a semantic framework using ontologies to provide a unified data representation for seamless integration. The paper demonstrates the effectiveness of this approach using a case study of a COVID-19 incident at a football game in Italy.
Method: We gathered and analyzed data to develop ontologies for pandemic intelligence. Multiple ontologies were crafted to cover the different domains relevant to pandemic intelligence, such as healthcare systems, mass gatherings, travel, and diseases, and were classified into top-level, domain, and application layers. This classification yielded a three-layered architecture that promotes reusability and consistency in knowledge representation and serves as the backbone of our semantic framework.
Results: Using our semantic framework, we semantically enriched both structured and unstructured data. Data from diverse sources were integrated by mapping them to ontology concepts, and the resulting RDF triples were stored in a triple store, producing linked data that enhances the discoverability and accessibility of valuable insights. Furthermore, our anomaly detection algorithm leveraged knowledge graphs extracted from the triple store, using semantic relationships to discern patterns and anomalies within the data. Notably, this capability was exemplified by the identification of a correlation between a football game and a COVID-19 event occurring at the same location and time.
Conclusion: The framework showcased its capability to address intricate, multi-domain queries and support diverse levels of detail. Additionally, it demonstrated proficiency in data analysis and visualization, generating graphs that depict patterns and trends; however, challenges related to ontology maintenance, alignment, and mapping must be addressed for the approach's optimal utilization.
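To illustrate the kind of cross-domain query behind the football-game example (prefixes, property names, and toy data are invented for this sketch, not taken from the paper's ontologies), a SPARQL pattern joining events on shared location and date suffices:

```python
# Co-occurrence query over an in-memory graph; in the real system this
# would run against the triple store's SPARQL endpoint.
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("https://example.org/pandemic-intelligence/")
g = Graph()

# Toy facts standing in for semantically enriched data in the store.
g.add((EX.event1, RDF.type, EX.MassGatheringEvent))
g.add((EX.event1, EX.location, Literal("CityStadium")))
g.add((EX.event1, EX.date, Literal("2020-02-19")))
g.add((EX.event2, RDF.type, EX.DiseaseOutbreakEvent))
g.add((EX.event2, EX.location, Literal("CityStadium")))
g.add((EX.event2, EX.date, Literal("2020-02-19")))

QUERY = """
PREFIX ex: <https://example.org/pandemic-intelligence/>
SELECT ?gathering ?outbreak ?place ?date WHERE {
  ?gathering a ex:MassGatheringEvent ; ex:location ?place ; ex:date ?date .
  ?outbreak  a ex:DiseaseOutbreakEvent ; ex:location ?place ; ex:date ?date .
}
"""

for row in g.query(QUERY):
    print(row.gathering, "co-occurs with", row.outbreak, "at", row.place, "on", row.date)
```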
A prototype ETL pipeline that uses HL7 FHIR RDF resources when deploying pure functions to enrich knowledge graph patient data.
Adeel Ansari, Marisa Conte, Allen Flynn, Avanti Paturkar
Pub Date: 2025-09-01 | DOI: 10.1186/s13326-025-00335-4
Background: For clinical care and research, knowledge graphs with patient data can be enriched by extracting parameters from a knowledge graph and then using them as inputs to compute new patient features with pure functions. Systematic and transparent methods for enriching knowledge graphs with newly computed patient features are of interest. When enriching the patient data in knowledge graphs this way, existing ontologies and well-known data resource standards can help promote semantic interoperability.
Results: We developed and tested a new data processing pipeline for extracting, computing, and returning newly computed results to a large knowledge graph populated with electronic health record and patient survey data. We show that RDF data resource types already specified by Health Level 7's FHIR RDF effort can be programmatically validated and then used by this new data processing pipeline to represent newly derived patient-level features.
Conclusions: Knowledge graph technology can be augmented with standards-based semantic data processing pipelines for deploying and tracing the use of pure functions to derive new patient-level features from existing data. Semantic data processing pipelines enable research enterprises to report on new patient-level computations of interest with linked metadata that details the origin and background of every new computation.
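A schematic sketch of that loop (all names and the example feature are hypothetical; the actual pipeline extracts inputs with queries against the graph and writes results back as validated FHIR RDF resources) shows why pure functions make the computation traceable:

```python
# Derive a new patient feature with a pure function and keep its provenance.
from dataclasses import dataclass

@dataclass(frozen=True)
class DerivedFeature:
    patient_id: str
    name: str
    value: float
    function_id: str  # identifies the exact pure function version used
    inputs: dict      # the extracted parameters, kept for traceability

def bmi(weight_kg: float, height_m: float) -> float:
    """Pure function: the output depends only on the inputs, no side effects."""
    return round(weight_kg / height_m ** 2, 1)

def enrich(patient_id: str, weight_kg: float, height_m: float) -> DerivedFeature:
    return DerivedFeature(
        patient_id=patient_id,
        name="body-mass-index",
        value=bmi(weight_kg, height_m),
        function_id="fn:bmi@1.0.0",
        inputs={"weight_kg": weight_kg, "height_m": height_m},
    )

print(enrich("patient-1", weight_kg=72.0, height_m=1.80))
```

Because the function is pure, re-running it on the recorded inputs must reproduce the stored value, which is what lets the enriched triples carry auditable metadata about their origin.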
Mapping between clinical and preclinical terminologies: eTRANSAFE's Rosetta Stone approach.
Erik M van Mulligen, Rowan Parry, Johan van der Lei, Jan A Kors
Pub Date: 2025-08-21 | DOI: 10.1186/s13326-025-00337-2
Background: The eTRANSAFE project developed tools that support translational research. One of the challenges in this project was to combine preclinical and clinical data, which are coded with different terminologies and granularities: clinical data are expressed as single pre-coordinated clinical concepts, whereas preclinical data are expressed as combinations of concepts from different terminologies. This study develops and evaluates the Rosetta Stone approach, which maps combinations of preclinical concepts to clinical, pre-coordinated concepts, allowing for different levels of exactness of mappings.
Methods: Concepts from preclinical and clinical terminologies used in eTRANSAFE have been mapped to the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT acts as an intermediary terminology that provides the semantics to bridge between pre-coordinated clinical concepts and combinations of preclinical concepts with different levels of granularity. The mappings from clinical terminologies to SNOMED CT were taken from existing resources, while mappings from the preclinical terminologies to SNOMED CT were manually created. A coordination template defines the relation types that can be explored for a mapping and assigns a penalty score that reflects the inexactness of the mapping. A subset of 60 pre-coordinated concepts was mapped both with the Rosetta Stone semantic approach and with a lexical term matching approach. Both results were manually evaluated.
Results: A total of 34,308 concepts from preclinical terminologies (Histopathology terminology, Standard for Exchange of Nonclinical Data (SEND) code lists, Mouse Adult Gross Anatomy Ontology) and a clinical terminology (MedDRA) were mapped to SNOMED CT as the intermediary bridging terminology. A terminology service has been developed that dynamically returns the exact and inexact mappings between preclinical and clinical concepts. On the evaluation set, the precision of the mappings from the terminology service was high (95%), much higher than for lexical term matching (22%).
Conclusion: The Rosetta Stone approach uses a semantically rich intermediate terminology to map between pre-coordinated clinical concepts and combinations of preclinical concepts, with different levels of exactness. The ability to generate not only exact but also inexact mappings makes it possible to relate larger amounts of preclinical and clinical data, which can be helpful in translational use cases.
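To illustrate the penalty idea (the relation types, penalty values, and candidates below are invented; the paper's coordination template defines the real ones), ranking candidate pre-coordinated concepts by summed step penalties yields exact mappings first and increasingly inexact ones after:

```python
# Rank candidate clinical concepts by the inexactness of their mapping.
# Penalty per traversed relation type, as a coordination template might set it.
PENALTIES = {"exact": 0, "parent": 1, "broader-site": 2}

def score(steps: list[str]) -> int:
    """A mapping's total inexactness is the sum of its step penalties."""
    return sum(PENALTIES[s] for s in steps)

# Hypothetical candidates for one preclinical combination,
# e.g. (finding: necrosis, site: hepatocyte).
candidates = {
    "Hepatocellular necrosis": ["exact"],
    "Hepatic necrosis": ["broader-site"],
    "Liver disorder": ["parent", "broader-site"],
}

for concept, steps in sorted(candidates.items(), key=lambda c: score(c[1])):
    print(f"{score(steps):>2}  {concept}")
```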