{"title":"Enriched knowledge representation in biological fields: a case study of literature-based discovery in Alzheimer's disease.","authors":"Yiyuan Pu, Daniel Beck, Karin Verspoor","doi":"10.1186/s13326-025-00328-3","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>In Literature-based Discovery (LBD), Swanson's original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a simple entity-based knowledge graph with co-occurrences and/or semantic triples as basic building blocks. However, our analysis of a knowledge graph constructed for a recent LBD system reveals limitations arising from such pairwise representations, which further negatively impact knowledge inference. Using LBD as the context and motivation in this work, we explore limitations of using pairwise relationships only as knowledge representation in knowledge graphs, and we identify impacts of these limitations on knowledge inference. We argue that enhanced knowledge representation is beneficial for biological knowledge representation in general, as well as for both the quality and the specificity of hypotheses proposed with LBD.</p><p><strong>Results: </strong>Based on a systematic analysis of one co-occurrence-based LBD system focusing on Alzheimer's Disease, we identify 7 types of limitations arising from the exclusive use of pairwise relationships in a standard knowledge graph-including the need to capture more than two entities interacting together in a single event-and 3 types of negative impacts on knowledge inferred with the graph-Experimentally infeasible hypotheses, Literature-inconsistent hypotheses, and Oversimplified hypotheses explanations. We also present an indicative distribution of different types of relationships. Pairwise relationships are an essential component in representation frameworks for knowledge discovery. However, only 20% of discoveries are perfectly represented with pairwise relationships alone. 73% require a combination of pairwise relationships and nested relationships. The remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs.</p><p><strong>Conclusion: </strong>We argue that the standard entity pair-based knowledge graph, while essential for representing basic binary relations, results in important limitations for comprehensive biological knowledge representation and impacts downstream tasks such as proposing meaningful discoveries in LBD. These limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing for nested entities. The use of more sophisticated knowledge representation will benefit biological fields with more expressive knowledge graphs. Downstream tasks, such as LBD, can benefit from richer representations as well, allowing for generation of implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful.</p>","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"16 1","pages":"3"},"PeriodicalIF":1.6000,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11924609/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Semantics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1186/s13326-025-00328-3","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Background: In Literature-based Discovery (LBD), Swanson's original ABC model brought together isolated public knowledge statements and assembled them to infer putative hypotheses via logical connections. Modern LBD studies that scale up this approach through automation typically rely on a simple entity-based knowledge graph with co-occurrences and/or semantic triples as basic building blocks. However, our analysis of a knowledge graph constructed for a recent LBD system reveals limitations arising from such pairwise representations, which further negatively impact knowledge inference. Using LBD as the context and motivation in this work, we explore limitations of using pairwise relationships only as knowledge representation in knowledge graphs, and we identify impacts of these limitations on knowledge inference. We argue that enhanced knowledge representation is beneficial for biological knowledge representation in general, as well as for both the quality and the specificity of hypotheses proposed with LBD.
Results: Based on a systematic analysis of one co-occurrence-based LBD system focusing on Alzheimer's Disease, we identify 7 types of limitations arising from the exclusive use of pairwise relationships in a standard knowledge graph-including the need to capture more than two entities interacting together in a single event-and 3 types of negative impacts on knowledge inferred with the graph-Experimentally infeasible hypotheses, Literature-inconsistent hypotheses, and Oversimplified hypotheses explanations. We also present an indicative distribution of different types of relationships. Pairwise relationships are an essential component in representation frameworks for knowledge discovery. However, only 20% of discoveries are perfectly represented with pairwise relationships alone. 73% require a combination of pairwise relationships and nested relationships. The remaining 7% are represented with pairwise relationships, nested relationships, and hypergraphs.
Conclusion: We argue that the standard entity pair-based knowledge graph, while essential for representing basic binary relations, results in important limitations for comprehensive biological knowledge representation and impacts downstream tasks such as proposing meaningful discoveries in LBD. These limitations can be mitigated by integrating more semantically complex knowledge representation strategies, including capturing collective interactions and allowing for nested entities. The use of more sophisticated knowledge representation will benefit biological fields with more expressive knowledge graphs. Downstream tasks, such as LBD, can benefit from richer representations as well, allowing for generation of implicit knowledge discoveries and explanations for disease diagnosis, treatment, and mechanism that are more biologically meaningful.
期刊介绍:
Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas:
Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability.
Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.