Pub Date: 2026-01-15 | DOI: 10.1093/database/baaf090
Runjia Ji, Yongxin Pan, Wei Lin
Magnetotactic bacteria (MTB) are a unique group of microorganisms capable of navigating along geomagnetic field lines through the biomineralization of intracellular magnetic nanocrystals called magnetosomes. While genomic analyses have substantially advanced our understanding of these predominantly uncultured microorganisms, MTB genomic data remain scattered across multiple databases with inconsistent quality profiles and incomplete metadata, limiting comprehensive research efforts. To address these challenges, we developed the Genomic Database of Magnetotactic Bacteria (GdbMTB), the first comprehensive, curated genomic resource dedicated to MTB. The current version of GdbMTB integrates 365 publicly available MTB genomes and their associated metadata. Through a standardized bioinformatics workflow, it provides detailed genome quality assessments, taxonomic classifications, and annotations of magnetosome biomineralization genes, ensuring reliable data for downstream analyses. The curated metadata, encompassing environmental context and publication details, offers crucial research background, enabling users to trace the provenance of each genome. Additionally, GdbMTB offers a suite of bioinformatics tools and an analysis pipeline to facilitate advanced MTB studies. GdbMTB enhances accessibility to MTB genomic data, thereby promoting interdisciplinary research in microbiology, geomicrobiology, and biomineralization studies. Database URL: https://www.gdbmtb.cn/.
Title: GdbMTB: a manually curated genomic database of magnetotactic bacteria. | Journal: Database: The Journal of Biological Databases and Curation (2026)
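The GdbMTB abstract mentions genome quality assessments but does not state the criteria. A widely used convention for genomes of uncultured organisms is the MIMAG standard, which tiers genomes by estimated completeness and contamination. The sketch below assumes those thresholds purely for illustration: the source does not say GdbMTB uses them, and `GenomeQC`, `mimag_tier`, and the example genomes are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GenomeQC:
    accession: str        # hypothetical genome identifier
    completeness: float   # estimated completeness, percent
    contamination: float  # estimated contamination, percent


def mimag_tier(qc: GenomeQC) -> str:
    """Return a MIMAG-style tier using completeness and contamination only.

    Full MIMAG "high-quality draft" status also requires the 23S, 16S and 5S
    rRNA genes and at least 18 tRNAs; those checks are deliberately omitted.
    """
    if qc.completeness > 90 and qc.contamination < 5:
        return "high"
    if qc.completeness >= 50 and qc.contamination < 10:
        return "medium"
    if qc.contamination < 10:
        return "low"
    return "fails"  # contamination of 10% or more falls outside every tier


# Illustrative values only; these are not GdbMTB records.
genomes = [
    GenomeQC("MTB_A", 95.2, 1.1),
    GenomeQC("MTB_B", 72.0, 3.5),
    GenomeQC("MTB_C", 40.0, 2.0),
    GenomeQC("MTB_D", 88.0, 12.0),
]
for g in genomes:
    print(g.accession, mimag_tier(g))
```

Tiering on two numbers keeps the filter transparent; a curated resource would typically also record which tool produced the estimates.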
Pub Date: 2026-01-15 | DOI: 10.1093/database/baaf084
Samiksha Maurya, Jaidev Sharma, Amit Mandoli, Vibhor Kumar
Enhancers are cis-regulatory elements that control gene expression according to developmental stage, external signalling, and cell state. Recent studies have shown that perturbing enhancer activity can alter gene expression and cell properties. However, perturbing an enhancer does not always produce a substantial effect on gene expression or cell properties. Hence, there is a need to identify enhancers that can be effectively targeted, both for therapeutics and for understanding gene regulation, and therefore a comprehensive resource recording the effects of enhancer knockdown is needed. Here, we introduce ePerturbDB, a database for searching the effects of 83 743 experimental perturbations of enhancers. ePerturbDB allows users to compare their genomic loci against the list of perturbed enhancers to assess their potential effect. For query enhancer locations that overlap known experimentally perturbed enhancers, it also reports enriched genes and ontology terms. Thus, the resources and tools in ePerturbDB can help users build hypotheses, design experiments to find effective enhancer-based therapeutics, and draw inferences about the regulation of cell states. Database URL: http://reggen.iiitd.edu.in:1207/ePerturbDB-html/.
Title: ePerturbDB: enhancer's experimental perturbation database. | Journal: Database: The Journal of Biological Databases and Curation (2026)
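ePerturbDB compares a user's genomic loci against its list of perturbed enhancers. At its core this is a genomic interval-overlap test. The minimal sketch below assumes BED-style 0-based, half-open coordinates; the `Interval` type, function names, and coordinates are hypothetical and are not taken from ePerturbDB.

```python
from typing import List, NamedTuple, Tuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based start, inclusive (BED convention)
    end: int    # end, exclusive
    name: str = ""


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share a chromosome and at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def find_overlapping(
    queries: List[Interval], perturbed: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Return every (query, perturbed enhancer) pair that overlaps.

    This is a naive O(n * m) scan; large inputs would normally use an
    interval tree or a sorted sweep per chromosome.
    """
    return [(q, p) for q in queries for p in perturbed if overlaps(q, p)]


# Hypothetical coordinates, not taken from ePerturbDB.
perturbed = [
    Interval("chr1", 1000, 2000, "enh_A"),
    Interval("chr1", 5000, 5600, "enh_B"),
    Interval("chr2", 100, 400, "enh_C"),
]
queries = [
    Interval("chr1", 1900, 2100, "q1"),  # shares bases 1900-1999 with enh_A
    Interval("chr2", 400, 500, "q2"),    # abuts enh_C but does not overlap it
]

for q, p in find_overlapping(queries, perturbed):
    print(q.name, "overlaps", p.name)
```

The half-open convention is why `q2` is not reported: it starts exactly where `enh_C` ends.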
Pub Date: 2026-01-15 | DOI: 10.1093/database/baaf088
Hao Li, Jiani Hu, Jie Song, Wei Zhou
Synthetic biology part discovery faces significant challenges due to inconsistent data organization and limited semantic search capabilities across existing repositories. We developed SynVectorDB, an embedding-based retrieval system that addresses these limitations through methodological innovations in data integration and AI-driven semantic search. Our approach integrates 19 850 biological parts from multiple sources (Addgene, iGEM Registry, laboratory collections), implementing systematic curation protocols that resulted in 7656 parts achieving verified status through literature-based validation and reliability assessment. We introduce a novel three-level hierarchical classification system organizing parts into functionally coherent categories (DNA Elements, RNA Elements, Coding Sequences, and Application Constructs) with detailed subcategorization. The core technical contribution employs BGE-M3 multilingual embeddings within a scalable vector database architecture to enable semantic similarity matching that significantly outperforms keyword-based retrieval methods. Standardized curation workflows enhance data comparability and search accuracy across heterogeneous sources. The dual deployment architecture ensures high performance through cloud services while maintaining open-source accessibility and deployment flexibility. The system maintains SBOL3 compatibility while providing innovative solutions for biological part organization and retrieval. Database URL: SynVectorDB is available in multiple deployment modes: web interface (https://svdb.sjtu.bio), local installation and source code (https://github.com/AilurusBio/synbio-parts-db), and MCP server integration for AI assistants (https://www.npmjs.com/package/synvectordb).
Title: SynVectorDB: embedding-based retrieval system for synthetic biology parts. | Journal: Database: The Journal of Biological Databases and Curation (2026)
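SynVectorDB's semantic search rests on comparing embedding vectors rather than keywords. Below is a minimal sketch of that idea: score every stored part by cosine similarity to a query vector and return the top k. The toy 3-dimensional vectors are hypothetical stand-ins for BGE-M3 embeddings, the part IDs are illustrative, and the exhaustive scan replaces the indexed (typically approximate) nearest-neighbour search a vector database would perform.

```python
import math
from typing import Dict, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(
    query: List[float], index: Dict[str, List[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Exhaustively score every stored part and return the k most similar."""
    scored = [(part_id, cosine(query, vec)) for part_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors standing in for real text embeddings.
index = {
    "pTet_promoter": [0.9, 0.1, 0.0],
    "GFP_cds": [0.0, 0.2, 0.95],
    "rrnB_terminator": [0.1, 0.9, 0.1],
}
query = [1.0, 0.0, 0.1]  # e.g. the embedded text of a question about inducible promoters

for part_id, score in top_k(query, index):
    print(f"{part_id}\t{score:.3f}")
```

Cosine similarity ignores vector length, so parts with long and short descriptions are compared on direction alone.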
Pub Date: 2026-01-15 | DOI: 10.1093/database/baaf083
Marios Tomazou, Marilena M Bourdakou, Eleni Nicolaidou, Grigoris Georgiou, Kyriaki Savva, Efi Athieniti, Styliana Menelaou, Sotiroula Afxenti, George M Spyrou
H-SPAR DB is a comprehensive database designed to support space health research by providing a unified platform for data integration, analysis, and interpretation. The database simplifies the complex workflows associated with spaceflight-related biology studies by combining curated molecular lists, transcriptomic datasets from NASA's GeneLab, and user-uploaded data into a streamlined, user-friendly interface. H-SPAR DB enables researchers to perform differential expression analysis, set operations, and association analyses while also generating integrative knowledge graphs around a space-related biological theme. The platform reduces the time required for data gathering and processing by offering a single platform for data exploration, analysis, and visualization. By integrating interactive visualizations and data tables, H-SPAR DB facilitates the interpretation of results, ultimately enhancing the efficiency of space biology research and fostering discoveries that address human health challenges in space. Researchers can access H-SPAR DB freely at https://bioinformatics.cing.ac.cy/H-SPARDB/ without login or other requirements.
Title: H-SPAR DB: human spaceflight platform for analysis and research: an integrative omics database for space health. | Journal: Database: The Journal of Biological Databases and Curation (2026)
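H-SPAR DB lists set operations among its analyses, for example comparing a curated molecular list with differentially expressed genes. The sketch below shows what such a comparison computes; the gene lists are illustrative and not drawn from H-SPAR DB or GeneLab, and the Jaccard index is an added summary statistic that the abstract does not mention.

```python
from typing import Dict, Iterable, List


def compare_lists(a: Iterable[str], b: Iterable[str]) -> Dict[str, List[str]]:
    """Set operations on two molecular lists (e.g. gene symbols)."""
    set_a = {x.strip() for x in a if x.strip()}  # drop blanks and stray whitespace
    set_b = {x.strip() for x in b if x.strip()}
    return {
        "intersection": sorted(set_a & set_b),
        "union": sorted(set_a | set_b),
        "only_a": sorted(set_a - set_b),
        "only_b": sorted(set_b - set_a),
    }


def jaccard(result: Dict[str, List[str]]) -> float:
    """Size of the intersection divided by the size of the union."""
    union = result["union"]
    return len(result["intersection"]) / len(union) if union else 0.0


# Illustrative lists only.
curated = ["TP53", "CDKN1A", "GADD45A", "ATM"]
de_genes = ["CDKN1A", "GADD45A ", "MDM2"]

result = compare_lists(curated, de_genes)
print(result)
print(f"Jaccard index: {jaccard(result):.2f}")
```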
Pub Date: 2025-07-08 | DOI: 10.1093/database/baaf041
Title: Correction to: GymnoTOA-db: a database and application to optimize functional annotation in gymnosperms. | Journal: Database: The Journal of Biological Databases and Curation (2025) | Open-access PDF: PMC12560801
Pub Date: 2025-07-03 | DOI: 10.1093/database/baaf025
Fang-Yi Su, Gia-Han Ngo, Ben Phan, Jung-Hsien Chiang
Biomedical relation extraction often involves datasets with implicit constraints, where structural, syntactic, or semantic rules must be strictly preserved to maintain data integrity. Traditional data augmentation techniques struggle in these scenarios, as they risk violating domain-specific constraints. To address these challenges, we propose CAS (Constrained Augmentation and Semantic-Quality), a novel framework designed for constrained datasets. CAS employs large language models to generate diverse data variations while adhering to predefined rules, and it integrates the SemQ Filter. This self-evaluation mechanism ensures the quality and consistency of augmented data by filtering out noisy or semantically incongruent samples. Although CAS is primarily designed for biomedical relation extraction, its versatile design extends its applicability to tasks with implicit constraints, such as code completion, mathematical reasoning, and information retrieval. Through extensive experiments across multiple domains, CAS demonstrates its ability to enhance model performance by maintaining structural fidelity and semantic accuracy in augmented data. These results highlight the potential of CAS not only in advancing biomedical NLP research but also in addressing data augmentation challenges in diverse constrained-task settings within natural language processing. Database URL: https://github.com/ngogiahan149/CAS.
Title: CAS: enhancing implicit constrained data augmentation with semantic enrichment for biomedical relation extraction and beyond. | Journal: Database: The Journal of Biological Databases and Curation (2025) | Open-access PDF: PMC12224179
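CAS generates augmented samples with a large language model under predefined rules and then screens them with the SemQ Filter, an LLM-based self-evaluation step. The sketch below shows only the generate-then-filter shape under stated assumptions: the constraint is that masked entity markers (a hypothetical `@TYPE$` convention) survive unchanged, a crude token-overlap score stands in for the SemQ Filter's judgement, the thresholds are arbitrary, and the candidates are hard-coded rather than LLM output. It is not the CAS implementation.

```python
import re
from typing import List

# Hypothetical convention: entity mentions are masked as @TYPE$ placeholders.
ENTITY_MARKER = re.compile(r"@\w+\$")


def satisfies_constraints(original: str, candidate: str) -> bool:
    """Implicit constraint used here: the same entity markers must survive."""
    return sorted(ENTITY_MARKER.findall(original)) == sorted(
        ENTITY_MARKER.findall(candidate)
    )


def token_jaccard(a: str, b: str) -> float:
    """Crude lexical similarity, a stand-in for a semantic-quality score."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 1.0


def filter_augmentations(
    original: str, candidates: List[str], min_sim: float = 0.3, max_sim: float = 0.9
) -> List[str]:
    """Keep candidates that satisfy the constraint and are similar but not near-copies."""
    kept = []
    for cand in candidates:
        if not satisfies_constraints(original, cand):
            continue  # violates the structural constraint
        sim = token_jaccard(original, cand)
        if min_sim <= sim <= max_sim:
            kept.append(cand)  # close enough to keep meaning, different enough to add diversity
    return kept


original = "@GENE$ expression is upregulated by @CHEMICAL$ in liver cells"
# In CAS these would come from a large language model; here they are hard-coded.
candidates = [
    "@CHEMICAL$ increases the expression of @GENE$ in liver cells",
    "@GENE$ expression is upregulated in liver cells",
    "@GENE$ expression is upregulated by @CHEMICAL$ in liver cells",
    "@GENE$ and @CHEMICAL$ are unrelated words about weather today",
]
print(filter_augmentations(original, candidates))
```

Only the first candidate survives: the second drops an entity, the third is a verbatim copy, and the fourth shares too little with the original.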
Pub Date: 2025-05-30 | DOI: 10.1093/database/baaf027
Muhammad Nabeel Asim, Tayyaba Asif, Faiza Hassan, Andreas Dengel
Protein sequence analysis examines the order of amino acids within protein sequences to uncover a wealth of knowledge about biological processes and genetic disorders. It helps forecast disease susceptibility by identifying unique protein signatures, or biomarkers, linked to particular disease states. Protein sequence analysis through wet-lab experiments is expensive, time-consuming, and error-prone. To enable large-scale proteomic sequence analysis, the biological community is seeking to use AI to move from wet-lab experiments to computer-aided applications. However, proteomics and AI are two distinct fields, and developing AI-driven protein sequence analysis applications requires knowledge of both domains. Various review articles have been written to bridge the gap between the two fields, but they focus on a few individual tasks or specific applications rather than providing a comprehensive overview of the wider range of tasks and applications. To meet the need for literature that presents a holistic view of this range, this manuscript makes several contributions: It bridges the gap between the proteomics and AI fields by presenting a comprehensive array of AI-driven applications for 63 distinct protein sequence analysis tasks. It equips AI researchers with the biological foundations of these 63 tasks. It supports the development of AI-driven protein sequence analysis applications by providing comprehensive details of 68 protein databases. It presents a rich data landscape, encompassing 627 benchmark datasets across the 63 tasks. It highlights the use of 25 unique word embedding methods and 13 language models in AI-driven protein sequence analysis applications. It accelerates the development of AI-driven applications by summarizing current state-of-the-art performance across the 63 protein sequence analysis tasks.
Title: Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models. | Journal: Database: The Journal of Biological Databases and Curation (2025) | Open-access PDF: PMC12125710
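The review surveys word embedding methods applied to protein sequences. Many such methods first treat overlapping k-mers of residues as "words" and map them to integer ids before an embedding layer. The sketch below shows that tokenization step under stated assumptions: k = 3, only the 20 standard residues are accepted (real pipelines often map ambiguous residues such as X to an unknown token instead), and the function names are hypothetical rather than taken from the review.

```python
from typing import Dict, List

AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard residues


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a protein sequence into overlapping k-mers ("words")."""
    seq = seq.strip().upper()
    unknown = set(seq) - AMINO_ACIDS
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return [seq[i : i + k] for i in range(len(seq) - k + 1)]


def build_vocab(sequences: List[str], k: int = 3) -> Dict[str, int]:
    """Map each k-mer seen in the corpus to an integer id; 0 is reserved for unseen k-mers."""
    tokens = sorted({tok for s in sequences for tok in kmer_tokens(s, k)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(seq: str, vocab: Dict[str, int], k: int = 3) -> List[int]:
    """Convert a sequence to the id list an embedding layer would consume."""
    return [vocab.get(tok, 0) for tok in kmer_tokens(seq, k)]


vocab = build_vocab(["MKTAYIAK", "MKTV"])
print(kmer_tokens("MKTAYIAK"))
print(encode("MKTAYIAK", vocab))
```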
This paper describes our biomedical relation extraction system, designed for Track 1 (BioRED Track) of the BioCreative VIII challenge, which focuses on relation extraction from biomedical literature. Our system employs an ensemble learning method that combines the PubTator API with multiple pretrained Bidirectional Encoder Representations from Transformers (BERT) models. Several preprocessed inputs are incorporated, including prompt questions, entity ID pairs, and co-occurrence contexts. To improve model comprehension, special tokens and boundary tags are added. Specifically, we use PubMedBERT together with the Max Rule ensemble mechanism to combine the outputs of diverse classifiers. Our results surpass the established benchmark score and provide a strong reference point for evaluating performance on this task. Moreover, our study introduces a data-centric approach and demonstrates its effectiveness, underscoring the importance of prioritizing high-quality data instances for improving model performance and robustness.
Title: Enhancing biomedical relation extraction through data-centric and preprocessing-robust ensemble learning approach. | Authors: Wilailack Meesawad, Jen-Chieh Han, Chun-Yu Hsueh, Yu Zhang, Hsi-Chuan Hung, Richard Tzong-Han Tsai | Pub Date: 2025-05-22 | DOI: 10.1093/database/baae127 | Journal: Database: The Journal of Biological Databases and Curation (2025) | Open-access PDF: PMC12097206
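The abstract names the Max Rule as the mechanism for combining classifier outputs. Under the max rule, the combined score of each class is the highest probability any classifier assigns to it, and the prediction is the class with the highest combined score. The sketch below is a generic illustration, not the authors' code: the probabilities are invented, and the label set mixes two BioRED relation types with a hypothetical "None" class.

```python
from typing import List, Sequence, Tuple


def max_rule(
    prob_rows: Sequence[Sequence[float]], labels: Sequence[str]
) -> Tuple[str, List[float]]:
    """Combine classifier outputs with the max rule.

    Each row holds one classifier's class probabilities. The combined score of a
    class is the largest probability any classifier gives it; the predicted
    label is the class with the highest combined score. Combined scores are not
    renormalized, so they need not sum to 1.
    """
    if not prob_rows:
        raise ValueError("need at least one classifier")
    n = len(labels)
    if any(len(row) != n for row in prob_rows):
        raise ValueError("every row must have one probability per label")
    combined = [max(row[j] for row in prob_rows) for j in range(n)]
    best = max(range(n), key=lambda j: combined[j])
    return labels[best], combined


labels = ["Association", "Positive_Correlation", "None"]
probs = [
    [0.5, 0.3, 0.2],  # classifier 1
    [0.2, 0.7, 0.1],  # classifier 2
    [0.4, 0.4, 0.2],  # classifier 3
]
label, scores = max_rule(probs, labels)
print(label, scores)
```

Because a single confident classifier can dominate, the max rule favours recall of relations that any ensemble member detects strongly.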
Pub Date: 2025-05-20 | DOI: 10.1093/database/baaf008
Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz
We previously described Graph2VR, a prototype that enables researchers to use virtual reality (VR) to explore and navigate Linked Data graphs using SPARQL queries (see https://doi.org/10.1093/database/baae008). Here we evaluate Graph2VR in three realistic life science use cases. The first visualizes metadata from large-scale multi-center cohort studies across Europe and Canada via the EUCAN Connect catalogue. The second involves genomic data from synthetic rare disease patients, processed through the Variant Interpretation Pipeline and then converted into Resource Description Framework (RDF) for visualization. The third enriches a graph with additional information, in this case the Dutch Anatomical Therapeutic Chemical code Ontology with DrugIDs from DrugBank. Together, these examples show Graph2VR's potential for data exploration and enrichment, as well as some of its limitations. We conclude that the unbounded three-dimensional space of VR shows considerable potential for navigating very large knowledge graphs, and we provide recommendations for data preparation and VR tooling going forward. Database URL: https://doi.org/10.1093/database/baaf008.
Title: An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR. | Journal: Database: The Journal of Biological Databases and Curation (2025)
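In the second Graph2VR use case, variant data from the Variant Interpretation Pipeline were converted into RDF before being explored in VR. The sketch below shows one way a single variant record could be serialized as N-Triples lines. The namespace, field names, and record are hypothetical: the abstract does not describe VIP's output format or the vocabulary used in the study, and production code would more likely use an RDF library such as rdflib.

```python
from typing import Dict, List

EX = "http://example.org/vip/"  # hypothetical namespace, not the one used with Graph2VR
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def iri(local: str) -> str:
    """Build an IRI in the hypothetical namespace."""
    return f"<{EX}{local}>"


def literal(value: str) -> str:
    """Quote a string as an N-Triples literal, escaping the required characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def variant_to_ntriples(v: Dict[str, object]) -> List[str]:
    """Serialize one variant record (hypothetical fields) as N-Triples lines."""
    subject = iri(f"variant/{v['id']}")
    position = f'"{int(v["pos"])}"^^<{XSD_INTEGER}>'  # typed integer literal
    return [
        f"{subject} {RDF_TYPE} {iri('Variant')} .",
        f"{subject} {iri('chromosome')} {literal(str(v['chrom']))} .",
        f"{subject} {iri('position')} {position} .",
        f"{subject} {iri('gene')} {iri('gene/' + str(v['gene']))} .",
        f"{subject} {iri('classification')} {literal(str(v['classification']))} .",
    ]


# Synthetic record for illustration only.
record = {"id": "v1", "chrom": "1", "pos": 1000, "gene": "GENE_X",
          "classification": "likely pathogenic"}
print("\n".join(variant_to_ntriples(record)))
```

Linking the variant to a gene IRI rather than a plain string is what lets a graph browser, or a SPARQL query, traverse from variants to genes.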
Pub Date : 2025-05-20 DOI: 10.1093/database/baaf008
Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz
We previously described Graph2VR, a prototype that enables researchers to use virtual reality (VR) to explore and navigate through Linked Data graphs using SPARQL queries (see https://doi.org/10.1093/database/baae008). Here we evaluate the use of Graph2VR in three realistic life science use cases. The first use case visualizes metadata from large-scale multi-center cohort studies across Europe and Canada via the EUCAN Connect catalogue. The second use case involves a set of genomic data from synthetic rare disease patients, which was processed through the Variant Interpretation Pipeline and then converted into the Resource Description Framework (RDF) format for visualization. The third use case involves enriching a graph with additional information, in this case, the Dutch Anatomical Therapeutic Chemical code Ontology with the DrugID from DrugBank. These examples collectively showcase Graph2VR's potential for data exploration and enrichment, as well as some of its limitations. We conclude that the endless three-dimensional space provided by VR indeed shows much potential for the navigation of very large knowledge graphs, and we provide recommendations for data preparation and VR tooling moving forward. Database URL: https://doi.org/10.1093/database/baaf008.
{"title":"An exploratory study combining Virtual Reality and Semantic Web for life science research using Graph2VR.","authors":"Alexander J Kellmann, Sander van den Hoek, Max Postema, W T Kars Maassen, Brenda S Hijmans, Marije A van der Geest, K Joeri van der Velde, Esther J van Enckevort, Morris A Swertz","doi":"10.1093/database/baaf008","DOIUrl":"10.1093/database/baaf008","url":null,"abstract":"We previously described Graph2VR, a prototype that enables researchers to use virtual reality (VR) to explore and navigate through Linked Data graphs using SPARQL queries (see https://doi.org/10.1093/database/baae008). Here we evaluate the use of Graph2VR in three realistic life science use cases. The first use case visualizes metadata from large-scale multi-center cohort studies across Europe and Canada via the EUCAN Connect catalogue. The second use case involves a set of genomic data from synthetic rare disease patients, which was processed through the Variant Interpretation Pipeline and then converted into Resource Description Format for visualization. The third use case involves enriching a graph with additional information, in this case, the Dutch Anatomical Therapeutic Chemical code Ontology with the DrugID from Drugbank. These examples collectively showcase Graph2VR's potential for data exploration and enrichment, as well as some of its limitations. We conclude that the endless three-dimensional space provided by VR indeed shows much potential for the navigation of very large knowledge graphs, and we provide recommendations for data preparation and VR tooling moving forward. Database URL: https://doi.org/10.1093/database/baaf008.","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12090995/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144110024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}