Pub Date: 2025-03-21 DOI: 10.1093/database/baaf020
Naama Menda, Bryan J Ellerbrock, Christiano C Simoes, Srikanth Kumar Karaikal, Christine Nyaga, Mirella Flores-Gonzalez, Isaak Y Tecle, David Lyon, Afolabi Agbona, Paterne A Agre, Prasad Peteti, Violet Akech, Amos Asiimwe, Eglantine Fauvelle, Karima Meghar, Thierry Tran, Dominique Dufour, Laurel Cooper, Marie-Angélique Laporte, Elizabeth Arnaud, Lukas A Mueller
Ontologies are widely used in databases to standardize data, improving data quality, integration, and ease of comparison. Post-composing user-defined terms reconciles the competing demands of standardization and flexibility across diverse use cases. In many instances of Breedbase, a digital ecosystem for plant breeding designed for genomic selection, the goal is to capture phenotypic data using highly curated and rigorous crop ontologies while adapting to the specific requirements of plant breeders to record data quickly and efficiently. For example, post-composing enables users to tailor ontology terms to specific and granular use cases such as repeated measurements on different plant parts and special sample preparation techniques. To achieve this, we have implemented a post-composing tool based on orthogonal ontologies, which gives users the ability to introduce additional levels of phenotyping granularity tailored to unique experimental designs. Post-composed terms are designed to be reused by all breeding programs within a Breedbase instance but are not exported to the crop reference ontologies. Breedbase users can post-compose terms across various categories, such as plant anatomy, treatments, temporal events, and breeding cycles, and thereby generate highly specific terms for more accurate phenotyping.
Title: Post-composing ontology terms for efficient phenotyping in plant breeding. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11927528/pdf/
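To make the idea concrete, a post-composed term can be modeled as a base trait term combined with components drawn from orthogonal ontologies (plant part, treatment, time point). The sketch below is purely illustrative: the identifiers, field names, and composition format are hypothetical, not Breedbase's actual internal representation.

```python
# Illustrative sketch only: identifiers and data model are hypothetical.
# Component ontology terms are (term_id, label) pairs; a post-composed term
# keeps references to its component IDs alongside a human-readable label.

def post_compose(trait, plant_part=None, treatment=None, time_point=None):
    """Build a post-composed term from a trait plus orthogonal components."""
    components = [trait]
    for part in (plant_part, treatment, time_point):
        if part is not None:
            components.append(part)
    term_ids = [tid for tid, _ in components]
    label = " | ".join(lbl for _, lbl in components)
    return {"components": term_ids, "label": label}

term = post_compose(
    trait=("TRAIT:0001", "plant height"),
    plant_part=("PART:0002", "leaf"),
    time_point=("TIME:w2", "week 2"),
)
print(term["label"])  # plant height | leaf | week 2
```

The composed term stays local to the instance (it references, but never modifies, the reference ontology terms it was built from), which mirrors the reuse-without-export behavior described above.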
Pub Date: 2025-03-19 DOI: 10.1093/database/baaf016
Swier Garst, Julian Dekker, Marcel Reinders
Federated learning is an emerging machine learning paradigm that allows data from multiple sources to be used to train classifiers without the data leaving the source where it originally resides. This can be highly valuable for use cases such as medical research, where gathering data at a central location can be complicated by privacy and legal concerns. In such cases, federated learning has the potential to vastly speed up the research cycle. Although federated and centralized learning have been compared from a theoretical perspective, an extensive experimental comparison of their performance and learning behavior has been lacking. We have performed a comprehensive experimental comparison between federated and centralized learning, evaluating various classifiers on various datasets and exploring the influence of different sample and class distributions across the clients. The results show similar performance between the federated and centralized strategies under a wide variety of settings. Federated learning is able to deal with various imbalances in the data distributions. Like centralized learning, it is sensitive to batch effects between datasets when those effects coincide with location, but in the federated setting this may go unobserved more easily. Federated learning appears robust to challenges such as skewed data distributions, high data dimensionality, multiclass problems, and complex models. Taken together, the insights from our comparison give much promise for applying federated learning as an alternative to sharing data. Code for reproducing the results in this work can be found at: https://github.com/swiergarst/FLComparison.
Title: A comprehensive experimental comparison between federated and centralized learning. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11928227/pdf/
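The federated strategy evaluated in studies like this typically follows the FedAvg pattern: each client takes a few gradient steps on its local data, and a server averages the resulting models weighted by client sample size. The following is a minimal self-contained sketch of that pattern on a toy least-squares problem, not the paper's actual code (see the linked repository for that); the skewed client sizes illustrate the sample-imbalance setting the study explores.

```python
import numpy as np

# Minimal FedAvg sketch: local gradient steps per client, then a server-side
# average of client models weighted by local sample count.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])  # ground-truth linear model

def make_client(n):
    X = rng.normal(size=(n, 2))
    y = X @ w_true + 0.01 * rng.normal(size=n)
    return X, y

clients = [make_client(n) for n in (50, 150, 300)]  # skewed sample sizes

def local_update(w, X, y, lr=0.05, steps=10):
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)  # least-squares gradient
        w -= lr * grad
    return w

w = np.zeros(2)
for _ in range(30):  # communication rounds
    local = [local_update(w, X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    w = np.average(local, axis=0, weights=sizes)  # FedAvg aggregation

print(np.round(w, 2))
```

On this well-conditioned toy problem the federated model recovers `w_true` closely, mirroring the paper's finding that federated and centralized training often reach similar performance.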
Pub Date: 2025-03-11 DOI: 10.1093/database/baaf017
Mireia Costa, Alberto García S, Oscar Pastor
Variant interpretation guidelines define the process of determining the role of DNA variants in patients' health. Hundreds of guidelines currently exist, each applicable to a particular clinical domain, but they are scattered across multiple resources and the scientific literature. To address this issue, we present VarGuideAtlas, a comprehensive repository of variant interpretation guidelines that compiles information from ClinGen, ClinVar, and PubMed. Our repository offers a user-friendly web interface with advanced search capabilities, enabling clinicians and researchers to efficiently find relevant guidelines tailored to specific genes, diseases, or variant types. We employ ontologies to characterize each guideline, ensuring consistency and improving interoperability with bioinformatics tools. VarGuideAtlas represents a significant advance toward standardizing variant interpretation practices, facilitating more informed decision-making, improved clinical outcomes, and more precise genomic research. VarGuideAtlas is publicly accessible via a web-based platform (https://genomics-hub.pros.dsic.upv.es:3016/).
Title: VarGuideAtlas: a repository of variant interpretation guidelines. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11895764/pdf/
Pub Date: 2025-03-08 DOI: 10.1093/database/baaf022
Takayuki Suzuki, Hidemasa Bono
Genome editing (GE) is widely recognized as an effective and valuable technology in life sciences research. However, certain genes are difficult to edit, depending on factors such as species, sequence context, and the GE tool used. Confirming the presence or absence of prior GE work in the literature is therefore crucial for the effective design of research using GE. Although the Genome Editing Meta-database (GEM: https://bonohu.hiroshima-u.ac.jp/gem/) aims to provide GE information as comprehensively as possible, it does not indicate how each registered gene is involved in GE. In this study, we developed a systematic method that uses large language models to extract essential GE information from GEM-indexed, GE-related articles. This approach allows a systematic and efficient investigation of GE information that cannot be achieved with the current GEM alone. In addition, by converting the extracted GE information into metrics, we propose a potential application of this method to prioritize genes for future research. The extracted GE information and novel GE-related scores are expected to facilitate the efficient selection of target genes for GE and support the design of research using GE. Database URLs: https://github.com/szktkyk/extract_geinfo, https://github.com/szktkyk/visualize_geinfo.
Title: Pipeline to explore information on genome editing using large language models and genome editing meta-database. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11890094/pdf/
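The "convert extracted information into metrics" step can be pictured as aggregating per-article extraction records into a gene-level score. The sketch below is hypothetical (the record fields, example data, and scoring formula are invented for illustration; the paper's actual metrics are in the linked repositories): here a gene scores higher when it has been edited in more distinct articles and with more distinct GE tools.

```python
from collections import defaultdict

# Hypothetical records: one (gene, article, tool) triple per LLM-extracted
# GE mention. Field names and values are illustrative only.
records = [
    {"gene": "LIG4", "pmid": "111", "tool": "CRISPR-Cas9"},
    {"gene": "LIG4", "pmid": "222", "tool": "TALEN"},
    {"gene": "MSTN", "pmid": "333", "tool": "CRISPR-Cas9"},
]

def gene_scores(records):
    """Score each gene by distinct supporting articles plus distinct GE tools."""
    articles, tools = defaultdict(set), defaultdict(set)
    for r in records:
        articles[r["gene"]].add(r["pmid"])
        tools[r["gene"]].add(r["tool"])
    # More articles and more tool types -> more prior GE evidence for the gene.
    return {g: len(articles[g]) + len(tools[g]) for g in articles}

scores = gene_scores(records)
print(scores)  # {'LIG4': 4, 'MSTN': 2}
```

A ranking over such scores is one way genes could be prioritized for future GE experiments, as the abstract proposes.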
Pub Date: 2025-03-05 DOI: 10.1093/database/baaf019
Fernando Mora-Márquez, Mikel Hurtado, Unai López de Heredia
Gymnosperms are a clade of non-flowering plants comprising about 1000 living species. Due to their complex genomes and the scarcity of genomic resources, functional annotation in gymnosperm genomics and transcriptomics suffers from limitations. Here we present gymnotoa-db, a novel, publicly accessible relational database designed to facilitate functional annotation in gymnosperms. The database stores non-redundant records of gymnosperm proteins, encompassing taxonomic and functional information. The complementary software, gymnotoa-app, enables users to download gymnotoa-db and execute a comprehensive functional annotation pipeline for high-throughput sequencing-derived DNA or cDNA sequences. gymnotoa-app's user-friendly interface and efficient algorithms streamline the functional annotation process, making it an invaluable tool for researchers studying gymnosperms. We compared gymnotoa-app's performance against other annotation tools that rely on different reference databases. Our results demonstrate gymnotoa-app's superior ability to accurately annotate gymnosperm transcripts, recovering more transcripts and more unique, non-redundant Gene Ontology terms. gymnotoa-db's distinctive features include comprehensive coverage through a non-redundant dataset of gymnosperm protein sequences and robust functional information that integrates data from multiple ontology systems (GO, KEGG, EC, and MetaCyc) while preserving the taxonomic context, including Arabidopsis homologs. Database URL: https://blogs.upm.es/gymnotoa-db/2024/09/19/gymnotoa-app/.
Title: gymnotoa-db: a database and application to optimize functional annotation in gymnosperms. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11886576/pdf/
Pub Date: 2025-03-03 DOI: 10.1093/database/baaf018
E L Tejada-Gutiérrez, J Mateo Fornés, F Solsona, R Alves
Mitigating the effects of environmental exploitation on forests requires robust data analysis tools to inform sustainable management strategies and enhance ecosystem resilience. Access to extensive, integrated plant biodiversity data spanning decades is essential for this purpose. However, such data are often fragmented across diverse datasets with varying standards, posing two key challenges: first, integrating these datasets into a unified, well-structured data warehouse, and second, handling the vast volume of data with big data technologies to analyze and monitor the temporal evolution of ecosystems. To address these challenges, we developed an extract, transform, and load (ETL) protocol that curates and integrates 4482 forestry datasets from around the world, dating back to the 18th century, into a 100-GB data warehouse containing over 172 million records sourced from the Global Biodiversity Information Facility repository. We implemented Python scripts and a NoSQL MongoDB database to streamline and automate the ETL process, and used the data warehouse to create the ForestForward web platform. ForestForward is a free, user-friendly application developed with the Django framework that enables users to consult, download, and visualize the curated data. The platform allows users to explore data layers by year and observe the temporal evolution of ecosystems through visual representations. Database URL: https://forestforward.udl.cat.
Title: ForestForward: visualizing and accessing integrated world forest data from the last 50 years. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11879282/pdf/
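The "transform" step of such an ETL pipeline over GBIF-style occurrence records can be sketched as follows. This is an illustration under assumed field names (`decimalLatitude`, `decimalLongitude`, `year` are standard Darwin Core occurrence fields, but the cleaning rules and output schema here are invented, not ForestForward's actual protocol): records without usable coordinates or year are dropped, types are normalized, and near-duplicates are collapsed before loading into a document store such as MongoDB.

```python
# Sketch of an ETL "transform" stage for occurrence records (illustrative
# cleaning rules; not the ForestForward implementation).
def transform(raw_records):
    seen, clean = set(), []
    for rec in raw_records:
        lat = rec.get("decimalLatitude")
        lon = rec.get("decimalLongitude")
        year = rec.get("year")
        if lat is None or lon is None or year is None:
            continue  # curation: unusable for mapping ecosystems over time
        key = (rec["species"], round(float(lat), 4), round(float(lon), 4), int(year))
        if key in seen:
            continue  # deduplicate repeated observations
        seen.add(key)
        clean.append({"species": rec["species"], "lat": float(lat),
                      "lon": float(lon), "year": int(year)})
    return clean
    # Load step (not run here) would hand `clean` to the document store,
    # e.g. a MongoDB collection's insert_many().

raw = [
    {"species": "Quercus ilex", "decimalLatitude": "41.6",
     "decimalLongitude": "0.6", "year": "1901"},
    {"species": "Quercus ilex", "decimalLatitude": "41.6",
     "decimalLongitude": "0.6", "year": "1901"},  # duplicate
    {"species": "Pinus nigra", "decimalLatitude": None,
     "decimalLongitude": "1.1", "year": "1950"},  # missing coordinate
]
out = transform(raw)
print(out)
```

Keeping the transform a pure function makes the curation rules easy to test independently of the database connection.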
Pub Date: 2025-02-28 DOI: 10.1093/database/baaf012
Tao Luo, Wen-Kang Shen, Chu-Yu Zhang, Dan-Dan Song, Xiu-Qing Zhang, An-Yuan Guo, Qian Lei
T-cell-derived extracellular vesicles (TcEVs) play key roles in immune regulation and tumor microenvironment modulation. However, the heterogeneity of TcEVs remains poorly understood due to technical limitations of EV analysis and the lack of comprehensive data. To address this, we constructed TcEVdb, a comprehensive database that explores the expression and clustering of TcEVs identified by the SEVtras method from T-cell single-cell RNA sequencing data. TcEVdb contains 277 265 EV droplets from 51 T-cell types across 221 samples from 21 projects, covering 9 tissue sources and 23 disease conditions. The database provides two main functional modules. The Browse module enables users to investigate EV secretion activity indices across samples, visualize TcEV clusters, analyze differentially expressed genes (DEGs) and pathway enrichment in TcEV subpopulations, and compare TcEV transcriptomes with their cellular origins. The Search module allows users to query specific genes across all datasets and visualize their expression distribution. Furthermore, our analysis of TcEVs in diffuse large B-cell lymphoma revealed increased EV secretion in CD4+ exhausted T cells compared to healthy controls. Subsequent analyses identified distinct droplet clusters with differentially expressed genes, including clusters enriched for genes associated with cell motility and mitochondrial function. Overall, TcEVdb serves as a comprehensive resource for exploring the transcriptome of TcEVs, which will contribute to advances in EV-based diagnostics and therapeutics across a wide range of diseases. Database URL: https://guolab.wchscu.cn/TcEVdb.
Title: TcEVdb: a database for T-cell-derived small extracellular vesicles from single-cell transcriptomes. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11885782/pdf/
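The case-versus-control DEG comparison described above reduces, at its simplest, to per-gene fold changes between two expression profiles. The toy sketch below illustrates only that core step with made-up numbers (real DEG calling, as in TcEVdb, would also involve count models, replicates, and multiple-testing correction):

```python
import math

# Toy DEG sketch: per-gene log2 fold change between two mean-expression
# profiles (values invented; pseudocount guards against division by zero).
def log2_fold_changes(case_means, control_means, pseudocount=1.0):
    return {
        gene: math.log2((case_means[gene] + pseudocount) /
                        (control_means[gene] + pseudocount))
        for gene in case_means
    }

case = {"ACTB": 120.0, "MT-CO1": 80.0, "CCR7": 5.0}
ctrl = {"ACTB": 118.0, "MT-CO1": 20.0, "CCR7": 40.0}
lfc = log2_fold_changes(case, ctrl)

up = [g for g, v in lfc.items() if v > 1.0]    # enriched in case
down = [g for g, v in lfc.items() if v < -1.0] # depleted in case
print(up, down)  # ['MT-CO1'] ['CCR7']
```

Genes passing such fold-change (and significance) thresholds are what a Browse-style module would report as the DEGs of an EV subpopulation.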
Biotic stress significantly influences potato (Solanum tuberosum L.) production all over the world. Long noncoding RNAs (lncRNAs) play key roles in the plant response to environmental stressors, but their roles in potato resistance to pathogens, insects, and other biotic stresses are still unclear. PotatoBSLnc is a database for the study of potato lncRNAs in response to major biotic stresses. We collected 364 RNA sequencing (RNA-seq) datasets derived from 12 kinds of biotic stress in 26 cultivars and wild potatoes. PotatoBSLnc currently contains 18 636 lncRNAs and 44 263 mRNAs. In addition, to select functional lncRNAs and mRNAs under different stresses, we conducted differential expression analyses and Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of the cis/trans-targets of differentially expressed lncRNAs (DElncRNAs) and of the differentially expressed mRNAs (DEmRNAs). The database contains five modules: Home, Browse, Expression, Biotic stress, and Download. The "Browse" module can be used to search detailed information about the RNA-seq data (disease, cultivar, organ type, sample treatment, and others) and the exon number, length, location, and sequence of each lncRNA/mRNA. The "Expression" module can be used to search the transcripts-per-million (TPM) and raw-count values of lncRNAs/mRNAs across the RNA-seq datasets. The "Biotic stress" module shows the results of the differential expression analyses under each of the 12 biotic stresses, the cis/trans-targets of DElncRNAs, and the GO and KEGG analysis results for DEmRNAs. The PotatoBSLnc platform provides researchers with detailed information on potato lncRNAs and mRNAs under biotic stress, which can speed up the breeding of resistant varieties using molecular methods. Database URL: https://www.sdklab-biophysics-dzu.net/PotatoBSLnc.
Title: PotatoBSLnc: a curated repository of potato long noncoding RNAs in response to biotic stress. Authors: Pingping Huang, Weilin Cao, Zhaojun Li, Qingshuai Chen, Guangchao Wang, Bailing Zhou, Jihua Wang. Pub Date: 2025-02-22 DOI: 10.1093/database/baaf015. Database: The Journal of Biological Databases and Curation, 2025. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11846501/pdf/
Pub Date : 2025-02-22 DOI: 10.1093/database/baaf009
Bálint Biró, Zoltán Gál, Zsófia Nagy, Juan Francisco Garcia, Tsend-Ayush Batbold, Orsolya Ivett Hoffmann
There is an ongoing flow of genetic material from the mitochondrial genome to the nuclear genome. Mitochondrial sequences that have integrated into the nuclear genome (NUMTs) have been shown to be drivers of evolutionary processes and cancerous transformations. Beyond their fundamental biological importance, these sequences have significant consequences for genome assembly and for phylogenetic and forensic analyses. Previously, our research group developed a computational pipeline that provides a uniform way of identifying these sequences in mammalian genomes. In this paper, we publish MANUDB, the MAmmalian NUclear mitochondrial sequences DataBase, which makes the results of our pipeline publicly accessible. With MANUDB, one can retrieve and visualize mitochondrial genome fragments that have integrated into the nuclear genomes of mammalian species. Database URL: manudb.streamlit.app.
{"title":"MANUDB: database and application to retrieve and visualize mammalian NUMTs.","authors":"Bálint Biró, Zoltán Gál, Zsófia Nagy, Juan Francisco Garcia, Tsend-Ayush Batbold, Orsolya Ivett Hoffmann","doi":"10.1093/database/baaf009","DOIUrl":"10.1093/database/baaf009","url":null,"abstract":"<p><p>There is an ongoing genetic flow from the mitochondrial genome to the nuclear genome. The mitochondrial sequences that have integrated into the nuclear genome have been shown to be drivers of evolutionary processes and cancerous transformations. In addition to their fundamental biological importance, these sequences have significant consequences for genome assembly and phylogenetic and forensic analyses as well. Previously, our research group developed a computational pipeline that provides a uniform way of identifying these sequences in mammalian genomes. In this paper, we publish MANUDB-the MAmmalian NUclear mitochondrial sequences DataBase, which makes the results of our pipeline publicly accessible. With MANUDB one can retrieve and visualize mitochondrial genome fragments that have been integrated into the nuclear genome of mammalian species. Database URL: manudb.streamlit.app.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11845865/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143476330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-02-21 DOI: 10.1093/database/baaf013
Thomas C Wiegers, Allan Peter Davis, Jolene Wiegers, Daniela Sciaky, Fern Barkalow, Brent Wyatt, Melissa Strong, Roy McMorran, Sakib Abrar, Carolyn J Mattingly
The Comparative Toxicogenomics Database (CTD) is a manually curated knowledge- and discovery-base that seeks to advance understanding of the relationship between environmental exposures and human health. CTD's manual curation process extracts from the biomedical literature molecular relationships between chemicals/drugs, genes/proteins, phenotypes, diseases, anatomical terms, and species. These relationships are organized in a highly systematic way to make them not only informative but also computable, enabling inferential hypotheses to be formed to address gaps in understanding. Integral to CTD's functionality is the use of structured, hierarchical ontologies and controlled vocabularies to describe these molecular relationships. Normalizing text (i.e., translating raw text from the literature into these controlled vocabularies) can be a time-consuming process for biocurators. To facilitate normalization and improve the efficiency with which our scientists curate the literature, CTD evaluated PubTator 3.0, a state-of-the-art, AI-powered resource that extracts from the literature and normalizes many of the key biomedical concepts CTD curates, and integrated it into the curation process. Here, we describe CTD's long-standing history with natural language processing (NLP), how this history helped form our objectives for NLP integration, the evaluation of PubTator against those objectives, and the integration of PubTator into CTD's curation workflow. Database URL: https://ctdbase.org.
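A minimal sketch of what normalization means here: mapping raw text mentions to controlled-vocabulary identifiers so that synonyms resolve to one concept. The lexicon, lookup function, and identifier strings below are illustrative toy values, not CTD's or PubTator's actual implementation or data.

```python
# Toy lexicon mapping lowercased mentions to (concept type, identifier).
# Entries are illustrative only; real normalization uses curated
# vocabularies such as MeSH and NCBI Gene via tools like PubTator.
LEXICON = {
    "acetaminophen": ("Chemical", "MESH:X000001"),
    "paracetamol": ("Chemical", "MESH:X000001"),  # synonym, same concept
    "tp53": ("Gene", "GENE:0001"),
}

def normalize(mention):
    """Return (type, identifier) for a raw text mention, or None
    if the mention is not in the controlled vocabulary."""
    return LEXICON.get(mention.strip().lower())

# Two surface forms of the same drug normalize to one identifier
pairs = [normalize(m) for m in ["Paracetamol", "acetaminophen", "TP53"]]
```

The payoff is that downstream queries and inference operate on identifiers rather than free text, which is what makes the curated relationships computable.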
{"title":"Integrating AI-powered text mining from PubTator into the manual curation workflow at the Comparative Toxicogenomics Database.","authors":"Thomas C Wiegers, Allan Peter Davis, Jolene Wiegers, Daniela Sciaky, Fern Barkalow, Brent Wyatt, Melissa Strong, Roy McMorran, Sakib Abrar, Carolyn J Mattingly","doi":"10.1093/database/baaf013","DOIUrl":"10.1093/database/baaf013","url":null,"abstract":"<p><p>The Comparative Toxicogenomics Database (CTD) is a manually curated knowledge- and discovery-base that seeks to advance understanding about the relationship between environmental exposures and human health. CTD's manual curation process extracts from the biomedical literature molecular relationships between chemicals/drugs, genes/proteins, phenotypes, diseases, anatomical terms, and species. These relationships are organized in a highly systematic way in order to make them not only informative but also scientifically computational, enabling inferential hypotheses to be formed to address gaps in understanding. Integral to CTD's functionality is the use of structured, hierarchical ontologies and controlled vocabularies to describe these molecular relationships. Normalizing text (i.e. translating raw text from the literature into these controlled vocabularies) can be a time-consuming process for biocurators. To facilitate the normalization process and improve the efficiency with which our scientists curate the literature, CTD evaluated and integrated into the curation process PubTator 3.0, a state-of-the-art, AI-powered resource which extracts and normalizes from the literature many of the key biomedical concepts CTD curates. Here, we describe CTD's long-standing history with Natural Language Processing (NLP), how this history helped form our objectives for NLP integration, the evaluation of PubTator against our objectives, and the integration of PubTator into CTD's curation workflow. 
Database URL: https://ctdbase.org.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11844237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143472396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}