Pub Date: 2026-01-15 DOI: 10.1093/database/baaf090
Runjia Ji, Yongxin Pan, Wei Lin
Magnetotactic bacteria (MTB) are a unique group of microorganisms capable of navigating along geomagnetic field lines through the biomineralization of intracellular magnetic nanocrystals called magnetosomes. While genomic analyses have substantially advanced our understanding of these predominantly uncultured microorganisms, MTB genomic data remain scattered across multiple databases with inconsistent quality profiles and incomplete metadata, limiting comprehensive research efforts. To address these challenges, we developed the Genomic Database of Magnetotactic Bacteria (GdbMTB), the first comprehensive, curated genomic resource dedicated to MTB. The current version of GdbMTB integrates 365 publicly available MTB genomes and their associated metadata. Through a standardized bioinformatics workflow, it provides detailed genome quality assessments, taxonomic classifications, and annotations of magnetosome biomineralization genes, ensuring reliable data for downstream analyses. The curated metadata, encompassing environmental context and publication details, offers crucial research background, enabling users to trace the provenance of each genome. Additionally, GdbMTB offers a suite of bioinformatics tools and an analysis pipeline to facilitate advanced MTB studies. GdbMTB enhances accessibility to MTB genomic data, thereby promoting interdisciplinary research in microbiology, geomicrobiology, and biomineralization studies. Database URL: https://www.gdbmtb.cn/.
Title: GdbMTB: a manually curated genomic database of magnetotactic bacteria. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805115/pdf/
Proteoforms translated from alternatively spliced transcripts perform diverse biological functions and thereby expand the functional repertoire of the cell, contributing to functional diversity at the genomic and proteomic levels. However, the lack of databases that both integrate functional annotations of proteoforms and analyse the drivers of their functional differences significantly hinders in-depth research into proteoform functions. We introduce ProteoformDB, a new web resource with integrated in-platform analytical capabilities that organizes transcript-level functional annotations of proteoforms across multiple species and provides services for predicting proteoform functions and analysing functional regulatory mechanisms. ProteoformDB provides user-friendly interfaces for information search, visualization, function supplement, differential analysis, and data download. In particular, it enables users to investigate the impact of molecular events on proteoform function at multiple levels, including sequences, domains, and post-translational modifications, thereby uncovering the functional differences between protein variants. The current version includes processed data (154.83 GB) for 214 animal and 28 plant species and is intended as an expandable proteoform functional resource for studying genome and transcriptome functions, disease mechanisms, and related research.
Title: ProteoformDB: an integrative database for functional roles of proteoforms. Authors: Hanwen Luo, Sichao Qiu, Maozu Guo, Beibei Xin, Jun Wang, Guoxian Yu. Pub Date: 2026-01-15 DOI: 10.1093/database/baag005 (https://doi.org/10.1093/database/baag005). Journal: Database: The Journal of Biological Databases and Curation, volume 2026.
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf084
Samiksha Maurya, Jaidev Sharma, Amit Mandoli, Vibhor Kumar
Enhancers act as cis-regulatory elements, controlling the expression of genes according to developmental stages, external signalling, and cell states. Recent studies have shown the impact of perturbing enhancer activity on gene expression and cell properties. However, perturbing an enhancer does not always have a substantial effect on gene expression or cell properties. Hence, there is a need to identify enhancers that can be effectively targeted for therapeutics and for understanding regulation, and a comprehensive resource on the effects of enhancer knockdown is needed. Here, we introduce ePerturbDB, a database that provides resources to search the effects of 83 743 experimental perturbations of enhancers. ePerturbDB allows users to compare their genomic loci with the list of perturbed enhancers to assess their potential effect. It also provides enriched genes and ontology terms for query enhancer locations that overlap known experimentally perturbed enhancers. Thus, the resource and tool in ePerturbDB can help users build hypotheses and design experiments to find effective enhancer-based therapeutics and draw inferences about the regulation of cell states. Database URL: http://reggen.iiitd.edu.in:1207/ePerturbDB-html/.
Title: ePerturbDB: enhancer's experimental perturbation database. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805113/pdf/
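The locus comparison described in the ePerturbDB abstract (matching a user's genomic loci against perturbed enhancers) amounts to an interval-overlap test. The following is a minimal sketch of that idea only, not the database's actual implementation: the function names, the half-open coordinate convention, and all coordinates are assumptions made up for illustration.

```python
from typing import List, Tuple

# (chromosome, start, end) with half-open coordinates [start, end); an assumed convention
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True when both intervals are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_perturbed_overlaps(
    queries: List[Interval], perturbed: List[Interval]
) -> List[Tuple[Interval, Interval]]:
    """Pair every query locus with every perturbed enhancer it overlaps (simple O(n*m) scan)."""
    hits = []
    for q in queries:
        for p in perturbed:
            if overlaps(q, p):
                hits.append((q, p))
    return hits


# Hypothetical coordinates, invented for this example; not taken from ePerturbDB
perturbed_enhancers = [("chr1", 1000, 1500), ("chr1", 5000, 5600), ("chr2", 200, 900)]
user_loci = [("chr1", 1400, 1450), ("chr1", 3000, 3100), ("chr2", 850, 1200)]

hits = find_perturbed_overlaps(user_loci, perturbed_enhancers)
for query, enhancer in hits:
    print(query, "overlaps", enhancer)
```

For tens of thousands of perturbed regions, a sorted list or an interval tree would likely be preferable to the nested scan shown here.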
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf085
Carlos Patron-Rivero, Carlos Yañez-Arenas, Sara Ruane, Xavier Chiappa-Carrara, Octavio R Rojas-Soto
Generating and sharing primary biological data is essential to support reproducible research, stimulate new hypotheses, and advance our understanding of biodiversity. Here, we present a comprehensive database of morphological traits for snakes of the genus Porthidium (Viperidae: Crotalinae). This database includes linear measurements, pholidosis (scale counts), and head shape data from preserved specimens across five different herpetological collections. These data comprise 13 morphological traits, 8 scale counts, and 55 landmarks collected from 484 individuals across 9 species. The specimens represent both juvenile and adult stages. All data were collected using standardized protocols to ensure comparability across individuals and species. The dataset is a valuable resource for studies in systematics, morphological evolution, ecological adaptation, and ontogeny, as well as facilitating reproducibility and reuse in the fields of evolutionary biology, herpetology, and comparative morphology.
Title: A comprehensive morphological database of hognose Porthidium pitvipers (Viperidae: Crotalinae). Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12813580/pdf/
Pub Date: 2026-01-15 DOI: 10.1093/database/baag002
Devarakonda Himaja, Debashree Bandyopadhyay
Experimental characterization and annotation of amino acids in domain of unknown function (DUF) proteins are expensive and time-consuming, and can be complemented by computational methods. Cysteine, being the second most reactive amino acid at the catalytic sites of enzymes, was selected for functional annotation and characterization in DUF proteins. Earlier, we reported functional annotation of cysteine in DUF proteins belonging to the COX-II family. However, to the best of our knowledge, a holistic characterization of cysteine functions in DUF proteins has not been reported. Here, we annotated and characterized cysteine residues based on post-translational modifications (PTMs), biochemical pathways, diseases, taxonomy, and protein microenvironment. The information on uncharacterized DUF proteins was initially obtained from the literature, and the sequence, structure, pathways, taxonomy, and disease information were retrieved from the SCOPe database using DUF IDs. Protein microenvironments (MENV) around cysteine residues from DUF proteins were computed using protein structures (n = 70 342). The cysteine PTMs were predicted using the in-house cysteine-function prediction server, DeepCys (https://deepcys.bits-hyderabad.ac.in). The accuracy of the prediction, validated against known experimental cysteine PTMs (n = 18 626), was 0.79. The information was consolidated in the database (https://cysduf.bits-hyderabad.ac.in/) and is retrievable in downloadable formats (CSV, JSON, or TXT) using any of the following inputs: DUF ID, PFAM ID, or PDB ID. For the first time, we annotated cysteine PTMs in DUF proteins belonging to seven different biochemical pathways and various species across the taxonomy, notably the SARS-CoV-2 virus. The MENV around cysteine residues in DUF proteins was mainly buried and hydrophobic. However, in the SARS-CoV-2 virus, a significant number of functional cysteine residues were exposed on the surface with a hydrophilic microenvironment.
Title: CysDuF database: annotation and characterization of cysteine residues in domain of unknown function proteins based on cysteine post-translational modifications, their protein microenvironments, biochemical pathways, taxonomy, and diseases. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12828279/pdf/
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf086
Seung-Jin Park, Seon-Young Kim
Somatic mutations, key alterations in cancer development, exert differential effects across tissues and biological layers, such as transcriptomes, proteomes, and post-translational modifications (PTMs). Although previous pan-cancer studies have characterized the molecular landscape of cancer, the effects of individual somatic mutations across different tissues remain insufficiently explored. Here, we developed Panorama to evaluate the oncogenic potential of single somatic mutations across all cancer types. We collected cancer proteogenomics or multiomics data from over 10 000 individuals across 19 cancer types. Based on five evaluation criteria, we assessed whether a specific mutation affects the abundance of a particular gene's transcriptome, proteome, or phosphoproteome; the tumor microenvironment; specific RNA- or protein-based signaling pathways; and outlier-level overexpression of PTMs, aiding in potential drug target identification. By leveraging five oncogenic metrics, Panorama quantifies the oncogenic potential of individual somatic mutations and provides a framework for identifying driver mutations by incorporating their downstream effects. With Panorama, researchers can integrate cancer proteogenomics data, providing a comprehensive approach that enhances our understanding of single somatic mutations in specific tissues. Finally, Panorama was developed as a web-based database to ensure easy access for researchers and is freely available at http://139.150.65.64:8080/ or https://github.com/prosium/panorama.
Title: Panorama: a database for the oncogenic evaluation of somatic mutations in pan-cancer. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12808847/pdf/
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf088
Hao Li, Jiani Hu, Jie Song, Wei Zhou
Synthetic biology part discovery faces significant challenges due to inconsistent data organization and limited semantic search capabilities across existing repositories. We developed SynVectorDB, an embedding-based retrieval system that addresses these limitations through methodological innovations in data integration and AI-driven semantic search. Our approach integrates 19 850 biological parts from multiple sources (Addgene, iGEM Registry, laboratory collections), implementing systematic curation protocols that resulted in 7656 parts achieving verified status through literature-based validation and reliability assessment. We introduce a novel three-level hierarchical classification system organizing parts into functionally coherent categories (DNA Elements, RNA Elements, Coding Sequences, and Application Constructs) with detailed subcategorization. The core technical contribution employs BGE-M3 multilingual embeddings within a scalable vector database architecture to enable semantic similarity matching that significantly outperforms keyword-based retrieval methods. Standardized curation workflows enhance data comparability and search accuracy across heterogeneous sources. The dual deployment architecture ensures high performance through cloud services while maintaining open-source accessibility and deployment flexibility. The system maintains SBOL3 compatibility while providing innovative solutions for biological part organization and retrieval. Database URL: SynVectorDB is available in multiple deployment modes: web interface (https://svdb.sjtu.bio), local installation and source code (https://github.com/AilurusBio/synbio-parts-db), and MCP server integration for AI assistants (https://www.npmjs.com/package/synvectordb).
Title: SynVectorDB: embedding-based retrieval system for synthetic biology parts. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805114/pdf/
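The semantic similarity matching mentioned in the SynVectorDB abstract can be illustrated with cosine similarity over embedding vectors. This is a hedged sketch of the general technique only: the three-dimensional vectors below are invented toy values standing in for real BGE-M3 embeddings, and the names `cosine_similarity` and `rank_parts` are hypothetical, not part of SynVectorDB.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_parts(
    query_vec: List[float], part_vecs: Dict[str, List[float]], top_k: int = 2
) -> List[Tuple[str, float]]:
    """Score every part against the query and return the top_k most similar, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in part_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors invented for illustration; a real system would obtain them from an embedding model
toy_part_vectors = {
    "strong constitutive promoter": [0.9, 0.1, 0.0],
    "ribosome binding site": [0.1, 0.9, 0.1],
    "fluorescent reporter CDS": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.1]  # stands in for the embedding of a free-text query

ranked = rank_parts(toy_query, toy_part_vectors)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

A vector database replaces the exhaustive scoring loop with an approximate nearest-neighbour index, but the ranking criterion is the same kind of similarity score.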
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf089
Ágnes Becsei, Patrick Munk, Alessandro Fuschi, Saria Otani, József Stéger, Dávid Visontai, Krisztián Papp, Christian Brinch, Ravi Kant, Ilya Weinstein, Olli Vapalahti, Miranda de Graaf, Claudia M E Schapendonk, Jeroen Roelfsema, Maaike van den Beld, Roan Pijnacker, Eelco Franz, Patricia Alba, Antonio Battisti, Alessandra De Cesare, Valentina Indio, Fulvia Troja, Tarja Sironen, Chiara Oliveri, Frédérique Pasquali, Ivan Liachko, Benjamin Auch, Colman O'Cathail, Krisztián Bányai, Magdolna Makó, Péter Pollner, Marion Koopmans, Istvan Csabai, Daniel Remondini, Frank M Aarestrup
Sewage metagenomics is a powerful tool for proactive pathogen surveillance and understanding microbial community dynamics. To support such efforts, we present a highly curated and accessible longitudinal dataset of 239 sewage samples collected from five European cities. The dataset, processed through metagenomic sequencing, includes rich analytical outputs such as taxonomic profiles, identified antimicrobial resistance genes, assembled contigs with annotated origins, metagenome-assembled genomes with functional gene annotations, and metadata. Given the computational intensity and time required to reproduce such analyses, we share this dataset to promote reuse and advance research. In addition to the metagenomic data, qPCR was used to identify specific pathogens, and Hi-C sequencing was performed on a subset of the samples to strengthen genomic linkage analysis. Central to this resource is a publicly available PostgreSQL database, designed to facilitate efficient exploration and reuse of the data. This comprehensive database allows users to perform targeted queries, subset data, and streamline access to this extensive resource.
Title: A comprehensive database for biological data derived from sewage in five European cities. Journal: Database: The Journal of Biological Databases and Curation, volume 2026. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12817144/pdf/
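The kind of targeted query and data subsetting described in the sewage abstract might look like the sketch below. Everything here is an assumption for illustration: the published resource is a PostgreSQL database whose actual schema is not given in the abstract, so the example uses Python's built-in `sqlite3` as a stand-in, with hypothetical table names (`sample`, `amr_hit`), placeholder city and gene labels, and made-up counts.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL resource
conn = sqlite3.connect(":memory:")

# Hypothetical two-table schema: one row per sample, one row per resistance-gene hit
conn.executescript(
    """
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        city      TEXT,
        collected TEXT
    );
    CREATE TABLE amr_hit (
        sample_id  INTEGER REFERENCES sample(sample_id),
        gene       TEXT,
        read_count INTEGER
    );
    """
)

# Made-up rows for illustration only
conn.executemany(
    "INSERT INTO sample VALUES (?, ?, ?)",
    [(1, "city_A", "2020-01-10"), (2, "city_A", "2020-02-10"), (3, "city_B", "2020-01-12")],
)
conn.executemany(
    "INSERT INTO amr_hit VALUES (?, ?, ?)",
    [(1, "gene_X", 40), (2, "gene_X", 60), (3, "gene_X", 5), (3, "gene_Y", 30)],
)

# Targeted query: total reads for one gene, grouped by city
rows = conn.execute(
    """
    SELECT s.city, SUM(h.read_count) AS total_reads
    FROM amr_hit AS h
    JOIN sample AS s ON s.sample_id = h.sample_id
    WHERE h.gene = ?
    GROUP BY s.city
    ORDER BY s.city
    """,
    ("gene_X",),
).fetchall()
conn.close()

for city, total in rows:
    print(city, total)
```

The same `JOIN` / `GROUP BY` pattern carries over to PostgreSQL, although the parameter placeholder style depends on the client library used.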
Tear fluid is a clinically accessible, minimally invasive biofluid with a complex and dynamic proteome. Molecular alterations in tear composition have been linked to a broad spectrum of ocular and systemic diseases; however, the small volume of tear samples presents substantial challenges for obtaining high-quality proteomic data. To overcome this limitation, we developed a highly sensitive mass spectrometry workflow capable of identifying more than 1,000 proteins from individual tear samples. Applying this workflow to a large and diverse cohort, we generated a representative and comprehensive profile of the human tear fluid proteome and established reference abundance ranges for proteins commonly detected in tear fluid. In parallel with protein quantification, we collected detailed clinical annotations for each participant. As the database continues to grow, these analyses will increasingly support the identification of disease-associated proteomic signatures, deepen our understanding of underlying biological mechanisms, and accelerate the discovery of clinically relevant biomarkers. To make these data broadly accessible, we created a user-friendly website for exploring protein measurements alongside associated clinical metadata. The current release includes proteomic profiles from 74 human tear samples, encompassing 2,134 unique proteins. The TearFluid Database serves as a foundational resource for biomarker discovery, comparative proteomics, and systems-level investigations in tear biology, offering the scientific community a robust and expandable platform for advancing tear fluid proteomics research. Database URL: https://tearfluid.org/.
{"title":"Tear fluid database: a reference website for tear fluid proteomics.","authors":"Drew Mayernik, Saleh Ahmed, Eliza Williams, Tae Jin Lee, Amy Estes, Pamela Martin, Wenbo Zhi, Vishal Jhanji, Shruti Sharma, Ashok Sharma","doi":"10.1093/database/baaf091","journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2026"},"publicationDate":"2026-01-15","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12808846/pdf/"}
Pub Date: 2026-01-15 DOI: 10.1093/database/baaf083
Marios Tomazou, Marilena M Bourdakou, Eleni Nicolaidou, Grigoris Georgiou, Kyriaki Savva, Efi Athieniti, Styliana Menelaou, Sotiroula Afxenti, George M Spyrou
H-SPAR DB is a comprehensive database designed to support space health research by providing a unified platform for data integration, analysis, and interpretation. The database simplifies the complex workflows associated with spaceflight-related biology studies by combining curated molecular lists, transcriptomic datasets from NASA's GeneLab, and user-uploaded data into a streamlined, user-friendly interface. H-SPAR DB enables researchers to perform differential expression analysis, set operations, and association analyses while also generating integrative knowledge graphs around a space-related biological theme. The platform reduces the time required for data gathering and processing by offering a single platform for data exploration, analysis, and visualization. By integrating interactive visualizations and data tables, H-SPAR DB facilitates the interpretation of results, ultimately enhancing the efficiency of space biology research and fostering discoveries that address human health challenges in space. Researchers can access H-SPAR DB freely at https://bioinformatics.cing.ac.cy/H-SPARDB/ without login or other requirements.
{"title":"H-SPAR DB: human spaceflight platform for analysis and research-an integrative omics database for space health.","authors":"Marios Tomazou, Marilena M Bourdakou, Eleni Nicolaidou, Grigoris Georgiou, Kyriaki Savva, Efi Athieniti, Styliana Menelaou, Sotiroula Afxenti, George M Spyrou","doi":"10.1093/database/baaf083","journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2026"},"publicationDate":"2026-01-15","publicationTypes":"Journal Article","openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12805116/pdf/"}