
Database: The Journal of Biological Databases and Curation: Latest Articles

PLoV: a comprehensive database of genetic variants leading to pregnancy loss.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf037
Evgeniia M Maksiutenko, Igor V Bezdvornykh, Yury A Barbitoff, Yulia A Nasykhova, Andrey S Glotov

Pregnancy loss is an important reproductive health problem that affects many couples. Genetic factors play an important role in both spontaneous miscarriage and recurrent pregnancy loss, and the effect of genomic variants is recognized as one of the major causes of pregnancy loss in euploid foetuses. In this work, we extend our previous analysis of the genetic landscape of pregnancy loss and develop a Pregnancy Loss genetic Variant (PLoV) database to aggregate information about mutations that have been implicated in pregnancy loss. The database contains information about 534 genetic variants that have been observed in 421 cases across 47 studies, including foetus-only, parent-only, and trio-based studies. For each case, the database includes a detailed description of the phenotype, including ultrasound data (if provided in the original article). The genetic variants are scattered across all chromosomes in the human genome and affect a total of 292 unique genes. We provide public access to the PLoV database at https://plovdb.ott.ru/. Database URL: https://plovdb.ott.ru/.

Citations: 0
p53motifDB: integration of genomic information and tumour suppressor p53 binding motifs.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf053
Gabriele Baniulyte, Sawyer M Hicks, Morgan A Sammons

The tumour suppressor gene TP53 encodes the DNA binding transcription factor p53 and is one of the most mutated genes in human cancer. Tumour suppressor activity requires binding of p53 to its DNA response elements and subsequent transcriptional activation of a diverse set of target genes. Despite decades of close study, the logic underlying p53 interactions with its numerous potential genomic binding sites and target genes is not yet fully understood. Here, we present a database of DNA and chromatin-based information focused on putative p53 binding sites in the human genome to allow users to generate and test new hypotheses related to p53 activity in the genome. Users can query genomic locations based on experimentally observed p53 binding, regulatory element activity, genetic variation, evolutionary conservation, chromatin modification state, and chromatin structure. We present multiple use cases demonstrating the utility of this database for generating novel biological hypotheses, such as chromatin-based determinants of p53 binding and potential cell type-specific p53 activity. All database information is also available as a precompiled SQLite database for use in local analysis or as a Shiny web application. Database URL: https://p53motifDB.its.albany.edu.
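
Because the abstract mentions a precompiled SQLite database for local analysis, a natural first step with such a file is to inspect its schema before writing queries. The following is a minimal sketch, not the authors' workflow: it assumes only that the download is an ordinary SQLite file, and it deliberately assumes no table or column names, since the abstract does not describe the schema. The function name `list_tables` is introduced here for illustration.

```python
import sqlite3

def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file.

    Note: sqlite3.connect() creates an empty file if db_path does not exist,
    so check the path first when pointing this at a real download.
    """
    conn = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )]
        schema = {}
        for name in names:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
            schema[name] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Calling `list_tables` on the downloaded file (its actual file name is not given in the abstract) would show which tables exist, which is enough to start composing `SELECT` statements against them.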

Citations: 0
Automated annotation and validation of human respiratory virus sequences using VADR.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf078
Jeffrey Furlong, Stephanie Goya, Eric P Nawrocki, Vincent Calhoun, Eneida Hatcher, Linda Yankie, Alexander L Greninger

Accurate annotation of viral genomes is essential for reliable downstream analysis and public data sharing. While the National Center for Biotechnology Information's (NCBI's) Viral Annotation DefineR (VADR) pipeline provides standardized annotation and quality control, it supports only six viral groups to date. Here, we developed and validated 12 new reference sequence-based VADR models targeting key human respiratory viruses: measles virus, mumps virus, rubella virus, human metapneumovirus, human parainfluenza virus types 1-4, and seasonal coronaviruses (229E, NL63, OC43, and HKU1). Model construction was guided by a comprehensive analysis of intra-species genomic and phylogenetic diversity, enabling the development of genotype-specific models associated with reference genomes that defined expected genome structure and annotation. Models were trained on 5327 publicly available complete viral genomes and tested on 372 viral genomes not yet submitted to GenBank. VADR passed 96.3% of publicly available viral genomes and 98.1% of viral genomes not in the training set, correctly identifying overlapping ORFs, mature peptides, and transcriptional slippage as well as genome misassemblies. VADR detected novel viral biology including the first reported HCoV-OC43 NS2 knockout in a human infection and novel G and SH coding sequence lengths in human metapneumovirus. These VADR models are publicly available and are used by NCBI curators as part of the GenBank submission pipeline, supporting high-quality, scalable viral genome annotation for research and public health.

Citations: 0
HCoVDB: a comprehensive database encompassing viral genomes, drug targets, and therapeutics of human coronaviruses.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf079
Pan Zhang, Tianxiang Ouyang, Xiaowen Hu, Jie Huang, Biao Xiao, Zhijian Huang, Xingyang Shi, Xinyi Wu, Linying Chen, Yongkang Wu, Hanyue Wang, Ying Zhang, Guangdi Li, Hui Liu, Lei Deng

Over the past few decades, coronavirus outbreaks have been reported globally. To date, seven human coronaviruses have been identified, among which only SARS-CoV-2 has been extensively studied, resulting in the development of several approved antiviral drugs. To effectively combat both current and emerging coronaviruses, there is an urgent need for a comprehensive database that consolidates information on all known human coronaviruses and their potential antiviral compounds. In response, we present HCoVDB, a comprehensive database that integrates genomic data, viral proteins, and antiviral agents with demonstrated in vitro or in vivo activity against the seven human coronaviruses. Compared to existing coronavirus databases, HCoVDB offers three distinctive features: (i) a curated collection and annotation of over 4 million genomic sequences from all seven human coronaviruses, including key amino acid substitutions that influence viral fitness, drug resistance, and immune evasion; (ii) a protein-drug docking platform for predicting the binding interactions of antiviral agents with demonstrated activity; and (iii) an extensive compilation of antiviral agents, along with their chemical properties and antiviral efficacy profiles (IC50, EC50, or CC50) as reported in the literature. Overall, HCoVDB provides a valuable resource for tracking the evolutionary dynamics of coronaviruses and accelerating the development of broad-spectrum antiviral agents against coronavirus infections in the future. Database URL: http://hcovdb.denglab.org/.

Citations: 0
PASS2: update of database of structure-based sequence alignments.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf072
Revathy Menon, Soumya Nayak, Rama Rajesh, Ramanathan Sowdhamini

Protein sequence alignments are evolutionary models and serve as starting points for the recognition of additional members of a homologous family and design of experiments. However, the accuracy of sequence alignments is obscured at the superfamily level due to distant relationships. Where structures of proteins are available, distantly related proteins can be aligned, guided by structural features. The Protein Alignment Organized as Structural Superfamilies (PASS2) database offers such structure-based sequence alignments for protein domains classified within superfamilies, as per the Structural Classification of Proteins extended (SCOPe) framework. The present update of PASS2 (PASS2.8) corresponds to the latest SCOPe release (version 2.08). This release comprises data for 26 690 protein domains exhibiting less than 40% sequence identity, organized into 2058 superfamilies. Several features derived from these alignments, including conserved secondary structural motifs, hidden Markov models (HMMs), conserved residues, and interactions across superfamilies, are also provided. For superfamilies containing divergent members, a k-means clustering algorithm has been employed to identify outliers and partition domains into split superfamilies. Novel features in this update include topological diagrams of the domains, potential interactors for each domain, and an updated methodology for identifying conserved interactions across superfamilies. This version of the database can be reached from http://caps.ncbs.res.in/pass2.
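
The abstract states that k-means clustering is used to flag outliers and split divergent superfamilies, but it does not say which features are clustered, how k is chosen, or how an outlier is defined. The sketch below therefore only illustrates the general idea under stated assumptions: each domain is represented by a hypothetical numeric feature tuple, centroids are initialised from the first k points, and a domain that ends up alone in its cluster is treated as an outlier candidate. None of these choices should be read as the PASS2 procedure.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length numeric sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns one cluster index per point.

    Centroids start as the first k points (an illustrative assumption),
    which makes the result deterministic.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

def singleton_clusters(labels):
    """Indices of points that are alone in their cluster (outlier candidates)."""
    counts = {}
    for lab in labels:
        counts[lab] = counts.get(lab, 0) + 1
    return [i for i, lab in enumerate(labels) if counts[lab] == 1]

# Hypothetical feature vectors: four similar domains and one divergent one.
features = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (20.0, 20.0)]
labels = kmeans(features, k=2)
outliers = singleton_clusters(labels)
```

In practice a library implementation with multiple random restarts would be a more robust choice than this fixed initialisation; the hand-rolled version is used here only to keep the example dependency-free.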

Citations: 0
A novel taxonomic database for eukaryotic mitochondrial cytochrome oxidase subunit I gene (eKOI), with a focus on protists diversity.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf057
Rubén González-Miguéns, Àlex Gàlvez-Morante, Margarita Skamnelou, Meritxell Antó, Elena Casacuberta, Daniel J Richter, Enrique Lara, Daniel Vaulot, Javier Del Campo, Iñaki Ruiz-Trillo

Metabarcoding has emerged as a robust method for assessing biodiversity patterns by retrieving environmental DNA directly from ecosystems. While the 18S rRNA gene is the primary genetic marker used for broad eukaryotic metabarcoding, it has limitations in resolving lower taxonomic levels. A potential alternative is the mitochondrial cytochrome oxidase subunit I (COI) gene because it offers resolution at the species level. However, the COI gene lacks a comprehensive, curated, taxonomically informed database that includes protists. To address this gap, we introduce eKOI, a novel, curated COI gene database designed to enhance the taxonomic annotation for protists that can be used for COI-based metabarcoding. eKOI integrates data from GenBank and mitochondrial genomes, followed by extensive manual curation to eliminate redundancies and contaminants, recovering 15 947 sequences within 80 eukaryotic phyla. We validated the use of eKOI by reannotating several COI metabarcoding datasets, revealing previously unidentified protist biodiversity and demonstrating the database's utility for community-level analyses.

Citations: 0
BrAPI v2: real-world applications for data integration and collaboration in the breeding and genetics community.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf048
Peter Selby, Rafael Abbeloos, Anne-Francoise Adam-Blondon, Francisco J Agosto-Pérez, Michael Alaux, Isabelle Alic, Khaled Al-Shamaa, Johan Steven Aparicio, Jan Erik Backlund, Aldrin Batac, Sebastian Beier, Gabriel Besombes, Alice Boizet, Matthijs Brouwer, Terry Casstevens, Arnaud Charleroy, Keo Corak, Chaney Courtney, Mariano Crimi, Gouripriya Davuluri, Kauê de Sousa, Jeremy Destin, Stijn Dhondt, Ajay Dhungana, Bert Droesbeke, Manuel Feser, Mirella Flores-Gonzalez, Valentin Guignon, Corina Habito, Asis Hallab, Jenna Hershberger, Puthick Hok, Amanda M Hulse-Kemp, Lynn Carol Johnson, Sook Jung, Paul Kersey, Andrzej Kilian, Patrick König, Suman Kumar, Josh Lamos-Sweeney, Laszlo Lang, Matthias Lange, Marie-Angélique Laporte, Taein Lee, Erwan Le Floch, Francisco López, Brandon Madriz, Dorrie Main, Marco Marsella, Maud Marty, Célia Michotey, Zachary Miller, Iain Milne, Lukas A Mueller, Moses Nderitu, Pascal Neveu, Nick Palladino, Tim Parsons, Cyril Pommier, Jean-François Rami, Sebastian Raubach, Trevor Rife, Kelly Robbins, Mathieu Rouard, Joseph Ruff, Guilhem Sempéré, Romil Mayank Shah, Paul Shaw, Becky Smith, Nahuel Soldevilla, Anne Tireau, Clarysabel Tovar, Grzegorz Uszynski, Vivian Bass Vega, Stephan Weise, Shawn C Yarnes, The BrAPI Consortium

Population growth and the impacts of climate change are placing increasing pressure on global agriculture and breeding programmes. Recent advancements in phenotyping techniques, genotyping technologies, and predictive modelling are accelerating genetic gains in breeding programmes, helping researchers and breeders develop improved crops more efficiently. However, these advancements have also led to an overwhelming torrent of fragmented data, creating significant challenges in data integration and management. To address this issue, the Breeding Application Programming Interface (BrAPI) project was established as a standardized data model for breeding data. BrAPI is an international, community-driven effort that facilitates interoperability among databases and tools, improving the sharing and interpretation of breeding-related data. This open-source standard is software-agnostic and can be used by anyone interested in breeding, phenotyping, germplasm, genotyping, and agronomy data management. This manuscript provides an overview of the BrAPI project, highlighting the significant progress made in the development of the data standard and the expansion of its community. It also presents a showcase of the wide variety of BrAPI-compatible tools that have been built to enhance breeding and research activities, demonstrating how the project is advancing agricultural innovation and data management practices.
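
To make the interoperability idea concrete, the sketch below shows how a client might build a request URL for a BrAPI v2 server and unpack the response envelope. It is a hedged illustration rather than official client code: the base URL is a placeholder, the sample response is hand-written, and the endpoint path, paging parameters, and field names (`/brapi/v2/germplasm`, `page`, `pageSize`, `metadata.pagination`, `result.data`, `germplasmDbId`, `germplasmName`) follow my understanding of the BrAPI v2 specification and should be checked against it. No network request is made.

```python
import json
from urllib.parse import urlencode

def germplasm_url(base_url, page=0, page_size=100):
    """Build a paged germplasm listing URL for a BrAPI v2 server (assumed path)."""
    query = urlencode({"page": page, "pageSize": page_size})
    return f"{base_url.rstrip('/')}/brapi/v2/germplasm?{query}"

def extract_records(response_text):
    """Return (records, total_pages) from a BrAPI-style JSON response body."""
    payload = json.loads(response_text)
    pagination = payload["metadata"]["pagination"]
    return payload["result"]["data"], pagination["totalPages"]

# Hand-written sample shaped like a BrAPI v2 list response (illustrative values only).
sample_response = json.dumps({
    "metadata": {
        "pagination": {"currentPage": 0, "pageSize": 2, "totalCount": 2, "totalPages": 1},
        "status": [],
        "datafiles": [],
    },
    "result": {
        "data": [
            {"germplasmDbId": "g1", "germplasmName": "Example line A"},
            {"germplasmDbId": "g2", "germplasmName": "Example line B"},
        ]
    },
})

records, total_pages = extract_records(sample_response)
names = [record["germplasmName"] for record in records]
```

Because every compliant server is expected to return the same envelope, the same `extract_records` helper could in principle be reused across different databases, which is the kind of software-agnostic reuse the abstract describes.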

Citations: 0
GExplore 1.5: a comprehensive Caenorhabditis elegans database for the analysis of gene function with a new user-friendly web interface.
IF 3.6 Zone 4 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf044
Harald Hutter, Mehrdad Moosavi, Nelly Mafi

GExplore is an online tool to assist with large-scale data mining of selected datasets related to gene and protein function in Caenorhabditis elegans. Here, we describe the current version, GExplore 1.5, which contains new datasets and display options as well as a completely redesigned web interface. GExplore now consists of six databases. The gene database contains protein domain information, general expression, and phenotype data as well as interacting genes, gene ontology annotations, and disease associations. The mutation database contains a curated list of more than 200 000 mutations affecting the protein sequences of all protein-coding genes. The protein database contains proteome data from 19 different nematode species, four genetic model organisms, and the human proteome for comparison. Three genome-scale RNAseq expression databases contain expression profiles of different developmental stages from embryo to adult, tissue-specific expression profiles at the L2 stage, and expression profiles of the major tissues in the developing embryo at five different time points from gastrulation to the beginning of terminal differentiation. The web-based user interface has been completely redeveloped for the current version. The search interfaces allow users to explore the content of the individual databases in detail. The interactive display pages enable the user to fine-tune the results, display additional data, and download the results. GExplore is a tool to quickly obtain an overview of biological and biochemical functions of large groups of genes or identify genes with a certain combination of features for further experimental analysis. Database URL: https://genome.science.sfu.ca/gexplore.

Citations: 0
Empirical substitution models of protein evolution: database, relationships, and modeling considerations.
IF 3.6 Zone 4 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf052
Paula Iglesias-Rivas, Roberto Del Amparo, Javier A Cabaleiro, Miguel Arenas

Substitution models of protein evolution describe the patterns of amino acid substitutions over evolutionary time and are fundamental for probabilistic methods of phylogenetic inference. At the protein level, a variety of substitution models are available, but only empirical substitution models are well established in phylogenetics due to their mathematical simplicity. Despite their importance, a database compiling the large number of currently available empirical substitution models of protein evolution is lacking, although such a resource could facilitate access, assessment, and subsequent implementation of these models into phylogenetic frameworks. In addition, little is known about formal comparisons between the current set of empirical substitution models. We present EModelDB, a database of empirical substitution models of protein evolution required for probabilistic protein phylogenetics that includes the corresponding exchangeability matrices, model classification, and model-specific biological information. The database is integrated into a graphical user interface, written in Python and SQL, that facilitates its usability. We also compared common empirical substitution models in terms of the distance between their relative rates of amino acid substitution and amino acid frequencies at equilibrium. We found that substitution models derived from proteins related in nature tend to cluster together, reflecting similar evolutionary patterns. We also evaluated the empirical substitution models in terms of the folding stability of the derived modeled proteins and found that they generally produce less stable proteins compared to real proteins, suggesting that substitution models with additional evolutionary constraints may be preferable for studying protein evolution accounting for folding stability. Database URL: https://github.com/Paula-Iglesias-Rivas/EModelDB.
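
To make the terms in this abstract concrete, the following is a minimal sketch of how an empirical substitution model is commonly assembled from a symmetric exchangeability matrix and equilibrium frequencies, and how two models might be compared by distance. It is not taken from EModelDB: the function names `rate_matrix` and `model_distance`, the choice of Euclidean distance, and the toy three-state numbers are all assumptions made for illustration (real protein models use 20 states, and the abstract does not say which distance the authors used).

```python
from math import sqrt


def rate_matrix(exch, freqs):
    """Build a normalized reversible rate matrix Q from symmetric
    exchangeabilities and equilibrium frequencies, using the standard
    construction Q[i][j] = exch[i][j] * freqs[j] for i != j."""
    n = len(freqs)
    q = [[exch[i][j] * freqs[j] if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        # Diagonal entries make each row sum to zero.
        q[i][i] = -sum(q[i])
    # Scale so the expected substitution rate at equilibrium equals 1.
    scale = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / scale for x in row] for row in q]


def model_distance(exch_a, exch_b):
    """Euclidean distance between the upper-triangle exchangeabilities of
    two models, each rescaled to sum to 1 so only relative rates matter.
    (Assumed metric, chosen for illustration.)"""
    def upper(m):
        vals = [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]
        total = sum(vals)
        return [v / total for v in vals]

    a, b = upper(exch_a), upper(exch_b)
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 3-state example (hypothetical numbers, not from any real model).
exch_a = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [2.0, 3.0, 0.0]]
exch_b = [[0.0, 2.0, 4.0], [2.0, 0.0, 6.0], [4.0, 6.0, 0.0]]  # exch_a times 2
freqs = [0.5, 0.3, 0.2]

q = rate_matrix(exch_a, freqs)
d = model_distance(exch_a, exch_b)  # same relative rates, so close to zero
```

Rescaling the exchangeabilities before comparison means two matrices that differ only by a constant factor are treated as the same model, which matches the abstract's emphasis on relative rates.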

Citations: 0
MRdb: a comprehensive database of univariate and multivariate Mendelian randomization with large-scale GWAS summary data.
IF 3.6 Zone 4 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-18 DOI: 10.1093/database/baaf054
Qian Liu, Yujie Zhang, Houxing Li, Jiatong Li, Mengyu Xin, Rui Sun, Yifan Dai, Xinxin Shan, Yuting He, Borui Xu, Shangwei Ning, Peng Wang, Qiuyan Guo

Recent advancements highlight the importance of large-scale causal inference in elucidating disease mechanisms and guiding public health strategies. Mendelian randomization (MR) has become a cornerstone method for identifying causal relationships by leveraging genetic variants as instrumental variables. However, existing tools lack flexibility for multivariable analyses and fail to integrate diverse datasets effectively. To address these challenges, we introduce MRdb, a comprehensive database designed for conducting both univariable and multivariable MR analyses. MRdb encompasses 12 distinct categories of exposure data, including but not limited to 19 126 expression quantitative trait loci genes, 4907 plasma proteins, and 1400 plasma metabolites. Additionally, it integrates 48 507 disease outcomes sourced from FinnGen R10 and the IEU Open GWAS Project. MRdb offers robust data preprocessing features, including handling missing statistics, harmonizing datasets, and selecting instrumental variables to ensure high-quality analyses. Collectively, MRdb bridges the gaps in existing tools by integrating diverse datasets with user-friendly functionalities, empowering researchers to explore complex causal mechanisms.
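
As a rough illustration of the preprocessing and estimation steps named in this abstract, the sketch below harmonizes effect alleles between exposure and outcome summary statistics and then computes a fixed-effect inverse-variance-weighted (IVW) estimate, one common univariable MR estimator. It does not reproduce MRdb's pipeline: the field names (`ea_exp`, `beta_out`, and so on), the helper names `harmonize` and `ivw`, and the toy numbers are hypothetical, and the harmonization deliberately ignores strand flips and palindromic SNPs.

```python
from math import sqrt


def harmonize(snps):
    """Align outcome effects to the exposure effect allele.
    SNPs whose allele pairs do not match in either orientation are dropped.
    Simplified: strand flips and palindromic SNPs are not handled."""
    aligned = []
    for s in snps:
        if s["ea_exp"] == s["ea_out"] and s["oa_exp"] == s["oa_out"]:
            aligned.append(dict(s))
        elif s["ea_exp"] == s["oa_out"] and s["oa_exp"] == s["ea_out"]:
            # Alleles are swapped in the outcome data: flip the effect sign.
            flipped = dict(s)
            flipped["beta_out"] = -s["beta_out"]
            flipped["ea_out"], flipped["oa_out"] = s["oa_out"], s["ea_out"]
            aligned.append(flipped)
    return aligned


def ivw(snps):
    """Fixed-effect IVW estimate of the causal effect and its standard error,
    weighting each SNP by the inverse variance of its outcome association."""
    num = sum(s["beta_exp"] * s["beta_out"] / s["se_out"] ** 2 for s in snps)
    den = sum(s["beta_exp"] ** 2 / s["se_out"] ** 2 for s in snps)
    return num / den, sqrt(1.0 / den)


# Toy summary statistics (hypothetical values for illustration only).
snps = [
    {"rsid": "rs1", "ea_exp": "A", "oa_exp": "G", "beta_exp": 0.10,
     "ea_out": "A", "oa_out": "G", "beta_out": 0.05, "se_out": 0.01},
    {"rsid": "rs2", "ea_exp": "C", "oa_exp": "T", "beta_exp": 0.20,
     "ea_out": "T", "oa_out": "C", "beta_out": -0.10, "se_out": 0.02},
    {"rsid": "rs3", "ea_exp": "G", "oa_exp": "A", "beta_exp": 0.30,
     "ea_out": "G", "oa_out": "C", "beta_out": 0.90, "se_out": 0.01},
]

instruments = harmonize(snps)       # rs3 is dropped (alleles do not match)
estimate, std_err = ivw(instruments)
```

In this toy data every retained SNP has an outcome effect equal to half its exposure effect once aligned, so the IVW estimate comes out at about 0.5.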

Citations: 0