Database: The Journal of Biological Databases and Curation: Latest Articles
Correction to: CardioHotspots: a database of mutational hotspots for cardiac disorders.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-17 DOI: 10.1093/database/baaf014
Citations: 0
Assessing the performance of generative artificial intelligence in retrieving information against manually curated genetic and genomic data.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-17 DOI: 10.1093/database/baaf011
Elly Poretsky, Victoria C Blake, Carson M Andorf, Taner Z Sen

Curated resources at centralized repositories provide high-value service to users by enhancing data veracity. Curation, however, comes with a cost, as it requires dedicated time and effort from personnel with deep domain knowledge. In this paper, we compare the performance of large language models (LLMs), specifically generative pre-trained transformer (GPT)-3.5 and GPT-4, against a human curator in extracting and presenting data. To accomplish this task, we used a small set of journal articles on wheat and barley genetics, focusing on traits, such as salinity tolerance and disease resistance, which are becoming more important. The 36 papers were then curated by a professional curator for the GrainGenes database (https://wheat.pw.usda.gov). In parallel, we developed a GPT-based retrieval-augmented generation question-answering system and compared how GPT performed in answering questions about traits and quantitative trait loci (QTLs). Our findings show that on average GPT-4 correctly categorized manuscripts 97% of the time, correctly extracted 80% of traits, and correctly extracted 61% of marker-trait associations. Furthermore, we assessed the ability of a GPT-based DataFrame agent to filter and summarize curated wheat genetics data, showing the potential of human and computational curators working side-by-side. In one case study, our findings show that GPT-4 was able to retrieve up to 91% of disease-related, human-curated QTLs across the whole genome, and up to 96% across a specific genomic region through prompt engineering. We also observed that across most tasks, GPT-4 consistently outperformed GPT-3.5 while generating fewer hallucinations, suggesting that improvements in LLMs will make generative artificial intelligence a much more accurate companion for curators in extracting information from scientific literature. Despite their limitations, LLMs demonstrated a potential to extract and present information to curators and users of biological databases, as long as users are aware of potential inaccuracies and the possibility of incomplete information extraction.
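
The extraction percentages quoted in this abstract read as recall-style figures: the share of curator-recorded items that the model also returned. The sketch below shows one way such a figure could be computed. It is not the authors' evaluation code; the function name `recall`, the normalisation rule, and the trait lists are all hypothetical and chosen only for illustration.

```python
def _normalise(items):
    """Lower-case and collapse whitespace so formatting differences are not counted as misses."""
    return {" ".join(item.lower().split()) for item in items}


def recall(curated, extracted):
    """Fraction of curator-recorded items that also appear in the model output."""
    curated_norm = _normalise(curated)
    if not curated_norm:
        raise ValueError("curated set is empty")
    return len(curated_norm & _normalise(extracted)) / len(curated_norm)


# Hypothetical example data, not taken from the paper.
curated_traits = {"Salinity tolerance", "Stripe rust resistance",
                  "Grain yield", "Plant height", "Heading date"}
extracted_traits = {"salinity tolerance", "stripe rust  resistance",
                    "grain yield", "plant height", "kernel weight"}

score = recall(curated_traits, extracted_traits)
print(f"recall = {score:.0%}")
```

Recall alone does not penalise hallucinated items such as the extra "kernel weight" entry above, so a precision figure would normally be reported alongside it.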
Citations: 0
TumorAgDB1.0: tumor neoantigen database platform.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-13 DOI: 10.1093/database/baaf010
Yan Shao, Yang Gao, Ling-Yu Wu, Shu-Guang Ge, Peng-Bo Wen

With the continuous advancements in cancer immunotherapy, neoantigen-based therapies have demonstrated remarkable clinical efficacy. However, accurately predicting the immunogenicity of neoantigens remains a significant challenge. This is mainly due to two core factors: the scarcity of high-quality neoantigen datasets and the limited prediction accuracy of existing immunogenicity prediction tools. This study addressed these issues through several key steps. First, it collected and organized immunogenic neoantigen peptide data from publicly available literature and neoantigen databases. Second, it analyzed the data to identify key features influencing neoantigen immunogenicity prediction. Finally, it integrated existing prediction tools to create TumorAgDB1.0, a comprehensive tumor neoantigen database. TumorAgDB1.0 offers a user-friendly platform. Users can efficiently search for neoantigen data using parameters like amino acid sequence and peptide length. The platform also offers detailed information on the characteristics of neoantigens and tools for predicting tumor neoantigen immunogenicity. Additionally, the database includes a data download function, allowing researchers to easily access high-quality data to support the development and improvement of neoantigen immunogenicity prediction tools. In summary, TumorAgDB1.0 is a powerful tool for neoantigen screening and validation in tumor immunotherapy. It offers strong support to researchers. Database URL: https://tumoragdb.com.cn.
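
The abstract mentions searching neoantigen data by amino acid sequence and peptide length. As a rough illustration of that kind of query, the sketch below filters an in-memory list of records. It does not use or describe the TumorAgDB1.0 interface; the `NeoantigenRecord` type, the `search` function, its parameters, and the placeholder peptides are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NeoantigenRecord:
    record_id: str  # hypothetical identifier
    peptide: str    # amino acid sequence in one-letter codes


def search(records, motif="", min_len=1, max_len=None):
    """Return records whose peptide contains `motif` and whose length is within bounds."""
    motif = motif.upper()
    hits = []
    for rec in records:
        length = len(rec.peptide)
        if length < min_len:
            continue
        if max_len is not None and length > max_len:
            continue
        if motif and motif not in rec.peptide.upper():
            continue
        hits.append(rec)
    return hits


# Placeholder sequences for illustration only, not real database entries.
RECORDS = [
    NeoantigenRecord("demo-1", "ACDEFGHIK"),    # length 9
    NeoantigenRecord("demo-2", "LMNPQRSTVW"),   # length 10
    NeoantigenRecord("demo-3", "ACDKLMNPQRS"),  # length 11
]

for rec in search(RECORDS, motif="ACD", min_len=9, max_len=10):
    print(rec.record_id, rec.peptide)
```

A substring match is the simplest reading of "search by sequence"; a real service might instead offer exact-match or similarity search.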
Citations: 0
Working in biocuration: contemporary experiences and perspectives.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf003
Sarah R Davies

This perspective article synthesizes what is currently known about biocuration as a career and the challenges facing the field. It draws on existing literature and ongoing qualitative research to discuss the nature of biocuration, biocurators' career trajectories, key challenges that biocurators face, and strategies for overcoming these. Overall, biocurators express a high degree of satisfaction with their work and see it as central to the wider biosciences. The central challenges that they face relate to the underfunding and under-recognition of this work: there is minimal stable funding for the field, and the work of human biocurators is often invisible to those who use curated resources. The article closes by critically discussing existing and potential strategies for responding to these challenges.
Citations: 0
Building resource-efficient community databases using open-source software.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf005
Sook Jung, Chun-Huai Cheng, Taein Lee, Katheryn Buble, Jodi Humann, Ping Zheng, Jing Yu, Dorrie Main

The unprecedented volume of big data routinely generated for nonmodel crop species, coupled with advanced technology that enables the use of big data in breeding, heightens the need for crop community databases in which all relevant data are curated and integrated. Funding for such databases is, however, insufficient and intermittent, resulting in the data being underutilized. While raising awareness of the importance of funding databases matters, it is also practically necessary to find a more efficient way to build a community database. To meet the need for integrated database resources for various crop genomics, genetics, and breeding research communities, we have built five crop databases over the last decade using an open-source database platform and software. We describe the system and methods used for database construction, curation, and analysis protocols, and the data and tools that are available in these five crop databases. Database URL: The Genome Database for Rosaceae (GDR, www.rosaceae.org), the Genome Database for Vaccinium (GDV, www.vaccinium.org), the Citrus Genome Database (CGD, www.citrusgenomedb.org), the Pulse Crop Database (PCD, www.pulsedb.org), and CottonGen (www.cottongen.org).
Citations: 0
AnnCovDB: a manually curated annotation database for mutations in SARS-CoV-2 spike protein.
IF 3.4 Zone 4 Biology Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf002
Xiaomin Zhang, Zhongyi Lei, Jiarong Zhang, Tingting Yang, Xian Liu, Jiguo Xue, Ming Ni

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been circulating and adapting within the human population for more than 4 years. A large number of mutations have occurred in the viral genome, resulting in significant variants known as variants of concern (VOCs) and variants of interest (VOIs). The spike (S) protein harbors many of the characteristic mutations of VOCs and VOIs, and significant efforts have been made to explore the functional effects of mutations in the S protein, which can cause or contribute to viral infection, transmission, immune evasion, pathogenicity, and illness severity. However, this knowledge is dispersed across many publications, and a well-structured, manually curated database for functional annotation is lacking. AnnCovDB is a database that provides manually curated functional annotations for mutations in the S protein of SARS-CoV-2. Mutations in the S protein carried by at least 8000 variants in GISAID were chosen, and these mutations were then used as query keywords to search the PubMed database. From the retrieved publications, 2093 annotation entities for 205 single mutations and 93 multiple mutations were manually curated. These entities were organized into multilevel hierarchical categories for user convenience. For example, one annotation entity of the N501Y mutation was 'Infectious cycle➔Attachment➔ACE2 binding affinity➔Increase'. AnnCovDB can be used to query specific mutations and browse functional annotation entities. Database URL: https://AnnCovDB.app.bio-it.tech/.
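
The multilevel hierarchy described in the abstract can be illustrated with a minimal sketch. It assumes an annotation entity is a string of category levels joined by the '➔' character, as in the N501Y example quoted above. The function names `parse_annotation` and `build_tree` and the `_mutations` key are hypothetical and are not part of AnnCovDB; the database's actual storage format is not described in the abstract.

```python
# Hypothetical sketch: nest '➔'-delimited annotation paths into a category tree.
# Only the N501Y path below comes from the abstract; all names are illustrative.

ARROW = "➔"


def parse_annotation(path: str) -> list[str]:
    """Split an annotation path into its ordered category levels."""
    return [level.strip() for level in path.split(ARROW) if level.strip()]


def build_tree(entries: list[tuple[str, str]]) -> dict:
    """Nest (mutation, path) pairs into a dict of dicts, one level per category.

    The mutations annotated with a given path are stored under the
    '_mutations' key (an assumed convention) at the deepest level.
    """
    tree: dict = {}
    for mutation, path in entries:
        node = tree
        for level in parse_annotation(path):
            node = node.setdefault(level, {})
        node.setdefault("_mutations", []).append(mutation)
    return tree


entries = [
    ("N501Y", "Infectious cycle➔Attachment➔ACE2 binding affinity➔Increase"),
]
tree = build_tree(entries)
levels = parse_annotation(entries[0][1])
```

Browsing by category then amounts to walking the nested dictionary one level at a time, and querying by mutation amounts to collecting every path whose `_mutations` list contains it.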

Citations: 0