Pub Date: 2025-02-19 DOI: 10.1093/database/baaf001
Lei Gong, Fufeng Liu, Chuanxi Zhang, Yongfan Ming, Yulan Mou, ZhaoTing Yuan, Haiming Jiang, Bei Gao, Fuping Lu, Lujia Zhang
Enzymes, serving as eco-friendly catalysts, are progressively supplanting traditional chemical catalysts in light industry sectors such as feed, papermaking, textiles, detergents, leather, and sugar production. Despite this advancement, the variability in the performance of natural enzymes and the fragmentation and diversity of existing data formats pose significant challenges to researchers. Furthermore, AI-driven enzyme design is limited by the quality and quantity of available data. To address these issues, we introduce the light industrial core enzyme database (LICEDB), the first database dedicated exclusively to managing and standardizing enzymes for light industry applications. LICEDB, with its integrated modules for data retrieval, similarity analysis, and structural analysis, will enhance the efficient industrial application of enzymes and strengthen AI-driven predictive research, thereby advancing data sharing and utilization in the field of enzyme innovation. Database URL: http://lujialab.org.cn/on-line-databases/.
{"title":"LICEDB: light industrial core enzyme database for industrial applications and AI enzyme design.","authors":"Lei Gong, Fufeng Liu, Chuanxi Zhang, Yongfan Ming, Yulan Mou, ZhaoTing Yuan, Haiming Jiang, Bei Gao, Fuping Lu, Lujia Zhang","doi":"10.1093/database/baaf001","DOIUrl":"https://doi.org/10.1093/database/baaf001","url":null,"abstract":"<p><p>Enzymes, serving as eco-friendly catalysts, are progressively supplanting traditional chemical catalysts in light industry sectors such as feed, papermaking, textiles, detergents, leather, and sugar production. Despite this advancement, the variability in the performance of natural enzymes and the fragmentation and diversity of existing data formats pose significant challenges to researchers. Furthermore, AI-driven enzyme design is limited by the quality and quantity of available data. To address these issues, we introduce the light industrial core enzyme database (LICEDB), the first database dedicated exclusively to managing and standardizing enzymes for light industry applications. LICEDB, with its integrated modules for data retrieval, similarity analysis, and structural analysis, will enhance the efficient industrial application of enzymes and strengthen AI-driven predictive research, thereby advancing data sharing and utilization in the field of enzyme innovation. 
Database URL: http://lujialab.org.cn/on-line-databases/.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-17 DOI: 10.1093/database/baaf014
{"title":"Correction to: CardioHotspots: a database of mutational hotspots for cardiac disorders.","authors":"","doi":"10.1093/database/baaf014","DOIUrl":"https://doi.org/10.1093/database/baaf014","url":null,"abstract":"","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143972477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-17 DOI: 10.1093/database/baaf011
Elly Poretsky, Victoria C Blake, Carson M Andorf, Taner Z Sen
Curated resources at centralized repositories provide high-value service to users by enhancing data veracity. Curation, however, comes with a cost, as it requires dedicated time and effort from personnel with deep domain knowledge. In this paper, we investigate the performance of large language models (LLMs), specifically generative pre-trained transformer (GPT)-3.5 and GPT-4, in extracting and presenting data against a human curator. To accomplish this task, we used a small set of journal articles on wheat and barley genetics, focusing on traits, such as salinity tolerance and disease resistance, which are becoming more important. The 36 papers were then curated by a professional curator for the GrainGenes database (https://wheat.pw.usda.gov). In parallel, we developed a GPT-based retrieval-augmented generation question-answering system and compared how GPT performed in answering questions about traits and quantitative trait loci (QTLs). Our findings show that on average GPT-4 correctly categorized manuscripts 97% of the time, correctly extracted 80% of traits, and 61% of marker-trait associations. Furthermore, we assessed the ability of a GPT-based DataFrame agent to filter and summarize curated wheat genetics data, showing the potential of human and computational curators working side-by-side. In one case study, our findings show that GPT-4 was able to retrieve up to 91% of disease-related, human-curated QTLs across the whole genome, and up to 96% across a specific genomic region through prompt engineering. Also, we observed that across most tasks, GPT-4 consistently outperformed GPT-3.5 while generating fewer hallucinations, suggesting that improvements in LLMs will make generative artificial intelligence a much more accurate companion for curators in extracting information from scientific literature.
Despite their limitations, LLMs demonstrated a potential to extract and present information to curators and users of biological databases, as long as users are aware of potential inaccuracies and the possibility of incomplete information extraction.
{"title":"Assessing the performance of generative artificial intelligence in retrieving information against manually curated genetic and genomic data.","authors":"Elly Poretsky, Victoria C Blake, Carson M Andorf, Taner Z Sen","doi":"10.1093/database/baaf011","DOIUrl":"https://doi.org/10.1093/database/baaf011","url":null,"abstract":"<p><p>Curated resources at centralized repositories provide high-value service to users by enhancing data veracity. Curation, however, comes with a cost, as it requires dedicated time and effort from personnel with deep domain knowledge. In this paper, we investigate the performance of a large language model (LLM), specifically generative pre-trained transformer (GPT)-3.5 and GPT-4, in extracting and presenting data against a human curator. In order to accomplish this task, we used a small set of journal articles on wheat and barley genetics, focusing on traits, such as salinity tolerance and disease resistance, which are becoming more important. The 36 papers were then curated by a professional curator for the GrainGenes database (https://wheat.pw.usda.gov). In parallel, we developed a GPT-based retrieval-augmented generation question-answering system and compared how GPT performed in answering questions about traits and quantitative trait loci (QTLs). Our findings show that on average GPT-4 correctly categorized manuscripts 97% of the time, correctly extracted 80% of traits, and 61% of marker-trait associations. Furthermore, we assessed the ability of a GPT-based DataFrame agent to filter and summarize curated wheat genetics data, showing the potential of human and computational curators working side-by-side. In one case study, our findings show that GPT-4 was able to retrieve up to 91% of disease related, human-curated QTLs across the whole genome, and up to 96% across a specific genomic region through prompt engineering. 
Also, we observed that across most tasks, GPT-4 consistently outperformed GPT-3.5 while generating less hallucinations, suggesting that improvements in LLM models will make generative artificial intelligence a much more accurate companion for curators in extracting information from scientific literature. Despite their limitations, LLMs demonstrated a potential to extract and present information to curators and users of biological databases, as long as users are aware of potential inaccuracies and the possibility of incomplete information extraction.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144126374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
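The percentages reported in the abstract above (for example, the share of human-curated QTLs that GPT-4 retrieved) read as retrieval rates measured against the curator's set. A minimal sketch of that kind of calculation follows, assuming a simple set-overlap definition; the function name and the QTL identifiers are hypothetical and are not taken from the paper, whose exact scoring procedure is not described here.

```python
def retrieval_rate(curated, retrieved):
    """Fraction of human-curated items that also appear in the model output."""
    curated_set = set(curated)
    if not curated_set:
        raise ValueError("curated set is empty")
    # Items the model returned that the curator did not record are ignored here;
    # they affect precision, not the retrieval rate.
    return len(curated_set & set(retrieved)) / len(curated_set)


# Hypothetical QTL identifiers, invented for illustration only.
curated_qtls = ["QTL-A", "QTL-B", "QTL-C", "QTL-D"]
model_qtls = ["QTL-A", "QTL-C", "QTL-D", "QTL-X"]

rate = retrieval_rate(curated_qtls, model_qtls)
print(f"{rate:.0%}")  # 75%
```

Under this definition, an extra item such as the hypothetical QTL-X does not lower the rate, which is why a retrieval rate alone says nothing about hallucinated entries.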
Pub Date: 2025-02-13 DOI: 10.1093/database/baaf010
Yan Shao, Yang Gao, Ling-Yu Wu, Shu-Guang Ge, Peng-Bo Wen
With the continuous advancements in cancer immunotherapy, neoantigen-based therapies have demonstrated remarkable clinical efficacy. However, accurately predicting the immunogenicity of neoantigens remains a significant challenge. This is mainly due to two core factors: the scarcity of high-quality neoantigen datasets and the limited prediction accuracy of existing immunogenicity prediction tools. This study addressed these issues through several key steps. First, it collected and organized immunogenic neoantigen peptide data from publicly available literature and neoantigen databases. Second, it analyzed the data to identify key features influencing neoantigen immunogenicity prediction. Finally, it integrated existing prediction tools to create TumorAgDB1.0, a comprehensive tumor neoantigen database. TumorAgDB1.0 offers a user-friendly platform. Users can efficiently search for neoantigen data using parameters like amino acid sequence and peptide length. The platform also offers detailed information on the characteristics of neoantigens and tools for predicting tumor neoantigen immunogenicity. Additionally, the database includes a data download function, allowing researchers to easily access high-quality data to support the development and improvement of neoantigen immunogenicity prediction tools. In summary, TumorAgDB1.0 is a powerful tool for neoantigen screening and validation in tumor immunotherapy. It offers strong support to researchers. Database URL: https://tumoragdb.com.cn.
{"title":"TumorAgDB1.0: tumor neoantigen database platform.","authors":"Yan Shao, Yang Gao, Ling-Yu Wu, Shu-Guang Ge, Peng-Bo Wen","doi":"10.1093/database/baaf010","DOIUrl":"10.1093/database/baaf010","url":null,"abstract":"<p><p>With the continuous advancements in cancer immunotherapy, neoantigen-based therapies have demonstrated remarkable clinical efficacy. However, accurately predicting the immunogenicity of neoantigens remains a significant challenge. This is mainly due to two core factors: the scarcity of high-quality neoantigen datasets and the limited prediction accuracy of existing immunogenicity prediction tools. This study addressed these issues through several key steps. First, it collected and organized immunogenic neoantigen peptide data from publicly available literature and neoantigen databases. Second, it analyzed the data to identify key features influencing neoantigen immunogenicity prediction. Finally, it integrated existing prediction tools to create TumorAgDB1.0, a comprehensive tumor neoantigen database. TumorAgDB1.0 offers a user-friendly platform. Users can efficiently search for neoantigen data using parameters like amino acid sequence and peptide length. The platform also offers detailed information on the characteristics of neoantigens and tools for predicting tumor neoantigen immunogenicity. Additionally, the database includes a data download function, allowing researchers to easily access high-quality data to support the development and improvement of neoantigen immunogenicity prediction tools. In summary, TumorAgDB1.0 is a powerful tool for neoantigen screening and validation in tumor immunotherapy. It offers strong support to researchers. 
Database URL: https://tumoragdb.com.cn.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11836679/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143448503","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date: 2025-02-12 DOI: 10.1093/database/baaf003
Sarah R Davies
This perspective article synthesizes current knowledge about biocuration as a career and the challenges facing the field. It draws on existing literature and ongoing qualitative research to discuss the nature of biocuration, biocurators' career trajectories, key challenges that biocurators face, and strategies for overcoming these. Overall, biocurators express a high degree of satisfaction with their work and see it as central to the wider biosciences. The central challenges that they face relate to the underfunding and under-recognition of this work, meaning that there is minimal stable funding for the field and that the work of human biocurators is often invisible to those who use curated resources. The article closes by critically discussing existing and potential strategies for responding to these challenges.
{"title":"Working in biocuration: contemporary experiences and perspectives.","authors":"Sarah R Davies","doi":"10.1093/database/baaf003","DOIUrl":"10.1093/database/baaf003","url":null,"abstract":"<p><p>This perspective article synthesizes current knowledge regarding what is known regarding biocuration as a career and the challenges facing the field. It draws on existing literature and ongoing qualitative research to discuss the nature of biocuration, biocurators' career trajectories, key challenges that biocurators face, and strategies for overcoming these. Overall, biocurators express a high degree of satisfaction with their work and see it as central to the wider biosciences. The central challenges that they face relate to the underfunding and under-recognition of this work, meaning that there is minimal stable funding for the field and that the work of human biocurators is often invisible to those who use curated resources. The article closes by critically discussing existing and potential strategies for responding to these challenges.</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817794/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143406176","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The unprecedented volume of big data being routinely generated for nonmodel crop species, coupled with advanced technology enabling the use of big data in breeding, gives further impetus for the need to have access to crop community databases, where all relevant data are curated and integrated. Funding for such databases is, however, insufficient and intermittent, resulting in the data being underutilized. While increased awareness of the importance of funding databases is important, it is practically necessary to find a more efficient way to build a community database. To meet the need for integrated database resources for various crop genomics, genetics, and breeding research communities, we have built five crop databases over the last decade using an open-source database platform and software. We describe the system and methods used for database construction, curation, and analysis protocols, and the data and tools that are available in these five crop databases. Database URL: The Genome Database for Rosaceae (GDR, www.rosaceae.org), the Genome Database for Vaccinium (GDV, www.vaccinium.org), the Citrus Genome Database (CGD, www.citrusgenomedb.org), the Pulse Crop Database (PCD, www.pulsedb.org), and CottonGen (www.cottongen.org).
{"title":"Building resource-efficient community databases using open-source software.","authors":"Sook Jung, Chun-Huai Cheng, Taein Lee, Katheryn Buble, Jodi Humann, Ping Zheng, Jing Yu, Dorrie Main","doi":"10.1093/database/baaf005","DOIUrl":"10.1093/database/baaf005","url":null,"abstract":"<p><p>The unprecedented volume of big data being routinely generated for nonmodel crop species, coupled with advanced technology enabling the use of big data in breeding, gives further impetus for the need to have access to crop community databases, where all relevant data are curated and integrated. Funding for such databases is, however, insufficient and intermittent, resulting in the data being underutilized. While increased awareness of the importance of funding databases is important, it is practically necessary to find a more efficient way to build a community database. To meet the need for integrated database resources for various crop genomics, genetics, and breeding research communities, we have built five crop databases over the last decade using an open-source database platform and software. We describe the system and methods used for database construction, curation, and analysis protocols, and the data and tools that are available in these five crop databases. 
Database URL: The Genome Database for Rosaceae (GDR, www.rosaceae.org), the Genome Database for Vaccinium (GDV, www.vaccinium.org), the Citrus Genome Database (CGD, www.citrusgenomedb.org), the Pulse Crop Database (PCD, www.pulsedb.org), and CottonGen (www.cottongen.org).</p>","PeriodicalId":10923,"journal":{"name":"Database: The Journal of Biological Databases and Curation","volume":"2025 ","pages":""},"PeriodicalIF":3.4,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11833237/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143406241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}