Database: The Journal of Biological Databases and Curation, latest articles, page 3

GenDiS3 database: census on the prevalence of protein domain superfamilies of known structure in the entire sequence database.

IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-05-09 DOI: 10.1093/database/baaf035

Sarthak Joshi, Shailendu Mohapatra, Dhwani Kumar, Adwait Joshi, Meenakshi Iyer, Ramanathan Sowdhamini

Despite the vast amount of sequence data available, a significant disparity exists between the number of protein sequences identified and the relatively few structures that have been resolved. This disparity highlights the challenge in structural biology of bridging the gap between sequence information and 3D structural data, and the need for robust databases capable of linking distant homologs to known structures. Studies have indicated that there are a limited number of structural folds, despite the vast diversity of proteins. Hence, computational tools can enhance our ability to classify protein sequences well before their structures are determined or their functions are characterized, thereby bridging the gap between sequence and structural data. GenDiS (Genomic Distribution of Superfamilies) is a repository with information on the genomic distribution of protein domain superfamilies, built through a one-time computational exercise that searches the vast sequence database for trusted homologs of protein domains of known structure. We have updated this database employing advanced bioinformatics tools, including DELTA-BLAST (domain enhanced lookup time accelerated BLAST) for initial detection of hits and HMMSCAN for validation, significantly improving the accuracy of domain identification. Using these tools, over 151 million sequence homologs for 2060 superfamilies [SCOPe (Structural Classification of Proteins extended)] were identified, and 116 million of them were validated as true positives. Through a case study on glycolysis-related enzymes, variations in the domain architectures of these enzymes are explored, revealing evolutionary changes and functional diversity among these essential proteins. We present another case, the LOG gene, where one can look closely and find significant mutations across the evolutionary lineage. GenDiS3 and the associated tools, available at https://caps.ncbs.res.in/gendis3/, offer a powerful resource for researchers in functional annotation and evolutionary studies. Database URL: https://caps.ncbs.res.in/gendis3/.
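
The two counts quoted in the abstract imply how much the validation step filtered the initial hits. A minimal arithmetic sketch follows; it uses only the rounded figures from the abstract (so the result is approximate), and the function and constant names are hypothetical, chosen for illustration.

```python
# Figures quoted in the abstract, in millions of sequence homologs.
# Both are rounded, and "over 151 million" is a lower bound, so the
# resulting share is only approximate.
TOTAL_HITS_MILLIONS = 151       # initial hits reported for DELTA-BLAST
VALIDATED_HITS_MILLIONS = 116   # hits reported as validated with HMMSCAN


def validated_fraction(validated, total):
    """Return the share of initial hits that survived validation."""
    if total <= 0:
        raise ValueError("total must be positive")
    return validated / total


fraction = validated_fraction(VALIDATED_HITS_MILLIONS, TOTAL_HITS_MILLIONS)
removed = TOTAL_HITS_MILLIONS - VALIDATED_HITS_MILLIONS

print(f"Validated share: {fraction:.1%}")
print(f"Filtered out: about {removed} million hits")
```

On these rounded figures, roughly three quarters of the initial hits were retained after validation.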

Citations: 0

CancerPPD2: an updated repository of anticancer peptides and proteins.

IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-05-07 DOI: 10.1093/database/baaf030

Milind Chauhan, Amisha Gupta, Ritu Tomer, Gajendra P S Raghava

CancerPPD2 (http://webs.iiitd.edu.in/raghava/cancerppd2/) is an updated version of CancerPPD, developed to maintain comprehensive information about anticancer peptides and proteins. It contains 6521 entries, each providing detailed information about an anticancer peptide or protein, including the origin of the peptide, cancer cell line, type of cancer, peptide sequence, and structure. These anticancer peptides have been tested against 392 types of cancer cell lines and 28 types of cancer-associated tissues. In addition to natural anticancer peptides, CancerPPD2 contains 781 entries for chemically modified and 3018 entries for N-/C-terminus modified anticancer peptides. A few entries are also linked with 47 clinical studies and cross-referenced to UniProt, DrugBank, and ThPDB2. Where possible, entries are also linked with clinical trials. On average, CancerPPD2 contains around 85% more information than its previous version, CancerPPD. The structures of these anticancer peptides and proteins were either obtained from the Protein Data Bank (PDB) or predicted using PEPstrMOD, I-TASSER, and AlphaFold. A wide range of tools has been integrated into CancerPPD2 for data retrieval and similarity searches. Additionally, we integrated a REST API into this repository to facilitate automatic, programmatic data retrieval. Database URL: https://webs.iiitd.edu.in/raghava/cancerppd2/api/rest.html.

Citations: 0

A longitudinal analysis of function annotations of the human proteome reveals consistently high biases.

IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-05-07 DOI: 10.1093/database/baaf036

An Phan, Parnal Joshi, Claus Kadelka, Iddo Friedberg

The resources required to study gene function are limited, especially when considering the number of genes in the human genome and the complexity of their function. Therefore, genes are prioritized for experimental studies based on many different considerations, including, but not limited to, perceived biomedical importance, such as disease-associated genes, or the understanding of biological processes, such as cell signalling pathways. At the same time, most genes are not studied or are under-characterized, which hampers our understanding of their function and potential effects on human health and wellness. Understanding function annotation disparity is a necessary first step toward understanding how much functional knowledge is gained from the human genome, and toward guidelines for targeting future studies of human genes more effectively. Here, we present a comprehensive longitudinal analysis of the human proteome utilizing data analysis tools from economics and information theory. Specifically, we view the human proteome as a population of proteins within a knowledge economy: we treat the quantified knowledge of a protein's function as the analogue of wealth and examine the distribution of information across the proteins in the proteome in the same manner that the distribution of wealth is studied in societies. Our results show a highly skewed distribution of information about human proteins over the last decade, in which the inequality in the annotations given to the proteins remains high. Additionally, we examine the correlation between the knowledge about protein function as captured in databases and the interest in proteins as reflected by mentions in the scientific literature. We show a large gap between knowledge and interest and dissect the factors leading to this gap. In conclusion, our study shows that research efforts should be redirected to less studied proteins to mitigate the disparity among human proteins both in databases and literature.
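
The wealth analogy can be made concrete with a standard inequality measure from economics. The abstract does not name the specific measures the authors used, so the Gini coefficient below is an assumed illustration, not the paper's method, and the annotation counts are hypothetical toy values.

```python
def gini(values):
    """Gini coefficient of non-negative values.

    0.0 means perfectly equal; the maximum for n items is (n - 1) / n,
    reached when a single item holds everything.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return (2 * weighted) / (n * total) - (n + 1) / n


# Hypothetical per-protein annotation counts, for illustration only.
evenly_annotated = [5, 5, 5, 5]
one_dominant_protein = [0, 0, 0, 20]

print(gini(evenly_annotated))       # equal "wealth" across proteins
print(gini(one_dominant_protein))   # all annotations on one protein
```

Tracking such a coefficient across successive database releases is one way a longitudinal view of annotation inequality could be obtained.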

Citations: 0

STCDB4ND: a signal transduction classification database for neurological diseases.

IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-05-02 DOI: 10.1093/database/baaf032

Boyan Gong, Sida Li, Yifan Chen, Liya Liu, Ralf Hofestädt, Ming Chen

Neurological disorders pose significant global health challenges due to their complex etiology and the insufficient understanding of their underlying mechanisms. Signal transduction pathways are critical in the pathophysiology of these diseases and have been extensively studied to develop therapeutic interventions. However, existing databases for biological signal pathways often overlook the dynamic interactions between entities within these pathways and lack standardized representations of the signaling processes. To address these limitations, we present STCDB4ND, a specialized database focused on signal transduction pathways associated with neurological diseases. Utilizing the ST classification system, STCDB4ND provides a unified framework for pathway representation, emphasizing interactions and pathway characteristics. The database features advanced visualization tools, network analysis capabilities, and a key factor identification module, enabling researchers to comprehensively study these complex networks. Our analysis of neurological disease-related pathways using STCDB4ND revealed key signaling factors and supported existing findings on pathogenic mechanisms. STCDB4ND serves as a valuable resource for advancing the understanding of neurological disease pathways and promoting novel therapeutic approaches. We believe that STCDB will provide greater convenience for researchers in various fields as we expand the STCDB system's database in the future. Database URL: https://bis.zju.edu.cn/STCDB.

Citations: 0

STCDB4ND: a signal transduction classification database for neurological diseases.

IF 3.4, Zone 4 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-05-02 DOI: 10.1093/database/baaf032

Boyan Gong, Sida Li, Yifan Chen, Liya Liu, Ralf Hofestädt, Ming Chen

Neurological disorders pose significant global health challenges due to their complex etiology and insufficient understanding of underlying mechanisms. Signal transduction pathways are critical in the pathophysiology of these diseases and have been extensively studied to develop therapeutic interventions. However, existing databases for biological signal pathways often overlook the dynamic interactions between entities within these pathways and lack standardized representations of the signaling processes. To address these limitations, we present STCDB4ND, a specialized database focused on signal transduction pathways associated with neurological diseases. Utilizing the ST classification system, STCDB4ND provides a unified framework for pathway representation, emphasizing interactions and pathway characteristics. The database features advanced visualization tools, network analysis capabilities, and a key factor identification module, enabling researchers to comprehensively study these complex networks. Our analysis of neurological disease-related pathways using STCDB4ND revealed key signaling factors and supported existing findings on pathogenic mechanisms. STCDB4ND serves as a valuable resource for advancing the understanding of neurological disease pathways and promoting novel therapeutic approaches. We expect STCDB to become more useful to researchers in various fields as the STCDB system's database is expanded in the future. Database URL: https://bis.zju.edu.cn/STCDB.

Citations: 0

Mapping assays to the key characteristics of carcinogens to support decision-making.

IF 3.4, Zone 4 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-04-22 DOI: 10.1093/database/baaf026

Gabrielle Rigutto, Cliona M McHale, Ettayapuram Ramaprasad Azhagiya Singam, Iemaan Rana, Luoping Zhang, Martyn T Smith

The key characteristics (KCs) of carcinogens are the properties common to known human carcinogens that can be used to search for, organize, and evaluate mechanistic data in support of hazard identification. A limiting factor in this approach is that relevant in vitro and in vivo assays, as well as corresponding biomarkers and endpoints, have been only partially documented for each of the 10 KCs (Smith MT, Guyton KZ, Kleinstreuer N et al. The key characteristics of carcinogens: relationship to the hallmarks of cancer, relevant biomarkers, and assays to measure them. Cancer Epidemiol Biomarkers Prev 2020;29:1887-903. https://doi.org/10.1158/1055-9965.EPI-19-1346). To address this limitation, a comprehensive database is described that catalogues these previously described methods and endpoints/biomarkers pertinent to the 10 KCs of carcinogens, as well as those referenced as supporting evidence for each KC in the International Agency for Research on Cancer Monograph Volumes 112-131. Our comprehensive mapping of KCs to assays and endpoints facilitates mechanistic data searches, provides a tool for finding assays and endpoints relevant to the 10 KCs, and can be used to create a roadmap for utilizing data to evaluate the strength of the evidence for each KC. The KC-Assay database is available to the public on the web at https://kcad.cchem.berkeley.edu and acts as a 'living document' that can be updated and refined. Database URL: https://kcad.cchem.berkeley.edu.

Citations: 0

CPDMS: a database system for crop physiological disorder management.

IF 3.4, Zone 4 (Biology), Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-04-22 DOI: 10.1093/database/baaf031

Jae-Hyeon Oh, Hwang-Weon Jeong, Il Pyung Ahn, Seon-Hwa Bae, Sung Mi Kim, Eunhee Kim, Su Jung Ra, Jinjeong Lee, Hye Yeon Choi, Young-Joo Seol

As the importance of precision agriculture grows, scalable and efficient methods for real-time data collection and analysis have become essential. In this study, we developed a system to collect real-time crop images, focusing on physiological disorders in tomatoes. This system systematically collects crop images and related data, with the potential to evolve into a valuable tool for researchers and agricultural practitioners. A total of 58 479 images were produced under stress conditions, including bacterial wilt (BW), Tomato Yellow Leaf Curl Virus (TYLCV), Tomato Spotted Wilt Virus (TSWV), drought, and salinity, across seven tomato varieties. The images include front views at 0, 120, and 240 degrees, as well as top views and petiole images. Of these, 43 894 images were suitable for labeling. From this set, 24 000 images were used for AI model training and 13 037 images for model testing. By training a deep learning model, we achieved a mean Average Precision (mAP) of 0.46 and a recall rate of 0.60. Additionally, we discussed data augmentation and hyperparameter tuning strategies to improve AI model performance and explored the potential for generalizing the system across various agricultural environments. The database constructed in this study will serve as a crucial resource for the future development of agricultural AI. Database URL: https://crops.phyzen.com/.
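
The image counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the four figures stated above; the variable names and the `share` helper are illustrative assumptions, and the abstract does not say how the labelable images outside the training and test sets were used.

```python
# Image counts as quoted in the CPDMS abstract.
total_images = 58_479   # all images produced
labelable = 43_894      # images suitable for labeling
train = 24_000          # images used for model training
test = 13_037           # images used for model testing


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Fraction of all produced images that were suitable for labeling.
labelable_pct = share(labelable, total_images)

# Images actually used for training plus testing.
used = train + test

# Labelable images not accounted for by the train/test figures
# (their use is not described in the abstract).
unused_labelable = labelable - used

# Train/test proportions within the used subset.
train_pct_of_used = share(train, used)
test_pct_of_used = share(test, used)

print(f"labelable: {labelable_pct}% of all images")
print(f"used for train+test: {used}, not accounted for: {unused_labelable}")
print(f"train/test split: {train_pct_of_used}% / {test_pct_of_used}%")
```

Read this way, roughly three quarters of the captured images were labelable, and the reported training and test sets correspond to an approximately 65/35 split.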

Citations: 0