Database: The Journal of Biological Databases and Curation: Latest Articles

Building resource-efficient community databases using open-source software.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf005
Sook Jung, Chun-Huai Cheng, Taein Lee, Katheryn Buble, Jodi Humann, Ping Zheng, Jing Yu, Dorrie Main

The unprecedented volume of big data being routinely generated for nonmodel crop species, coupled with advanced technology enabling the use of big data in breeding, gives further impetus for the need to have access to crop community databases, where all relevant data are curated and integrated. Funding for such databases is, however, insufficient and intermittent, resulting in the data being underutilized. While increased awareness of the importance of funding databases is important, it is practically necessary to find a more efficient way to build a community database. To meet the need for integrated database resources for various crop genomics, genetics, and breeding research communities, we have built five crop databases over the last decade using an open-source database platform and software. We describe the system and methods used for database construction, curation, and analysis protocols, and the data and tools that are available in these five crop databases. Database URL: The Genome Database for Rosaceae (GDR, www.rosaceae.org), the Genome Database for Vaccinium (GDV, www.vaccinium.org), the Citrus Genome Database (CGD, www.citrusgenomedb.org), the Pulse Crop Database (PCD, www.pulsedb.org), and CottonGen (www.cottongen.org).

Citations: 0
AnnCovDB: a manually curated annotation database for mutations in SARS-CoV-2 spike protein.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf002
Xiaomin Zhang, Zhongyi Lei, Jiarong Zhang, Tingting Yang, Xian Liu, Jiguo Xue, Ming Ni

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has been circulating and adapting within the human population for >4 years. A large number of mutations have occurred in the viral genome, resulting in significant variants known as variants of concern (VOCs) and variants of interest (VOIs). The spike (S) protein harbors many of the characteristic mutations of VOCs and VOIs, and significant efforts have been made to explore functional effects of the mutations in the S protein, which can cause or contribute to viral infection, transmission, immune evasion, pathogenicity, and illness severity. However, the knowledge and understanding are dispersed throughout various publications, and there is a lack of a well-structured database for functional annotation that is based on manual curation. AnnCovDB is a database that provides manually curated functional annotations for mutations in the S protein of SARS-CoV-2. Mutations in the S protein carried by at least 8000 variants in the GISAID were chosen, and the mutations were then utilized as query keywords to search in the PubMed database. The searched publications revealed that 2093 annotation entities for 205 single mutations and 93 multiple mutations were manually curated. These entities were organized into multilevel hierarchical categories for user convenience. For example, one annotation entity of N501Y mutation was 'Infectious cycle➔Attachment➔ACE2 binding affinity➔Increase'. AnnCovDB can be used to query specific mutations and browse through function annotation entities. Database URL: https://AnnCovDB.app.bio-it.tech/.
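
The multilevel annotation entity quoted in the abstract above can be pictured as a path through a category tree. The following is a minimal sketch only: the `AnnotationEntity` class, the `parse_entity` and `build_tree` helpers, and the `_mutations` key are hypothetical names chosen for illustration and are not taken from AnnCovDB itself. The only detail drawn from the abstract is the N501Y example string and its arrow separator.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple

ARROW = "➔"  # separator used in the example quoted in the abstract


@dataclass(frozen=True)
class AnnotationEntity:
    """Hypothetical container for one curated annotation of a mutation."""
    mutation: str
    levels: Tuple[str, ...]

    @property
    def effect(self) -> str:
        # Assumption: the last level of the path records the direction of the effect.
        return self.levels[-1]


def parse_entity(mutation: str, path: str) -> AnnotationEntity:
    """Split an arrow-delimited category path into its hierarchy levels."""
    levels = tuple(part.strip() for part in path.split(ARROW) if part.strip())
    if not levels:
        raise ValueError("empty annotation path")
    return AnnotationEntity(mutation, levels)


def build_tree(entities: Iterable[AnnotationEntity]) -> Dict:
    """Nest entities into a dict-of-dicts so categories can be browsed top down."""
    tree: Dict = {}
    for entity in entities:
        node = tree
        for level in entity.levels:
            node = node.setdefault(level, {})
        # Leaf nodes keep the list of mutations annotated with this full path.
        node.setdefault("_mutations", []).append(entity.mutation)
    return tree


entity = parse_entity(
    "N501Y",
    "Infectious cycle➔Attachment➔ACE2 binding affinity➔Increase",
)
tree = build_tree([entity])
```

Storing the path as an ordered tuple keeps the hierarchy explicit, so the same record can serve both a lookup by mutation and a top-down browse by category, the two access modes the abstract mentions.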

Citations: 0
Working in biocuration: contemporary experiences and perspectives.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-12 DOI: 10.1093/database/baaf003
Sarah R Davies

This perspective article synthesizes current knowledge regarding what is known regarding biocuration as a career and the challenges facing the field. It draws on existing literature and ongoing qualitative research to discuss the nature of biocuration, biocurators' career trajectories, key challenges that biocurators face, and strategies for overcoming these. Overall, biocurators express a high degree of satisfaction with their work and see it as central to the wider biosciences. The central challenges that they face relate to the underfunding and under-recognition of this work, meaning that there is minimal stable funding for the field and that the work of human biocurators is often invisible to those who use curated resources. The article closes by critically discussing existing and potential strategies for responding to these challenges.

Citations: 0
LitSumm: large language models for literature summarization of noncoding RNAs.
IF 3.6 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-02-05 DOI: 10.1093/database/baaf006
Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney

Curation of literature in life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have resources to scale to the whole relevant literature and all have to prioritize their efforts. In this work, we take a first step to alleviating the lack of curator time in RNA science by generating summaries of literature for noncoding RNAs using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be automatically generated from the literature using a commercial LLM and a chain of prompts and checks. Manual assessment was carried out for a subset of summaries, with the majority being rated extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.

Citations: 0
Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-31 DOI: 10.1093/database/baaf007
Citations: 0
Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.
IF 3.4 Zone 4 (Biology) Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date: 2025-01-29 DOI: 10.1093/database/baae097
Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin

It is well-known that the use of vocabulary in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities, frustrating both the consumers and producers of phenotypic data. Such ambiguities are challenging for biologists, and more so for Artificial Intelligence, to resolve. That survey also indicated a strong interest in a new authoring workflow supported by ontologies to ensure published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system designed for authors to produce computational phenotype data. This platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Cared Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface authors would interact with to produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We evaluated Character Recorder against Microsoft Excel on their effectiveness, efficiency, and the cognitive demands of the users in producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable for both student and professional participants, with its cognitive demand comparable to Excel's. Participants agreed that the quality of the data Character Recorder yielded was superior. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants have been implemented after the evaluation.

Citations: 0
Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype. 帮助作者产生公平的分类数据:作者驱动的表型数据生产原型的评估。
IF 3.4 4区 生物学 Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY Pub Date : 2025-01-29 DOI: 10.1093/database/baae097
Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin

It is well-known that the use of vocabulary in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities, frustrating both the consumers and producers of phenotypic data. Such ambiguities are challenging for biologists, and more so for Artificial Intelligence, to resolve. That survey also indicated a strong interest in a new authoring workflow supported by ontologies to ensure published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system designed for authors to produce computational phenotype data. This platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Cared Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface authors would interact with to produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We evaluated Character Recorder against Microsoft Excel on their effectiveness, efficiency, and the cognitive demands of the users in producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable for both student and professional participants, with its cognitive demand comparable to Excel's. Participants agreed that the quality of the data Character Recorder yielded was superior. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants have been implemented after the evaluation.

Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.
Database: The Journal of Biological Databases and Curation, volume 2025. IF 3.4. Pub Date : 2025-01-29 DOI: 10.1093/database/baae097
Citations: 0