Database: The Journal of Biological Databases and Curation: latest articles, page 9

LitSumm: large language models for literature summarization of noncoding RNAs.

IF 3.4 | Biology, Zone 4 | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-02-05 DOI: 10.1093/database/baaf006

Andrew Green, Carlos Eduardo Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I Petrov, Alex Bateman, Blake Sweeney

Curation of literature in the life sciences is a growing challenge. The continued increase in the rate of publication, coupled with the relatively fixed number of curators worldwide, presents a major challenge to developers of biomedical knowledgebases. Very few knowledgebases have the resources to scale to the whole relevant literature, and all have to prioritize their efforts. In this work, we take a first step toward alleviating the shortage of curator time in RNA science by generating summaries of the literature on noncoding RNAs (ncRNAs) using large language models (LLMs). We demonstrate that high-quality, factually accurate summaries with accurate references can be generated automatically from the literature using a commercial LLM and a chain of prompts and checks. A subset of summaries was assessed manually, and the majority were rated as extremely high quality. We apply our tool to a selection of >4600 ncRNAs and make the generated summaries available via the RNAcentral resource. We conclude that automated literature summarization is feasible with the current generation of LLMs, provided that careful prompting and automated checking are applied. Database URL: https://rnacentral.org/.
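The abstract attributes the accuracy of the summaries partly to automated checks in the prompt chain. As a hedged illustration of what one such check might look like, the sketch below verifies that a generated summary cites only articles supplied as context and that every sentence carries a citation. The bracketed-PMCID citation format, the function name `check_summary`, and both rules are assumptions made for illustration, not LitSumm's documented implementation.

```python
import re

# Assumed citation format: bracketed PubMed Central IDs, e.g. "[PMC1234567]".
CITATION = re.compile(r"\[(PMC\d+)\]")


def check_summary(summary: str, context_ids: set[str]) -> list[str]:
    """Return the problems found in a generated summary (empty list = passed).

    Illustrative rules, assumed rather than taken from LitSumm:
    1. every cited ID must be one of the articles supplied as context;
    2. every sentence must carry at least one citation.
    """
    problems = []
    for cited in CITATION.findall(summary):
        if cited not in context_ids:
            problems.append(f"unknown reference {cited}")
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", summary.strip()) if s]
    for sentence in sentences:
        if not CITATION.search(sentence):
            problems.append(f"uncited sentence: {sentence!r}")
    return problems


if __name__ == "__main__":
    text = "Claim one [PMC111]. Claim two [PMC999]. Claim three."
    for problem in check_summary(text, {"PMC111"}):
        print(problem)
```

In a prompt chain, a non-empty problem list could be sent back to the model as a revision request and the summary rejected if the retries are exhausted; that retry policy is likewise an assumption, not something the abstract specifies.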

Citations: 0

Expression of Concern: DisGeNet: a disease-centric interaction database among diseases and various associated genes.

IF 3.4 | Biology, Zone 4 | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-01-31 DOI: 10.1093/database/baaf007

Citations: 0

Helping authors produce FAIR taxonomic data: evaluation of an author-driven phenotype data production prototype.

IF 3.4 | Biology, Zone 4 | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-01-29 DOI: 10.1093/database/baae097

Limin Zhang, Julian Starr, Bruce Ford, Anton Reznicek, Yuxuan Zhou, Étienne Léveillé-Bourret, Étienne Lacroix-Carignan, Jacques Cayouette, Tyler W Smith, Donald Sutherland, Paul Catling, Jeffery M Saarela, Hong Cui, James Macklin

It is well known that vocabulary use in phenotype treatments is often inconsistent. An earlier survey of biologists who create or use phenotypic characters revealed that this lack of standardization leads to ambiguities that frustrate both the consumers and the producers of phenotypic data. Such ambiguities are challenging for biologists to resolve, and even more so for artificial intelligence. That survey also indicated strong interest in a new, ontology-supported authoring workflow that ensures published phenotype data are FAIR (Findable, Accessible, Interoperable, and Reusable) and suitable for large-scale computational analyses. In this article, we introduce a prototype software system that helps authors produce computable phenotype data. The platform includes a web-based, ontology-enhanced editor for taxonomic characters (Character Recorder), an Ontology Backend holding standardized vocabulary (the Carex Ontology), and a mobile application for resolving ontological conflicts (Conflict Resolver). We present two formal user evaluations of Character Recorder, the main interface through which authors would produce FAIR data. The evaluations were conducted with undergraduate biology students and Carex experts. We compared Character Recorder with Microsoft Excel in terms of effectiveness, efficiency, and users' cognitive demand when producing computable taxon-by-character matrices. The evaluations showed that Character Recorder is quickly learnable by both student and professional participants, with a cognitive demand comparable to that of Excel. Participants agreed that the data produced with Character Recorder were of superior quality. Students praised Character Recorder's educational value, while Carex experts were keen to recommend it and to help evolve it from a prototype into a comprehensive tool. Feature improvements recommended by expert participants were implemented after the evaluation.
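What makes a taxon-by-character matrix "computable" rather than a free-text spreadsheet is that its cells are restricted to standardized terms. The sketch below illustrates that idea under stated assumptions: a small hypothetical synonym table (`VOCAB`) stands in for the ontology backend, and any term it cannot resolve is queued for expert review, loosely mirroring the role the abstract assigns to Conflict Resolver. The class and method names are hypothetical and do not come from Character Recorder.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical synonym table (free-text term -> preferred term); in the
# described system, standardized vocabulary lives in an ontology backend.
VOCAB = {
    "ovate": "ovate",
    "egg-shaped": "ovate",
    "lanceolate": "lanceolate",
    "lance-shaped": "lanceolate",
}


@dataclass
class CharacterMatrix:
    """Taxon-by-character matrix whose cells may hold only preferred terms."""

    cells: dict[tuple[str, str], str] = field(default_factory=dict)
    unresolved: list[tuple[str, str, str]] = field(default_factory=list)

    def record(self, taxon: str, character: str, value: str) -> bool:
        """Store a normalized value; return False if the term is unknown."""
        term = VOCAB.get(value.strip().lower())
        if term is None:
            # Not in the vocabulary: queue for expert review, don't store free text.
            self.unresolved.append((taxon, character, value))
            return False
        self.cells[(taxon, character)] = term
        return True

    def row(self, taxon: str, characters: list[str]) -> list[Optional[str]]:
        """One matrix row; None marks a missing or unresolved cell."""
        return [self.cells.get((taxon, c)) for c in characters]
```

For example, recording "Egg-shaped" for "Taxon A" stores the preferred term "ovate", while an unrecognized value such as "roundish" is kept out of the matrix and placed in `unresolved`. Rejecting unknown terms rather than storing them verbatim is the design choice that keeps every cell machine-comparable.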

Citations: 0

BbGSD: Black-boned Sheep Genome SNP Database.

IF 3.4 | Biology, Zone 4 | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-01-28 DOI: 10.1093/database/baaf004

Chunjuan He, Lichang Chen, Juntao Cao, Yuqing Zhong, Zhendong Gao, Weidong Deng, Jiajin Zhang

Lanping black-boned (LPBB) sheep are a unique and rare ruminant breed, characterized by black pigmentation of the skin and internal organs. Thus far, besides the black-boned chicken, LPBB sheep are the only known animal with heritable melanin characteristics, and the only mammal known to contain a large amount of melanin in the body. LPBB sheep have therefore attracted substantial research attention because of their potential contribution to medicine. However, long periods of free grazing and crossbreeding with Lanping normal (LPN) sheep have diluted LPBB breeding resources, posing a challenge to the protection of the breed. Constructing a large-scale database of genotypic information is therefore very important for the effective conservation and management of LPBB genetic resources. To this end, we established the first LPBB-specific SNP database, the Black-boned Sheep Genome SNP Database (BbGSD, http://202.203.179.115:3838/oarsnpdb), using genotype data from 150 sheep (100 LPBB and 50 LPN) across 46 894 242 SNP sites. The database implements four main functional modules: (i) the "LD heatmap" module, which enables interactive heatmap visualization of pairwise linkage disequilibrium (LD) between SNPs; (ii) the "SNP distribution" module, which lets users interactively visualize tabular genotype data as heatmaps; (iii) the "Phylogenetics" module, which enables phylogenetic analysis to explore the evolutionary history or genetic relationships of LPBB sheep; and (iv) the "Diversity" module, which calculates and displays nucleotide diversity among sheep populations in user-specified genomic regions. BbGSD is essential for accelerating functional genomics studies and the screening of molecular markers for molecular-assisted breeding in black-boned sheep. Database URL: http://202.203.179.115:3838/oarsnpdb.
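The "LD heatmap" and "Diversity" modules rest on two standard population-genetic statistics. The sketch below computes both from diploid genotype dosages (0, 1 or 2 copies of the alternate allele): LD as the squared Pearson correlation r² between dosages at two SNPs, a common approximation when haplotypes are unphased, and nucleotide diversity π as the unbiased expected pairwise difference n/(n-1) x 2p(1-p) summed over variable sites and divided by the region length. The abstract does not say which estimators BbGSD actually uses, and the function names here are hypothetical.

```python
def ld_r2(g1: list[int], g2: list[int]) -> float:
    """LD between two SNPs as squared Pearson correlation of 0/1/2 dosages.

    This genotype-based r^2 is a common approximation when haplotypes are unphased.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return float("nan")  # a monomorphic SNP has no defined r^2
    return cov * cov / (v1 * v2)


def nucleotide_diversity(sites: list[list[int]], region_length: int) -> float:
    """Per-base nucleotide diversity (pi) for a genomic region.

    sites: for each biallelic variable site, the 0/1/2 dosages of diploid samples.
    region_length: bases in the region, invariant sites included.
    """
    total = 0.0
    for dosages in sites:
        n = 2 * len(dosages)  # sampled chromosomes
        p = sum(dosages) / n  # alternate-allele frequency
        # Unbiased probability that two distinct chromosomes differ at this site.
        total += n / (n - 1) * 2 * p * (1 - p)
    return total / region_length
```

As a worked check: at a single site where one sample is homozygous reference and another homozygous alternate (dosages `[0, 2]`), the four sampled chromosomes form 6 pairs, 4 of which differ, so the per-site value is 4/6 = 2/3; the formula gives 4/3 x 2 x 0.5 x 0.5 = 2/3 as well.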

Citations: 0

The TOXIN knowledge graph: supporting animal-free risk assessment of cosmetics.

IF 3.4 | Biology, Zone 4 | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-01-28 DOI: 10.1093/database/baae121

Sara Sepehri, Anja Heymans, Dinja De Win, Jan Maushagen, Audrey Sanctorum, Christophe Debruyne, Robim M Rodrigues, Joery De Kock, Vera Rogiers, Olga De Troyer, Tamara Vanhaecke

The European Union's ban on animal testing of cosmetic products and their ingredients, combined with the lack of validated animal-free methods, makes it challenging to evaluate their potential repeated-dose organ toxicity. To address this, innovative strategies such as Next-Generation Risk Assessment (NGRA) are being explored, which integrate historical animal data with new mechanistic insights from non-animal New Approach Methodologies (NAMs). This paper introduces the TOXIN knowledge graph (TOXIN KG), a tool designed to retrieve toxicological information on cosmetic ingredients, with a focus on liver-related data. TOXIN KG uses graph-structured semantic technology and integrates toxicological data through ontologies, ensuring interoperable representation. The primary data source is safety information on cosmetic ingredients from the scientific opinions issued by the Scientific Committee on Consumer Safety between 2009 and 2019. ToxRTool automates the reliability assessment of toxicity studies, while Simplified Molecular Input Line Entry System (SMILES) notation standardizes chemical identification, enabling in silico prediction of repeated-dose toxicity with the Organisation for Economic Co-operation and Development Quantitative Structure-Activity Relationship Toolbox (OECD QSAR Toolbox). The ToXic Process Ontology, enriched with relevant biological repositories, is used to represent toxicological concepts systematically. Search filters allow users to identify cosmetic compounds potentially linked to liver toxicity. Data are visualized with Ontodia, a JavaScript library. Populated with information on 88 cosmetic ingredients, TOXIN KG allowed us to identify 53 compounds that affected at least one liver toxicity parameter in a 90-day repeated-dose animal study. For one compound, we illustrate how TOXIN KG links this observation to hepatic cholestasis as an adverse outcome. In an ab initio NGRA context, follow-up in vitro studies using human-based NAMs would be necessary to understand the compound's biological activity and the molecular mechanism leading to the adverse effect. In summary, TOXIN KG is a valuable tool for advancing the reusability of cosmetics safety data, providing knowledge in support of NAM-based hazard and risk assessments. Database URL: https://toxin-search.netlify.app/.
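The search filters described above amount to traversals over a graph that links ingredients, studies, and toxicity parameters. As a hedged illustration of how the "affects at least one liver parameter in a 90-day study" filter can be expressed, the sketch below uses a tiny in-memory set of triples. The predicate names (`hasStudy`, `durationDays`, `affects`, `indicatorOf`) and all identifiers are hypothetical; the real TOXIN KG is built on ontologies and semantic-web technology, not Python tuples.

```python
# Hypothetical triples (subject, predicate, object). Names are illustrative
# and do not reproduce TOXIN KG's ontology terms or identifiers.
TRIPLES = {
    ("ingredient:A", "hasStudy", "study:1"),
    ("study:1", "durationDays", 90),
    ("study:1", "affects", "param:ALT"),
    ("param:ALT", "indicatorOf", "organ:liver"),
    ("ingredient:B", "hasStudy", "study:2"),
    ("study:2", "durationDays", 90),
    ("study:2", "affects", "param:kidney_weight"),
    ("param:kidney_weight", "indicatorOf", "organ:kidney"),
}


def objects(triples: set[tuple], subject: str, predicate: str) -> set:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def liver_hits(triples: set[tuple], duration_days: int = 90) -> set[str]:
    """Ingredients with a study of the given length that affects a liver parameter."""
    hits = set()
    for ingredient, predicate, study in triples:
        if predicate != "hasStudy":
            continue
        if duration_days not in objects(triples, study, "durationDays"):
            continue
        for param in objects(triples, study, "affects"):
            if "organ:liver" in objects(triples, param, "indicatorOf"):
                hits.add(ingredient)
    return hits


print(liver_hits(TRIPLES))  # {'ingredient:A'}
```

In a production knowledge graph the same filter would more naturally be written as a graph query (for example in SPARQL) against the ontology-backed store; the linear scans here only keep the example self-contained.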

欧盟禁止对化妆品及其成分进行动物实验，再加上缺乏经过验证的无动物实验方法，这给评估化妆品潜在的重复给药器官毒性带来了挑战。为了解决这个问题，人们正在探索下一代风险评估（NGRA）等创新策略，将历史动物数据与来自非动物新方法方法论（NAMs）的新机制见解相结合。本文介绍了毒素知识图谱（TOXIN KG），一个用于检索化妆品成分毒理学信息的工具，重点是肝脏相关数据。毒素KG使用图结构语义技术，并通过本体集成毒理学数据，确保可互操作表示。主要数据来源是2009年至2019年消费者安全科学委员会发布的科学意见中关于化妆品成分的安全信息。ToxRTool自动化了毒性研究的可靠性评估，而简化分子输入线输入系统（SMILES）符号标准化了化学鉴定，通过实施经济合作与发展组织定量结构-活性关系工具箱（OECD QSAR工具箱），实现了重复剂量毒性的计算机预测。毒性过程本体，丰富了相关的生物资源库，被用来系统地表示毒理学概念。搜索过滤器允许识别可能与肝毒性有关的化妆品化合物。数据可视化是通过JavaScript库Ontodia实现的。毒素KG含有88种化妆品成分的信息，使我们能够在90天的重复给药动物研究中确定53种影响至少一种肝脏毒性参数的化合物。对于一种化合物，我们说明了毒素KG如何将这种观察与肝脏胆汁淤积作为不利结果联系起来。在从头开始的NGRA背景下，有必要使用基于人的NAMs进行后续的体外研究，以了解该化合物的生物活性和导致不良反应的分子机制。总之，毒素KG是促进化妆品安全数据可重复使用的宝贵工具，为支持基于nama的危害和风险评估提供了知识。数据库地址：https://toxin-search.netlify.app/。


Citations: 0

The TOXIN knowledge graph: supporting animal-free risk assessment of cosmetics.

IF 3.4 | Zone 4 (Biology) | Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Database: The Journal of Biological Databases and Curation

Pub Date : 2025-01-28 DOI: 10.1093/database/baae121

Sara Sepehri, Anja Heymans, Dinja De Win, Jan Maushagen, Audrey Sanctorum, Christophe Debruyne, Robim M Rodrigues, Joery De Kock, Vera Rogiers, Olga De Troyer, Tamara Vanhaecke




Citations: 0