Harmonizome 3.0: integrated knowledge about genes and proteins from diverse multi-omics resources

IF 16.6 | Zone 2 (Biology) | JCR Q1, BIOCHEMISTRY & MOLECULAR BIOLOGY | Nucleic Acids Research | Pub Date: 2024-11-20 | DOI: 10.1093/nar/gkae1080

Ido Diamant, Daniel J B Clarke, John Erol Evangelista, Nathania Lingam, Avi Ma’ayan

Citations: 0

Abstract

By processing and abstracting diverse omics datasets into associations between genes and their attributes, the Harmonizome database enables researchers to explore and integrate knowledge about human genes from many central omics resources. Here, we introduce Harmonizome 3.0, a significant upgrade to the original Harmonizome database. The upgrade adds 26 datasets that contribute nearly 12 million associations between genes and various attribute types such as cells and tissues, diseases, and pathways. The upgrade has a dataset crossing feature to identify gene modules that are shared across datasets. To further explain significantly high gene set overlap between dataset pairs, a large language model (LLM) composes a paragraph that speculates about the reasons behind the high overlap. The upgrade also adds more data formats and visualization options. Datasets are downloadable as knowledge graph (KG) assertions and visualized with Uniform Manifold Approximation and Projection (UMAP) plots. The KG assertions can be explored via a user interface that visualizes gene–attribute associations as ball-and-stick diagrams. Overall, Harmonizome 3.0 is a rich resource of processed omics datasets that are provided in several AI-ready formats. Harmonizome 3.0 is available at https://maayanlab.cloud/Harmonizome/.
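The abstract does not say which statistic flags a "significantly high" gene set overlap when two datasets are crossed. As a minimal sketch, assuming a one-sided Fisher's exact test (the upper tail of the hypergeometric distribution) with a Bonferroni correction over all term pairs, the comparison could look like the code below. The names `overlap_pvalue` and `cross_libraries`, the toy gene sets, and the 1,000-gene background are hypothetical; this is not Harmonizome's actual implementation or API.

```python
from math import comb


def overlap_pvalue(set_a, set_b, background_size):
    """One-sided hypergeometric p-value P(overlap >= k) for two gene sets.

    Equivalent to a one-sided Fisher's exact test on the 2x2 contingency
    table (in A / not in A) x (in B / not in B).
    """
    a, b = set(set_a), set(set_b)
    k = len(a & b)  # observed number of shared genes
    n_a, n_b, n = len(a), len(b), background_size
    if max(n_a, n_b) > n:
        raise ValueError("gene sets cannot be larger than the background")
    # Number of ways to draw n_b genes from the background.
    total = comb(n, n_b)
    # Number of those draws that hit at least k genes of set A.
    tail = sum(
        comb(n_a, i) * comb(n - n_a, n_b - i)
        for i in range(k, min(n_a, n_b) + 1)
    )
    return tail / total


def cross_libraries(lib_a, lib_b, background_size, alpha=0.05):
    """Test every term pair across two {term: gene set} libraries (hypothetical helper).

    Returns (term_a, term_b, overlap, adjusted_p) tuples whose
    Bonferroni-adjusted p-value is below alpha, most significant first.
    """
    n_tests = len(lib_a) * len(lib_b)
    hits = []
    for term_a, genes_a in lib_a.items():
        for term_b, genes_b in lib_b.items():
            p = overlap_pvalue(genes_a, genes_b, background_size)
            p_adj = min(1.0, p * n_tests)  # Bonferroni correction
            if p_adj < alpha:
                overlap = len(set(genes_a) & set(genes_b))
                hits.append((term_a, term_b, overlap, p_adj))
    return sorted(hits, key=lambda hit: hit[3])


if __name__ == "__main__":
    # Toy libraries: hypothetical term names and gene symbols.
    pathways = {"pathway_X": {"G1", "G2", "G3", "G4"}}
    tissues = {
        "tissue_Y": {"G1", "G2", "G3", "G5"},
        "tissue_Z": {"G10", "G11"},
    }
    for hit in cross_libraries(pathways, tissues, background_size=1000):
        print(hit)
```

In the toy run, `pathway_X` and `tissue_Y` share three of their four genes against a 1,000-gene background, so that pair survives correction, while the disjoint `tissue_Z` gets p = 1. With thousands of terms per dataset, a less conservative correction such as Benjamini-Hochberg might be a better fit than Bonferroni.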

Source journal

Nucleic Acids Research (Biology: Biochemistry & Molecular Biology)

CiteScore: 27.10
Self-citation rate: 4.70%
Articles published: 1057
Review time: 2 months

Journal description: Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. In addition, the first issue of each year is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services, including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.