Ksenia Sokolova, Kathleen M. Chen, Yun Hao, Jian Zhou, Olga G. Troyanskaya
Annual Review of Genomics and Human Genetics. Published 2024-04-10. DOI: 10.1146/annurev-genom-021623-024727 (https://doi.org/10.1146/annurev-genom-021623-024727)
Citations: 0
Deep Learning Sequence Models for Transcriptional Regulation
Deciphering the regulatory code of gene expression and interpreting the transcriptional effects of genome variation are critical challenges in human genetics. Modern experimental technologies have resulted in an abundance of data, enabling the development of sequence-based deep learning models that link patterns embedded in DNA to the biochemical and regulatory properties contributing to transcriptional regulation, including modeling epigenetic marks, 3D genome organization, and gene expression, with tissue and cell-type specificity. Such methods can predict the functional consequences of any noncoding variant in the human genome, even rare or never-before-observed variants, and systematically characterize their consequences beyond what is tractable from experiments or quantitative genetics studies alone. Recently, the development and application of interpretability approaches have led to the identification of key sequence patterns contributing to the predicted tasks, providing insights into the underlying biological mechanisms learned and revealing opportunities for improvement in future models.
About the journal:
Since its inception in 2000, the Annual Review of Genomics and Human Genetics has been dedicated to showcasing significant developments in genomics as they pertain to human genetics and the human genome. The journal emphasizes genomic technology, genome structure and function, genetic modification, human variation and population genetics, human evolution, and various aspects of human genetic diseases, including individualized medicine.