Efficient HLA imputation from sequential SNPs data by transformer

Journal of Human Genetics (IF 2.6; Zone 3, Biology; JCR Q2, Genetics & Heredity) Pub Date: 2024-08-02 DOI: 10.1038/s10038-024-01278-x
Kaho Tanaka, Kosuke Kato, Naoki Nonaka, Jun Seita
Citations: 0

Abstract

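Human leukocyte antigen (HLA) genes are associated with a variety of diseases, yet direct typing of HLA alleles is both time-consuming and costly. Consequently, various imputation methods that use sequential single nucleotide polymorphism (SNP) data have been proposed, employing either statistical or deep learning models, such as the convolutional neural network (CNN)-based model DEEP*HLA. However, these methods show limited imputation efficiency for infrequent alleles and require a large reference dataset. In this context, we have developed a Transformer-based model for HLA allele imputation, named "HLA Reliable IMputatioN by Transformer (HLARIMNT)", designed to exploit the sequential nature of SNP data. We evaluated HLARIMNT's performance using two distinct reference panels, the Pan-Asian reference panel (n = 530) and the Type 1 Diabetes Genetics Consortium (T1DGC) reference panel (n = 5225), alongside a combined panel (n = 1060). HLARIMNT demonstrated superior accuracy to DEEP*HLA across several indices, particularly for infrequent alleles. Furthermore, we explored the impact of varying training data sizes on imputation accuracy, finding that HLARIMNT consistently outperformed DEEP*HLA across all data sizes. These findings suggest that Transformer-based models can efficiently impute not only HLA types but potentially other gene types from sequential SNP data.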
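
The abstract reports that the accuracy gain is largest for infrequent alleles, which implies that accuracy is examined separately by allele frequency. The sketch below shows one way such a stratified comparison could be computed. It is not the authors' evaluation code: the function name `accuracy_by_frequency`, the simple match-rate metric, the 0.1 frequency threshold, and the toy allele calls are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def accuracy_by_frequency(true_alleles, predicted_alleles, threshold=0.1):
    """Return the match rate separately for infrequent and frequent alleles.

    Allele frequency is estimated from true_alleles itself. An allele whose
    frequency is below `threshold` (an assumed cut-off) counts as infrequent.
    """
    if len(true_alleles) != len(predicted_alleles):
        raise ValueError("true and predicted lists must have the same length")
    if not true_alleles:
        raise ValueError("at least one allele call is required")

    total = len(true_alleles)
    counts = Counter(true_alleles)
    seen = defaultdict(int)   # number of calls per frequency bin
    hits = defaultdict(int)   # number of correct calls per frequency bin

    for truth, predicted in zip(true_alleles, predicted_alleles):
        bin_name = "infrequent" if counts[truth] / total < threshold else "frequent"
        seen[bin_name] += 1
        hits[bin_name] += int(truth == predicted)

    return {name: hits[name] / seen[name] for name in seen}


# Toy data (hypothetical): 20 typed alleles, two of which occur only once.
truth = ["A*24:02"] * 12 + ["A*02:01"] * 6 + ["A*26:01", "A*31:01"]
# Hypothetical imputed calls: one frequent and one infrequent allele are wrong.
imputed = ["A*24:02"] * 12 + ["A*02:01"] * 5 + ["A*24:02", "A*26:01", "A*24:02"]

result = accuracy_by_frequency(truth, imputed)
print(result)
```

Reporting the two bins separately keeps a handful of errors on rare alleles from being hidden by the much larger number of correct calls on common ones.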

Source journal
Journal of Human Genetics (Biology: Genetics)
CiteScore: 7.20
Self-citation rate: 0.00%
Articles published: 101
Review time: 4-8 weeks
About the journal: The Journal of Human Genetics is an international journal publishing articles on human genetics, including medical genetics and human genome analysis. It covers all aspects of human genetics, including molecular genetics, clinical genetics, behavioral genetics, immunogenetics, pharmacogenomics, population genetics, functional genomics, epigenetics, genetic counseling and gene therapy. Articles on the following areas are especially welcome: genetic factors of monogenic and complex disorders, genome-wide association studies, genetic epidemiology, cancer genetics, personal genomics, genotype-phenotype relationships and genome diversity.
Latest articles from this journal
Novel homozygous ESAM variants in two families with perinatal strokes showing variable neuroradiologic and clinical findings.
Biallelic missense CEP55 variants cause prenatal MARCH syndrome.
Two-hit mutation causes Wilms tumor in an individual with FBXW7-related neurodevelopmental syndrome.
Genetic analysis of a Yayoi individual from the Doigahama site provides insights into the origins of immigrants to the Japanese Archipelago.
Development of a method for the imputation of the multi-allelic serotonin-transporter-linked polymorphic region (5-HTTLPR) in the Japanese population.