Development of a deep learning model for cancer diagnosis by inspecting cell-free DNA end-motifs

IF 6.8, Zone 1 (Medicine), Q1 Oncology, NPJ Precision Oncology, Pub Date: 2024-07-27, DOI: 10.1038/s41698-024-00635-5

Hongru Shen, Meng Yang, Jilei Liu, Kexin Chen, Xiangchun Li


Citations: 0

Abstract

Accurate discrimination between patients with and without cancer from cell-free DNA (cfDNA) is crucial for early cancer diagnosis. Herein, we develop and validate a deep-learning-based model called end-motif inspection via transformer (EMIT) for discriminating individuals with and without cancer by learning feature representations from cfDNA end-motifs. EMIT is a self-supervised learning approach that models rankings of cfDNA end-motifs. We include 4606 samples subjected to different types of cfDNA sequencing to develop EMIT, and subsequently evaluate the classification performance of linear projections of EMIT on six datasets and an additional in-house testing set encompassing whole-genome, whole-genome bisulfite, and 5-hydroxymethylcytosine sequencing. The linear projection of representations from EMIT achieved area under the receiver operating characteristic curve (AUROC) values ranging from 0.895 (0.835–0.955) to 0.996 (0.994–0.997) across these six datasets, outperforming its baseline by significant margins. Additionally, we show that the linear projection of EMIT representations can achieve an AUROC of 0.962 (0.914–1.0) in identifying lung cancer on an independent testing set subjected to whole-exome sequencing. The findings of this study indicate that a transformer-based deep learning model can learn cancer-discriminative representations from cfDNA end-motifs. The representations of this deep learning model can be exploited for discriminating patients with and without cancer.
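As an illustration of what "rankings of cfDNA end-motifs" could look like as model input, the following is a minimal sketch and not the authors' pipeline. It assumes 4-mer motifs taken from the 5' end of each fragment (the abstract does not state the motif length), and the function names `end_motif_frequencies` and `rank_motifs` are hypothetical. It counts end-motifs over a handful of toy fragment sequences and orders all 256 possible motifs by frequency.

```python
from collections import Counter
from itertools import product

# Assumption: 4-mer end-motifs; the abstract does not specify the motif length.
MOTIF_LENGTH = 4


def end_motif_frequencies(fragments, k=MOTIF_LENGTH):
    """Return the relative frequency of every possible k-mer at fragment 5' ends."""
    all_motifs = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter()
    for seq in fragments:
        motif = seq[:k].upper()
        # Skip fragments that are too short or contain ambiguous bases.
        if len(motif) == k and set(motif) <= set("ACGT"):
            counts[motif] += 1
    total = sum(counts.values())
    return {m: (counts[m] / total if total else 0.0) for m in all_motifs}


def rank_motifs(frequencies):
    """Order motifs from most to least frequent; ties are broken alphabetically."""
    return sorted(frequencies, key=lambda m: (-frequencies[m], m))


# Toy fragments (hypothetical data, for illustration only).
fragments = ["CCCAGTTA", "CCCAGGGA", "CCTGAAAC", "AAAATTTT", "NNACGTAC", "CC"]
freqs = end_motif_frequencies(fragments)
ranking = rank_motifs(freqs)
print(ranking[:3])  # ['CCCA', 'AAAA', 'CCTG']
```

A ranked motif list of this kind is one plausible per-sample input for a sequence model; how EMIT actually tokenizes and models the rankings is described in the full paper, not in this abstract.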



Source journal

NPJ Precision Oncology (Oncology)

CiteScore

9.90

Self-citation rate

1.30%

Articles published

Review time

18 weeks

Journal description: Online-only and open access, npj Precision Oncology is an international, peer-reviewed journal dedicated to showcasing cutting-edge scientific research in all facets of precision oncology, spanning from fundamental science to translational applications and clinical medicine.