Transformer-based representation learning and multiple-instance learning for cancer diagnosis exclusively from raw sequencing fragments of bisulfite-treated plasma cell-free DNA.

IF 6.6 · Zone 2 (Medicine) · Q1 Biochemistry, Genetics and Molecular Biology · Molecular Oncology · Pub Date: 2024-11-01 · Epub Date: 2024-10-08 · DOI: 10.1002/1878-0261.13745
Jilei Liu, Hongru Shen, Yichen Yang, Meng Yang, Qiang Zhang, Kexin Chen, Xiangchun Li
Molecular Oncology, pp. 2755-2769. Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11547222/pdf/

Abstract

Early cancer diagnosis from bisulfite-treated cell-free DNA (cfDNA) fragments requires tedious data-analysis procedures. Here, we present DECIDIA, a deep-learning-based approach for early cancer interception and diagnosis that achieves accurate cancer diagnosis exclusively from bisulfite-treated cfDNA sequencing fragments. DECIDIA combines transformer-based representation learning of DNA fragments with weakly supervised multiple-instance learning for classification. We systematically evaluate DECIDIA for cancer diagnosis and cancer-type prediction on a curated dataset of 5389 samples comprising colorectal cancer (CRC; n = 1574), hepatocellular carcinoma (HCC; n = 1181), lung cancer (n = 654), and non-cancer controls (n = 1980). In 10-fold cross-validation on the CRC dataset, DECIDIA distinguished cancer patients from cancer-free controls with an area under the receiver operating characteristic curve (AUROC) of 0.980 (95% CI, 0.976-0.984), outperforming benchmarked methods based on methylation intensities. Notably, DECIDIA achieved an AUROC of 0.910 (95% CI, 0.896-0.924) in distinguishing HCC patients from cancer-free controls on an external independent HCC test set, even though no HCC data were used in model development. For cancer-type classification, DECIDIA achieved a micro-average AUROC of 0.963 (95% CI, 0.960-0.966) and an overall accuracy of 82.8% (95% CI, 81.8-83.9%). In addition, we distilled four sequence signatures from the raw sequencing reads that showed differential patterns between cancer and control samples and among cancer types. Our approach represents a new paradigm for eliminating tedious data-analysis procedures in liquid biopsy based on the bisulfite-treated cfDNA methylome.
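
The abstract names weakly supervised multiple-instance learning (MIL) as DECIDIA's classifier but does not specify how fragment representations are aggregated into a sample-level call. The following is a minimal sketch that assumes attention-based MIL pooling (Ilse et al., 2018) as a stand-in. Each plasma sample is treated as a bag of fragment embeddings, which stand in here for transformer outputs as toy 2-D vectors. Attention weights a_k = softmax_k(wᵀ tanh(V h_k)) pool these embeddings into one sample vector, and a logistic layer turns that vector into a cancer probability. All function names, dimensions, and fixed parameter values are hypothetical. In a real model, V, w, and the classifier would be learned from sample-level labels only.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def softmax(scores: Sequence[float]) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_mil_pool(
    fragments: Sequence[Sequence[float]],  # one embedding per cfDNA fragment (instance)
    V: Sequence[Sequence[float]],          # hypothetical hidden x dim projection
    w: Sequence[float],                    # hypothetical attention vector (hidden dim)
) -> Tuple[Vector, Vector]:
    """Attention pooling: a_k = softmax_k(w^T tanh(V h_k)); bag = sum_k a_k h_k."""
    scores = [dot(w, [math.tanh(z) for z in matvec(V, h)]) for h in fragments]
    weights = softmax(scores)
    dim = len(fragments[0])
    bag = [sum(a * h[j] for a, h in zip(weights, fragments)) for j in range(dim)]
    return bag, weights


def predict_sample(fragments, V, w, clf_w, clf_b) -> Tuple[float, Vector]:
    """Sample-level cancer probability from a bag of fragment embeddings."""
    bag, weights = attention_mil_pool(fragments, V, w)
    logit = dot(clf_w, bag) + clf_b
    return 1.0 / (1.0 + math.exp(-logit)), weights


if __name__ == "__main__":
    # Toy bag of three fragment embeddings with fixed (untrained) parameters.
    frags = [[0.1, 0.2], [0.9, -0.3], [0.0, 0.5]]
    prob, attn = predict_sample(frags, [[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0], [2.0, 1.0], -0.5)
    print(f"P(cancer) = {prob:.3f}; attention = {[round(a, 3) for a in attn]}")
```

Because the pooled vector is a weighted sum, the prediction does not depend on fragment order, which suits an unordered set of reads. Training needs only one label per sample, and this is what makes the supervision weak. The attention weights also show which fragments contributed most to a given call.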

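The abstract reports each AUROC with a 95% confidence interval and summarizes cancer-type classification with a micro-average AUROC. It does not state how the intervals were obtained. The sketch below assumes a percentile bootstrap over samples; DeLong's method is a common alternative. It computes three quantities: a rank-based AUROC, a bootstrap interval, and a micro-average AUROC that pools every one-vs-rest (sample, class) decision into a single binary problem. The function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUROC (Mann-Whitney U / (n_pos * n_neg)); tied scores get average ranks."""
    n = len(scores)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos  # labels are assumed to be 0/1
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie block
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC, resampling samples with replacement."""
    auroc(labels, scores)  # fail fast if only one class is present
    rng = random.Random(seed)
    n = len(labels)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def micro_auroc(true_classes: Sequence[int], prob_rows: Sequence[Sequence[float]],
                n_classes: int) -> float:
    """Micro-average AUROC: pool all one-vs-rest (sample, class) decisions into one binary task."""
    flat_labels: List[int] = []
    flat_scores: List[float] = []
    for c_true, probs in zip(true_classes, prob_rows):
        for c in range(n_classes):
            flat_labels.append(1 if c == c_true else 0)
            flat_scores.append(probs[c])
    return auroc(flat_labels, flat_scores)
```

Micro-averaging treats every (sample, class) pair equally. With imbalanced classes, such as CRC (n = 1574) versus lung cancer (n = 654), the larger classes therefore supply most of the pooled positives. A macro average would instead weight each class equally.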