Yuxiang Lin, Qiyi Zhang, Hanxi Chen, Shuhang Liu, Kaiming Peng, Xiaojie Wang, Liyong Zhang, Jun Huang, Xiuqing Yan, Xueliang Lin, Uddin M D Hasan, Mahabub Sarwara, Fangmeng Fu, Shangyuan Feng, Chuan Wang
BMC Medicine 23(1):97. Published 2025-02-21. DOI: 10.1186/s12916-025-03887-5 (https://doi.org/10.1186/s12916-025-03887-5). PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11846373/pdf/. Impact factor: 8.3; JCR Q1 (Medicine, General & Internal).
Abstract
Multi-cancer early detection based on serum surface-enhanced Raman spectroscopy with deep learning: a large-scale case-control study.
Background: Early detection of cancer allows more effective treatment and leads to a better prognosis. However, established cancer screening technologies are of limited use, especially for multi-cancer early detection. In this study, we describe a serum-based platform that integrates surface-enhanced Raman spectroscopy (SERS) with a resampling strategy, feature dimensionality enhancement, deep learning, and interpretability analysis for sensitive and accurate pan-cancer screening.
Methods: In total, 1655 patients with early-stage breast cancer (BC, n = 569), lung cancer (LC, n = 513), thyroid cancer (TC, n = 220), colorectal cancer (CC, n = 215), gastric cancer (GC, n = 100), or esophageal cancer (EC, n = 38) and 1896 healthy controls (HC) were enrolled. Serum SERS spectra were obtained from each participant. Data dimension enhancement was performed by heatmap transformation and continuous wavelet transform (CWT). The dimension-enhanced SERS spectral data were then analyzed with a residual neural network (ResNet), a convolutional neural network (CNN) architecture. Class activation mapping (CAM) was applied to elucidate the potential biological significance of the spectral data classification.
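The dimension-enhancement step can be illustrated with a minimal sketch. It assumes a Ricker ("Mexican hat") wavelet, a small hand-picked set of scales, and a synthetic two-peak spectrum; the abstract does not state which wavelet, scales, or software the authors used, so every name and value below is hypothetical. The sketch correlates a 1D spectrum with scaled copies of the wavelet, which yields a 2D scale-by-position array of the kind an image-based CNN could consume.

```python
import math


def ricker(t, scale):
    """Ricker ("Mexican hat") wavelet at offset t for a given scale (assumed wavelet)."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0 * scale) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x)


def cwt(signal, scales):
    """Return a len(scales) x len(signal) matrix of wavelet coefficients.

    Each row is the correlation of the signal with the wavelet at one scale,
    with zero padding at the edges.
    """
    n = len(signal)
    rows = []
    for s in scales:
        half = int(5 * s)  # the wavelet is negligible beyond about 5 scales
        kernel = [ricker(k, s) for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0.0
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:
                    acc += signal[idx] * w
            row.append(acc)
        rows.append(row)
    return rows


# Synthetic stand-in for a spectrum: a narrow peak at index 60 and a
# broad, weaker peak at index 140 (made-up data, not SERS measurements).
spectrum = [
    math.exp(-(((i - 60) / 3.0) ** 2)) + 0.5 * math.exp(-(((i - 140) / 10.0) ** 2))
    for i in range(200)
]
scales = [1, 2, 4, 8, 16]  # hypothetical scale choice
coeffs = cwt(spectrum, scales)  # 2D array: one row per scale
```

Small scales respond most strongly to the narrow peak, while larger scales emphasize broader features, which is why the 2D representation can expose structure that is harder to see in the raw 1D trace.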
Results: All participants were divided into a training set and a test set at a ratio of 7:3. BorderlineSMOTE was selected as the most appropriate resampling strategy, and the deep neural network (DNN) model performed well across all groups (accuracy: 93.15%, precision: 88.46%, recall: 85.68%, and F1-score: 86.98%), with AUC values of 0.991 for HC, 0.995 for BC, 0.979 for LC, 0.996 for TC, 0.994 for CC, 0.982 for GC, and 0.941 for EC. Furthermore, the combination of SERS spectral data and ResNet (heatmap form) was also able to distinguish the categories effectively and make accurate predictions (accuracy: 94.75%, precision: 89.02%, recall: 86.97%, and F1-score: 87.88%), with AUC values of 0.996 for HC, 0.995 for BC, 0.988 for LC, 0.999 for TC, 0.993 for CC, 0.985 for GC, and 0.940 for EC. Additionally, the CAM analysis revealed wavenumber ranges of the spectral data with strong activation.
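The relationship between the reported accuracy, precision, recall, and F1-score can be sketched as follows. The sketch assumes macro averaging (an unweighted mean of per-class scores); the abstract does not say which averaging scheme was used, and the confusion matrix below is a made-up three-class toy example, not the study's data.

```python
def macro_metrics(confusion):
    """Compute accuracy and macro-averaged precision, recall, and F1.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted = sum(confusion[i][c] for i in range(k))  # column sum
        actual = sum(confusion[c])                          # row sum
        p = tp / predicted if predicted else 0.0
        r = tp / actual if actual else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical imbalanced toy data: a large, a medium, and a small class.
toy_confusion = [
    [90, 5, 5],
    [4, 45, 1],
    [3, 1, 6],
]
m = macro_metrics(toy_confusion)
```

Under this assumption, a small class that is classified poorly lowers macro recall much more than it lowers overall accuracy, which is one way accuracy can sit several points above recall on an imbalanced cohort.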
Conclusions: Our study offers a highly effective serum SERS-based approach for multi-cancer early detection, which may shed new light on cancer screening in clinical practice.
About the journal:
BMC Medicine is an open access, transparent peer-reviewed general medical journal. It is the flagship journal of the BMC series and publishes outstanding and influential research in various areas including clinical practice, translational medicine, medical and health advances, public health, global health, policy, and general topics of interest to the biomedical and sociomedical professional communities. In addition to research articles, the journal also publishes stimulating debates, reviews, unique forum articles, and concise tutorials. All articles published in BMC Medicine are included in various databases such as Biological Abstracts, BIOSIS, CAS, Citebase, Current contents, DOAJ, Embase, MEDLINE, PubMed, Science Citation Index Expanded, OAIster, SCImago, Scopus, SOCOLAR, and Zetoc.