An improved cancer diagnosis algorithm for protein mass spectrometry based on PCA and a one-dimensional neural network combining ResNet and SENet
Liang Ma, Wenqing Gao, Xiangyang Hu, Dongdong Zhou, Chenlu Wang, Jiancheng Yu and Keqi Tang
Analyst, 23, 5675-5683 (published 4 November 2024)
DOI: 10.1039/D4AN00784K
https://pubs.rsc.org/en/content/articlelanding/2024/an/d4an00784k
Citations: 0
Abstract
Cancer is one of the most serious health problems worldwide. Because cancer often produces no specific symptoms in its early stages, it is frequently not diagnosed until it has reached an advanced stage, which reduces the likelihood of successful treatment. Early diagnosis is therefore both important and difficult. Mass spectrometry-based proteomics offers a robust technical foundation for cancer diagnosis. However, mass spectrometry data are high-dimensional, voluminous, and noisy, which can lead to diagnostic errors in clinical applications. To address this challenge, an improved algorithm, denoted PCA-1DSE-ResCNN, is proposed that combines principal component analysis (PCA) with a convolutional neural network (CNN) to analyze high-dimensional mass spectral data. The algorithm first reduces the dimensionality of the data with PCA and then classifies the reduced data with a one-dimensional CNN (1DSE-ResCNN) that integrates residual blocks and squeeze-and-excitation (SE) blocks. This design can alleviate the overfitting and vanishing-gradient problems associated with deep networks and also reduce redundant information, enabling the algorithm to learn features of high-dimensional data effectively and to model nonlinear relationships. To validate its effectiveness, the algorithm was applied to the early diagnosis of ovarian cancer using high-dimensional ovarian cancer mass spectrometry datasets. The experimental results showed that PCA-1DSE-ResCNN outperformed the other methods tested in accuracy, specificity, and sensitivity on all three datasets. This study is expected to contribute to the rapid diagnosis and early detection of cancer.
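The abstract describes two stages: PCA dimensionality reduction, followed by a 1-D CNN whose residual blocks include squeeze-and-excitation (SE) channel reweighting. The NumPy sketch below illustrates generic forms of both stages. It is not the authors' PCA-1DSE-ResCNN. It assumes that each spectrum's PCA scores are fed to the CNN as a one-channel sequence. The helper names (`pca_fit_transform`, `conv1d_same`, `se_block`, `se_res_block`, `forward`), the layer sizes, the single SE-residual block without batch normalization, the random untrained weights, and the synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

# Illustrative sketch only: generic PCA + 1-D SE-residual block, not the paper's architecture.


def pca_fit_transform(X, n_components):
    """PCA via SVD of the mean-centred data. X: (n_samples, n_features)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal axes, ordered by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    explained_var = s[:n_components] ** 2 / (X.shape[0] - 1)
    return Xc @ components.T, components, mean, explained_var


def conv1d_same(x, w, b):
    """1-D convolution (CNN-style cross-correlation) with zero 'same' padding.

    x: (C_in, L); w: (C_out, C_in, k) with odd k; b: (C_out,). Returns (C_out, L).
    """
    k = w.shape[2]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    # Sliding windows of width k at every output position: (C_in, L, k)
    windows = np.stack([xp[:, i:i + k] for i in range(x.shape[1])], axis=1)
    return np.einsum("oik,ilk->ol", w, windows) + b[:, None]


def se_block(x, w1, w2):
    """Squeeze-and-excitation. x: (C, L); w1: (C // r, C); w2: (C, C // r)."""
    z = x.mean(axis=1)                        # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)               # bottleneck FC + ReLU
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # FC + sigmoid: per-channel weights in (0, 1)
    return x * s[:, None]                     # reweight each channel


def se_res_block(x, p):
    """conv-ReLU-conv -> SE -> identity shortcut -> ReLU (no batch norm, C_in == C_out)."""
    h = np.maximum(conv1d_same(x, p["w_a"], p["b_a"]), 0.0)
    h = conv1d_same(h, p["w_b"], p["b_b"])
    h = se_block(h, p["fc1"], p["fc2"])
    return np.maximum(h + x, 0.0)             # residual addition keeps an identity path


def forward(z, p):
    """z: PCA scores of one spectrum, treated as a 1-channel sequence. Returns logits."""
    x = np.maximum(conv1d_same(z[None, :], p["w_stem"], p["b_stem"]), 0.0)
    x = se_res_block(x, p)
    return p["w_out"] @ x.mean(axis=1) + p["b_out"]   # global average pool + linear layer


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 500))                # synthetic stand-in: 40 spectra x 500 m/z bins
Z, comps, mean, ev = pca_fit_transform(X, n_components=32)

C, r, k, n_classes = 8, 4, 3, 2               # hypothetical hyperparameters
params = {
    "w_stem": rng.normal(scale=0.1, size=(C, 1, k)), "b_stem": np.zeros(C),
    "w_a": rng.normal(scale=0.1, size=(C, C, k)), "b_a": np.zeros(C),
    "w_b": rng.normal(scale=0.1, size=(C, C, k)), "b_b": np.zeros(C),
    "fc1": rng.normal(scale=0.1, size=(C // r, C)),
    "fc2": rng.normal(scale=0.1, size=(C, C // r)),
    "w_out": rng.normal(scale=0.1, size=(n_classes, C)), "b_out": np.zeros(n_classes),
}
logits = forward(Z[0], params)                # untrained weights: shape check only
print(Z.shape, logits.shape)                  # prints (40, 32) (2,)
```

In a real pipeline, the PCA projection would normally be fitted on the training spectra only and then reused (via `mean` and `comps`) on held-out spectra, to avoid information leakage. The network would also typically be built and trained in a deep-learning framework rather than written by hand in NumPy.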