A CNN-based self-supervised learning framework for small-sample near-infrared spectroscopy classification
Rongyue Zhao, Wangsen Li, Jinchai Xu, Linjie Chen, Xuan Wei, Xiangzeng Kong
Analytical Methods, published 13 January 2025
DOI: 10.1039/d4ay01970a
Abstract
Near-infrared (NIR) spectroscopy is non-destructive, simple to operate, and fast, and has therefore been widely applied in many fields. However, the effectiveness of current spectral analysis techniques still depends on complex preprocessing and feature selection. Data-driven deep learning can extract features automatically from raw spectra, but it typically requires large amounts of labeled training data, which limits its use in spectral analysis. To address this issue, we propose a self-supervised learning (SSL) framework based on convolutional neural networks (CNNs) that improves spectral analysis performance when only small sample sizes are available. The method has two learning stages: pre-training and fine-tuning. In the pre-training stage, a large amount of pseudo-labeled data is used to learn intrinsic spectral features; in the fine-tuning stage, a smaller set of labeled data completes the training of the final model. On our self-collected dataset of three tea varieties, the proposed model achieved a classification accuracy of 99.12%. Experiments on three public datasets further showed that the SSL model significantly outperforms traditional machine learning methods, achieving accuracies of 97.83%, 98.14%, and 99.89%, respectively. Comparative experiments confirmed the effectiveness of the pre-training stage, which improved accuracy by up to 10.41%. These results highlight the potential of the proposed method for small-sample spectral data and offer a viable route to improved spectral analysis.
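The abstract does not say how the pseudo-labels used in pre-training are produced. One common self-supervised pretext task, shown here purely as an assumption, is transformation prediction: each unlabeled spectrum is perturbed by one of several randomly chosen transformations, and the index of that transformation becomes its pseudo-label. The sketch below builds such a pretext dataset in plain Python. The transformation set (`identity`, `add_noise`, `scale`, `baseline_offset`, `shift`), their parameter values, and `make_pseudo_labeled` are hypothetical names chosen for illustration, not the authors' method.

```python
import random
from typing import Callable, List, Tuple

Spectrum = List[float]

# Hypothetical pretext transformations (an assumption, not from the paper).
# The position of each function in TRANSFORMS is used as the pseudo-label.

def identity(s: Spectrum, rng: random.Random) -> Spectrum:
    # Unmodified copy of the spectrum
    return list(s)

def add_noise(s: Spectrum, rng: random.Random, sigma: float = 0.01) -> Spectrum:
    # Additive Gaussian noise, loosely mimicking detector noise
    return [x + rng.gauss(0.0, sigma) for x in s]

def scale(s: Spectrum, rng: random.Random, low: float = 0.8, high: float = 1.2) -> Spectrum:
    # Multiplicative scaling, loosely mimicking path-length or scatter effects
    k = rng.uniform(low, high)
    return [k * x for x in s]

def baseline_offset(s: Spectrum, rng: random.Random, max_offset: float = 0.1) -> Spectrum:
    # Constant additive baseline shift
    b = rng.uniform(-max_offset, max_offset)
    return [x + b for x in s]

def shift(s: Spectrum, rng: random.Random, max_shift: int = 3) -> Spectrum:
    # Circular shift along the wavelength axis by 1..max_shift points
    k = rng.randint(1, max_shift)
    return s[k:] + s[:k]

TRANSFORMS: List[Callable[[Spectrum, random.Random], Spectrum]] = [
    identity, add_noise, scale, baseline_offset, shift,
]

def make_pseudo_labeled(
    unlabeled: List[Spectrum], copies_per_spectrum: int, seed: int = 0
) -> List[Tuple[Spectrum, int]]:
    """Expand unlabeled spectra into (perturbed spectrum, pseudo-label) pairs."""
    rng = random.Random(seed)  # seeded for reproducibility
    data: List[Tuple[Spectrum, int]] = []
    for s in unlabeled:
        for _ in range(copies_per_spectrum):
            label = rng.randrange(len(TRANSFORMS))
            data.append((TRANSFORMS[label](s, rng), label))
    return data

if __name__ == "__main__":
    # Toy synthetic "spectra": 4 spectra of 50 points each, for illustration only
    unlabeled = [[0.1 * i + j for i in range(50)] for j in range(4)]
    pretext = make_pseudo_labeled(unlabeled, copies_per_spectrum=10)
    print(len(pretext))  # 40 (4 spectra x 10 copies)
```

Under this assumption, a 1D CNN would first be trained to predict each perturbed spectrum's pseudo-label. Its convolutional layers would then be kept, the output layer replaced by one sized to the real classes (for example, the three tea varieties), and the network fine-tuned on the small labeled set. Generating many perturbed copies per spectrum is what lets the pre-training stage use far more examples than the labeled set contains.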