A prediction model of rubber content in the dried root of Taraxacum kok-saghyz Rodin based on near-infrared spectroscopy
Runfeng Chen, Qingqing Yan, Tuhanguli Tuoheti, Lin Xu, Qiang Gao, Yan Zhang, Hailong Ren, Lipeng Zheng, Feng Wang, Ya Liu
Plant Methods (impact factor 4.7; JCR Q1, Biochemical Research Methods), published 2024-05-26
DOI: 10.1186/s13007-024-01183-6 (https://doi.org/10.1186/s13007-024-01183-6)
Citations: 0
Abstract
Taraxacum kok-saghyz Rodin (TKS) is a highly promising source of natural rubber (NR) because it can be planted across a wide range of areas, adapts well, and is suited to mechanized planting and harvesting. However, current methods for measuring NR content are cumbersome, so a rapid detection model is needed. This study used near-infrared spectroscopy to establish a rapid detection model for NR content in TKS root segments and root powder samples. The K445 strain at different growth stages within a year and 129 samples of TKS hybridized with dandelion were used to obtain near-infrared spectral data. The rubber content of the sample roots was measured using the alkaline boiling method. The Monte Carlo sampling (MCS) method was used to filter abnormal data from the root segment and powder sample sets separately. The SPXY algorithm was used to split the data into a training set and a validation set in a 3:1 ratio. The original spectra were preprocessed using moving window smoothing (MWS), standard normal variate (SNV), multiplicative scatter correction (MSC), and first derivative (FD) algorithms. Bands were selected using the competitive adaptive reweighted sampling (CARS) algorithm and the chemical characteristic bands of NR. Partial least squares (PLS), random forest (RF), light gradient boosting machine (LightGBM), and convolutional neural network (CNN) algorithms were used to build models with the optimal spectral preprocessing method on three band sets: the full band, the CARS-selected bands, and the chemical characteristic bands of NR. The model with the best predictive performance in the high rubber content interval (rubber content > 15%) was also identified. The results indicated that the optimal rubber content prediction models for TKS root segments and powder samples were MWS–FD CARS–RF and MWS–FD chemical characteristic band RF, respectively. For root segments and powder samples, respectively, $R_P^2$ was 0.951 and 0.979, RMSEP was 1.814 and 1.133, and RPDP was 4.498 and 6.845. In the high rubber content range, the model based on the LightGBM algorithm had the best prediction performance, with RMSEP values of 0.752 for root segments and 0.918 for powder samples. This research indicates that dried TKS root powder is better suited than root segments for building a rubber content prediction model, with powder-based models showing superior predictive capability. In the high rubber content range in particular, the LightGBM-based model has superior predictive performance, which could provide a theoretical basis for the rapid detection of rubber content in TKS in the future.
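The preprocessing steps and evaluation metrics named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the moving-window width of 5, the finite-difference derivative, and the definition of RPD as the standard deviation of the reference values divided by RMSEP are all assumptions made for illustration, and the spectra below are synthetic.

```python
import numpy as np


def moving_window_smooth(spectra, window=5):
    """Smooth each spectrum (row) with a centered moving average.

    `window` is assumed to be odd; edges are padded by repeating the end values
    so the output keeps the same number of wavelength points.
    """
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(spectra, ((0, 0), (pad, pad)), mode="edge")
    return np.array([np.convolve(row, kernel, mode="valid") for row in padded])


def snv(spectra):
    """Standard normal variate: center and scale each spectrum by its own mean and std."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def first_derivative(spectra):
    """First derivative along the wavelength axis using finite differences."""
    return np.gradient(spectra, axis=1)


def prediction_metrics(y_true, y_pred):
    """Return (R2_P, RMSEP, RPD) for a validation set.

    RPD is taken here as SD(reference values) / RMSEP, a common convention
    that is assumed rather than confirmed by the source.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmsep = np.sqrt(np.mean(residuals ** 2))
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = np.std(y_true, ddof=1) / rmsep
    return r2, rmsep, rpd


# Synthetic demonstration: 6 made-up spectra with 50 wavelength points each.
rng = np.random.default_rng(0)
raw = rng.normal(loc=1.0, scale=0.1, size=(6, 50))
processed = first_derivative(moving_window_smooth(raw, window=5))  # MWS followed by FD
print(processed.shape)

# Made-up reference and predicted rubber contents (%), for illustration only.
r2, rmsep, rpd = prediction_metrics([10, 12, 14, 16], [11, 11, 15, 15])
print(round(r2, 3), round(rmsep, 3), round(rpd, 3))
```

Under this convention, a higher RPD means the prediction error is small relative to the spread of the reference values, which is why RPD and $R_P^2$ rise together while RMSEP falls.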
About the journal:
Plant Methods is an open access, peer-reviewed, online journal for the plant research community that encompasses all aspects of technological innovation in the plant sciences.
There is no doubt that we have entered an exciting new era in plant biology. The completion of the Arabidopsis genome sequence and the rapid progress being made in other plant genomics projects are providing unparalleled opportunities for progress in all areas of plant science. Nevertheless, enormous challenges lie ahead if we are to understand the function of every gene in the genome, and how the individual parts work together to make the whole organism. Achieving these goals will require an unprecedented collaborative effort, combining high-throughput, system-wide technologies with more focused approaches that integrate traditional disciplines such as cell biology, biochemistry and molecular genetics.
Technological innovation is probably the most important catalyst for progress in any scientific discipline. Plant Methods’ goal is to stimulate the development and adoption of new and improved techniques and research tools and, where appropriate, to promote consistency of methodologies for better integration of data from different laboratories.