Vibrational spectroscopic techniques and variable selection in Linear Discriminant Analysis to geographical origin discrimination of Jatropha mollissima sap
Caroline Lins Fernandes, Tiago Santos Silva, Caike Lobo Rodrigues de Lima, Isabel Cristina Vicente dos Santos, Djair Araújo Fialho, Marcus Vinicius Lia Fook, Paulo Henrique Gonçalves Dias Diniz, José Filipe Bacalhau Rodrigues, Simone da Silva Simões
Phytochemistry Letters, volume 64, pages 37-46. Published 2024-10-10. DOI: 10.1016/j.phytol.2024.09.007
https://www.sciencedirect.com/science/article/pii/S1874390024001356
Abstract
This study aimed to discriminate the geographical origin of Jatropha mollissima saps using mid- and near-infrared spectroscopies (MIR and NIR, respectively) and Linear Discriminant Analysis (LDA). For this purpose, a total of 108 sap samples were collected in 3 different geographical regions over 12 months, and their contents of total polyphenols, flavonoids, and tannins were quantified. Overall, samples from region C had consistently lower levels of these secondary metabolites throughout most of the months studied. In contrast, samples from regions A and B displayed relatively stable metabolite concentrations over the collection period. Since raw sap samples are subject to deterioration during storage, lyophilization was employed to remove moisture and consequently increase their shelf-life. Thus, MIR spectroscopy was applied to both raw and lyophilized sap samples, while NIR spectroscopy was applied only to lyophilized samples. Then, Principal Component Analysis of the secondary metabolite and spectral data indicated a trend of separation between region C (located in Sertão) and regions A and B (located in Agreste). Next, the Successive Projection Algorithm (SPA), Genetic Algorithm (GA), and Ant Colony Optimization (ACO) were used for variable selection in LDA. As a result, all constructed models achieved sensitivity, specificity, and accuracy of 100 %, or very close to it, in both the training and test sets. However, the SPA-LDA models were more parsimonious, selecting fewer variables and yielding reproducible results, since SPA is a deterministic technique. Hence, the proposed analytical methodologies align with the principles of Green Chemistry by requiring only simple sample preparation, avoiding the use of reagents and solvents, and reducing waste generation. Moreover, special attention can be given to NIR spectroscopy, as it offers a cost-effective analytical tool that can be explored in situ using a portable miniaturized device in the future.
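The variable-selection and evaluation steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the matrix `X`, the labels, and the function names `spa_chain` and `one_vs_rest_metrics` are hypothetical, the data are synthetic, and columns are not mean-centred. Only the projection phase of SPA is shown (building one chain of minimally collinear variables); a full SPA-LDA would also compare candidate chains by a validation cost, which is omitted here. The second function computes one-vs-rest sensitivity, specificity, and accuracy for one region.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def spa_chain(X, start, n_select):
    """Projection phase of SPA: build one chain of minimally collinear columns.

    X is a list of rows (samples); returns the selected column indices.
    Assumes the reference column is never all zeros (no zero-norm check).
    """
    n_vars = len(X[0])
    cols = [[row[j] for row in X] for j in range(n_vars)]
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_norm2 = dot(ref, ref)
        # Project every column onto the subspace orthogonal to the last pick.
        projected = []
        for col in cols:
            coef = dot(col, ref) / ref_norm2
            projected.append([c - coef * r for c, r in zip(col, ref)])
        cols = projected
        # Keep the unselected column with the largest residual norm.
        candidates = [j for j in range(n_vars) if j not in selected]
        selected.append(max(candidates, key=lambda j: dot(cols[j], cols[j])))
    return selected


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity and accuracy for one class against the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Synthetic "spectra": 4 samples x 4 variables; column 1 is exactly 2 x column 0.
X = [
    [1.0, 2.0, 0.0, 0.1],
    [2.0, 4.0, 1.0, 0.0],
    [3.0, 6.0, 0.0, 0.1],
    [4.0, 8.0, 1.0, 0.0],
]
print(spa_chain(X, start=0, n_select=3))  # [0, 2, 3]

# Hypothetical region labels and predictions, for illustration only.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "C"]
print(one_vs_rest_metrics(y_true, y_pred, positive="A"))
```

In this toy case the collinear column 1 is never chosen, which shows how the projection step favours non-redundant variables. No random numbers are involved, so repeated runs return the same chain, whereas GA and ACO are stochastic searches whose selections can vary between runs.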
About the journal:
Phytochemistry Letters invites rapid communications on all aspects of natural product research including:
• Structural elucidation of natural products
• Analytical evaluation of herbal medicines
• Clinical efficacy, safety and pharmacovigilance of herbal medicines
• Natural product biosynthesis
• Natural product synthesis and chemical modification
• Natural product metabolism
• Chemical ecology
• Biotechnology
• Bioassay-guided isolation
• Pharmacognosy
• Pharmacology of natural products
• Metabolomics
• Ethnobotany and traditional usage
• Genetics of natural products
Manuscripts that detail the isolation of just one new compound are not substantial enough to be sent out for review and are out of scope. Furthermore, where pharmacology has been performed on one new compound to increase the amount of novel data, the pharmacology must be substantial and/or related to the medicinal use of the producing organism.