Quantification and mapping of medicinally important Quercitrin compound using hyperspectral imaging and machine learning

Ayushi Gupta, Prashant K. Srivastava, Karuna Shanker, K. Chandra Sekar
International Journal of Applied Earth Observation and Geoinformation, vol. 134, Article 104202. Published 2024-10-16. DOI: 10.1016/j.jag.2024.104202
https://www.sciencedirect.com/science/article/pii/S1569843224005582
Citations: 0

Abstract

Precise spatial mapping of individual species using hyperspectral data is crucial for effective forest management and policy-making. This study focuses on Rhododendron arboreum, which is known for medicinal properties attributed to the flavonoid Quercitrin. Sample data and spectroradiometer measurements were collected across the complex terrain of the Kumaon region in the Himalayas. Because hyperspectral data contain noise in addition to signal variations driven by biophysical and biochemical properties, the spectra were first smoothed to improve signal clarity, using the Savitzky-Golay filter, which fits a low-order polynomial by least squares within a moving window, and the Average Mean filter, which replaces each value with the mean of neighboring spectral values. Spectral Analysis (SA) techniques, namely the first derivative, second derivative, and continuum removal, were then applied; these transformations accentuated absorption troughs and were used to assess the effect of Quercitrin on specific wavelengths. Principal Component Analysis (PCA) was used to identify the bands most relevant to Quercitrin. In addition, regression analysis was applied to the resampled spectral data, and significant wavelengths were selected on the basis of variable importance values, the most prominent being 1196, 1229, 1328, 1383, 1425, 1636, 1661, 1699, 1715, and 1785 nm. More than 50 two-band combination indices were tested, and those with p-values below 0.05 were considered significant. To develop the prediction model, four Machine Learning (ML) algorithms were applied: Support Vector Machine (SVM), Relevance Vector Machine (RVM), Random Forest (RF), and Artificial Neural Network (ANN). The Random Forest model, an ensemble of decision trees fitted to bootstrap samples of the observations whose predictions are averaged, was the most effective at predicting Quercitrin levels, achieving a training correlation of 0.864 and a testing correlation of 0.570. RF therefore proved to be the best of the tested techniques for band selection and a robust model for Quercitrin prediction. This methodological approach highlights the importance of advanced data processing and analysis techniques in remote sensing applications for forest phytochemical prediction.
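The two smoothing filters named in the abstract can be illustrated with the minimal stdlib-only Python sketch below. The window length (5 bands), the quadratic polynomial order, and the edge handling are assumptions for illustration; the abstract does not report the settings actually used. The weights (-3, 12, 17, 12, -3)/35 are the standard Savitzky-Golay coefficients for a 5-point quadratic least-squares fit. The function names are hypothetical.

```python
from typing import List, Sequence


def moving_average(spectrum: Sequence[float], window: int = 5) -> List[float]:
    """Average Mean filter: each band becomes the mean of a centred window.

    Windows are truncated at the spectrum edges (an assumed edge policy).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half, n = window // 2, len(spectrum)
    smoothed = []
    for i in range(n):
        segment = spectrum[max(0, i - half):min(n, i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Standard Savitzky-Golay weights for a 5-point window with a quadratic fit.
SG5_QUADRATIC = [c / 35 for c in (-3, 12, 17, 12, -3)]


def savitzky_golay_5(spectrum: Sequence[float]) -> List[float]:
    """5-point quadratic Savitzky-Golay smoothing (a local least-squares fit).

    The two bands at each end are returned unchanged (an assumed edge policy).
    """
    smoothed = list(spectrum)
    for i in range(2, len(spectrum) - 2):
        # Convolve the window centred on band i with the fixed weights.
        smoothed[i] = sum(w * spectrum[i + k - 2] for k, w in enumerate(SG5_QUADRATIC))
    return smoothed
```

Because the weights sum to 1 and cancel linear and quadratic terms, `savitzky_golay_5` leaves any locally quadratic trend untouched while damping band-to-band jitter, whereas the mean filter flattens peaks more aggressively. For other window lengths or polynomial orders, `scipy.signal.savgol_filter(x, window_length, polyorder)` computes the weights in general.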
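The three spectral-analysis transforms listed in the abstract (first derivative, second derivative, continuum removal) can be sketched as follows. This is a minimal stdlib-only Python illustration, not the authors' code: the derivatives use finite differences (a central difference for the first derivative, the three-point formula for the second), and continuum removal divides reflectance by the upper convex hull of the spectrum, a standard way of defining the continuum. Function names are hypothetical, and wavelengths are assumed to be strictly increasing with at least two bands.

```python
from typing import List, Sequence, Tuple


def first_derivative(wl: Sequence[float], refl: Sequence[float]) -> List[float]:
    """Central-difference dR/dλ at the interior bands (length n - 2)."""
    return [(refl[i + 1] - refl[i - 1]) / (wl[i + 1] - wl[i - 1])
            for i in range(1, len(refl) - 1)]


def second_derivative(wl: Sequence[float], refl: Sequence[float]) -> List[float]:
    """Three-point d2R/dλ2 at interior bands; exact for quadratics, even on uneven grids."""
    out = []
    for i in range(1, len(refl) - 1):
        right = (refl[i + 1] - refl[i]) / (wl[i + 1] - wl[i])
        left = (refl[i] - refl[i - 1]) / (wl[i] - wl[i - 1])
        out.append(2 * (right - left) / (wl[i + 1] - wl[i - 1]))
    return out


def _upper_hull(points: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Upper convex hull of points sorted by x (monotone-chain scan)."""
    hull: List[Tuple[float, float]] = []
    for p in points:
        # Drop the last hull point while it lies on or below the segment hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull


def continuum_removed(wl: Sequence[float], refl: Sequence[float]) -> List[float]:
    """Reflectance divided by the hull continuum: 1.0 on the hull, < 1 in absorption troughs."""
    hull = _upper_hull(list(zip(wl, refl)))
    out, j = [], 0
    for x, r in zip(wl, refl):
        # Advance to the hull segment whose x-range contains this wavelength.
        while j < len(hull) - 2 and x > hull[j + 1][0]:
            j += 1
        (x1, y1), (x2, y2) = hull[j], hull[j + 1]
        continuum = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
        out.append(r / continuum)
    return out
```

In the continuum-removed output, the depth of a trough below 1.0 gives a normalized absorption strength that can be compared across samples regardless of overall brightness.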
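The abstract says PCA was used to identify the bands most relevant to Quercitrin but does not state the selection rule. One common reading, assumed here, is to rank bands by the absolute loading on the first principal component. The stdlib-only sketch below builds the band covariance matrix from a samples-by-bands reflectance table and extracts the leading eigenvector by power iteration; the function names are hypothetical, and a linear-algebra routine such as `numpy.linalg.eigh` would normally be used instead.

```python
import math
import random
from typing import List, Sequence, Tuple


def covariance_matrix(X: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (n - 1 denominator) of the columns (bands) of X."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_component(C: List[List[float]], iters: int = 500,
                      seed: int = 0) -> Tuple[float, List[float]]:
    """Power iteration: largest eigenvalue and unit eigenvector of a covariance matrix."""
    p = len(C)
    rng = random.Random(seed)
    v = [rng.uniform(0.1, 1.0) for _ in range(p)]  # random positive start vector
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # all-zero covariance: no variance to explain
            return 0.0, v
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * sum(C[i][j] * v[j] for j in range(p)) for i in range(p))
    return eigenvalue, v


def rank_bands_by_loading(X: Sequence[Sequence[float]],
                          band_labels: Sequence[str]) -> List[Tuple[str, float]]:
    """Bands sorted by |loading| on PC1, largest first (an assumed relevance criterion)."""
    _, v = leading_component(covariance_matrix(X))
    return sorted(zip(band_labels, v), key=lambda t: abs(t[1]), reverse=True)
```

Only the first component is used here for brevity; in practice several components, and standardized reflectance, are often examined before bands are shortlisted.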
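The abstract reports that more than 50 two-band indices were screened at p < 0.05 but does not give the index form or the test. Purely as an illustration, the sketch below assumes a normalized difference index, (R_i - R_j)/(R_i + R_j), computed for every band pair, correlates it with measured Quercitrin content using Pearson's r, and estimates a two-sided p-value by permutation (chosen here only because it needs no statistics library). All function names are hypothetical.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x: Sequence[float], y: Sequence[float],
                        n_perm: int = 2000, seed: int = 0) -> float:
    """Two-sided permutation test of H0: no association between x and y."""
    rng = random.Random(seed)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def screen_two_band_indices(spectra: Dict[int, List[float]], content: List[float],
                            alpha: float = 0.05) -> List[Tuple[int, int, float, float]]:
    """spectra maps wavelength (nm) -> reflectance per sample, in the same sample
    order as content. Returns (wl_i, wl_j, r, p) for pairs with p < alpha."""
    significant = []
    for wi, wj in itertools.combinations(sorted(spectra), 2):
        # Normalized difference index (an assumed index form).
        ndi = [(a - b) / (a + b) for a, b in zip(spectra[wi], spectra[wj])]
        r = pearson(ndi, content)
        p = permutation_p_value(ndi, content)
        if p < alpha:
            significant.append((wi, wj, r, p))
    return sorted(significant, key=lambda t: t[3])
```

Other two-band forms (simple ratio R_i/R_j, difference R_i - R_j) can be screened by swapping the `ndi` line; the rest of the loop is unchanged.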
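The abstract reports Random Forest training and testing correlations without giving hyperparameters or the train/test split. The sketch below is a minimal stdlib-only random-forest regressor (bootstrap samples, a random feature subset at each split, averaged tree predictions) with a Pearson-correlation helper for reporting train and test r. The class name and the defaults (50 trees, depth 5, leaf size 2, one third of the features per split, a common heuristic for regression forests) are illustrative assumptions; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would be used.

```python
import math
import random
from statistics import mean
from typing import List, Optional, Sequence

# A node is either ("leaf", value) or ("split", feature, threshold, left, right).


def _sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean (the regression split criterion)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def _build_tree(X, y, depth, max_depth, min_leaf, n_feats, rng):
    # Stop when deep enough, too small to split into two valid leaves, or pure.
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return ("leaf", mean(y))
    best = None  # (score, feature, threshold)
    # Random feature subset at each split: the "random" part of a random forest.
    for f in rng.sample(range(len(X[0])), n_feats):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:
        return ("leaf", mean(y))
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    left_node = _build_tree([X[i] for i in li], [y[i] for i in li],
                            depth + 1, max_depth, min_leaf, n_feats, rng)
    right_node = _build_tree([X[i] for i in ri], [y[i] for i in ri],
                             depth + 1, max_depth, min_leaf, n_feats, rng)
    return ("split", f, t, left_node, right_node)


def _predict_tree(node, row):
    while node[0] == "split":
        _, f, t, left, right = node
        node = left if row[f] <= t else right
    return node[1]


class RandomForestRegressorSketch:
    """Hypothetical minimal random forest: bagged regression trees, averaged."""

    def __init__(self, n_trees: int = 50, max_depth: int = 5, min_leaf: int = 2,
                 max_features: Optional[int] = None, seed: int = 0):
        self.n_trees, self.max_depth, self.min_leaf = n_trees, max_depth, min_leaf
        self.max_features, self.seed = max_features, seed
        self.trees: list = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n, p = len(X), len(X[0])
        n_feats = self.max_features or max(1, p // 3)
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
            self.trees.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                          0, self.max_depth, self.min_leaf, n_feats, rng))
        return self

    def predict(self, X) -> List[float]:
        return [mean(_predict_tree(t, row) for t in self.trees) for row in X]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Correlation between observed and predicted values (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0
```

Reporting `pearson(y_train, model.predict(X_train))` alongside the same quantity on held-out samples mirrors the training and testing correlations quoted in the abstract; a large gap between the two, as with 0.864 and 0.570, is typical of tree ensembles evaluated in-sample.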
Source journal
International Journal of Applied Earth Observation and Geoinformation
Subject areas: Global and Planetary Change; Management, Monitoring, Policy and Law; Earth-Surface Processes; Computers in Earth Sciences
CiteScore: 12.00
Self-citation rate: 0.00%
Articles published: 0
Review time: 77 days
About the journal: The International Journal of Applied Earth Observation and Geoinformation publishes original papers that utilize earth observation data for natural resource and environmental inventory and management. These data primarily originate from remote sensing platforms, including satellites and aircraft, supplemented by surface and subsurface measurements. Addressing natural resources such as forests, agricultural land, soils, and water, as well as environmental concerns like biodiversity, land degradation, and hazards, the journal explores conceptual and data-driven approaches. It covers geoinformation themes like capturing, databasing, visualization, interpretation, data quality, and spatial uncertainty.