Pub Date: 2026-03-01; Epub Date: 2026-03-04; DOI: 10.1016/j.ecoinf.2026.103690
Arinder K. Arora, Nolan Anderson, Kiran R. Gadhave
Forecasting pest population dynamics under variable microclimates is essential for understanding and managing ecological interactions in agroecosystems. In this study, we employed machine learning to model the population dynamics of a cosmopolitan supervector across two contrasting production environments—open fields and high tunnels. Using data from 1686 weekly trap observations (standardized to 2254 modeling units) and 16 environmental predictors, we developed and compared Random Forest, Gradient Boosting Machine (GBM), and XGBoost models to identify key abiotic and biotic drivers of population fluctuations. Random Forest achieved the highest predictive accuracy in open fields (87.7%), while XGBoost performed best under high-tunnel conditions (84.9%). Parent (seed) population and temperature consistently emerged as dominant predictors, with humidity and wind showing secondary effects. Models trained in one microclimate failed to predict populations in the other (≤44% accuracy), revealing distinct ecological processes governing pest dynamics in adjacent systems. These results demonstrate that machine learning can disentangle nonlinear interactions among environmental variables and improve predictive understanding of vector population ecology. Our framework illustrates how ecological informatics can integrate environmental sensing, population monitoring, and data-driven modeling to forecast biologically meaningful patterns across heterogeneous agroecosystems.
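The cross-microclimate test described in this abstract (train in one environment, predict in the other) can be illustrated with a minimal sketch. This is not the authors' pipeline: a nearest-centroid classifier stands in for Random Forest, GBM, and XGBoost so that the example needs only the standard library, and the data, feature names, and class labels are all hypothetical. Within-environment scores are computed on the training rows for brevity; a real analysis would use held-out data.

```python
from statistics import mean

def fit_centroids(X, y):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    """Return the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )

def accuracy(centroids, X, y):
    """Share of rows whose predicted class matches the label."""
    hits = sum(predict(centroids, x) == lab for x, lab in zip(X, y))
    return hits / len(y)

# Hypothetical toy data: features are (temperature, humidity).
# In the made-up open field, trap counts follow temperature;
# in the made-up high tunnel, they follow humidity.
grid = [(t, h) for t in (10, 12, 28, 30) for h in (40, 80)]
data = {
    "open_field": (grid, ["high" if t > 20 else "low" for t, h in grid]),
    "high_tunnel": (grid, ["high" if h > 60 else "low" for t, h in grid]),
}

# Train on each environment and score on both.
transfer = {
    (train, test): accuracy(fit_centroids(*data[train]), *data[test])
    for train in data
    for test in data
}

for (train, test), score in sorted(transfer.items()):
    print(f"train={train:11s} test={test:11s} accuracy={score:.2f}")
```

On this toy data each model scores 1.0 in its own environment and 0.5 in the other, because the two made-up environments respond to different drivers.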
Title: Machine learning reveals microclimate-specific drivers of a cosmopolitan supervector's population dynamics. Journal: Ecological Informatics, volume 94, Article 103690.
Pub Date: 2026-03-01; Epub Date: 2026-02-28; DOI: 10.1016/j.ecoinf.2026.103663
Toufique A. Soomro, Allister Clarke, Jonathan Medway, Bin Liang, Stephen Summerhayes, Juan Pablo Guerschman, Robert de Ligt, Hugh Armitage, Clinton Ayers
Vegetation is a central indicator of rangeland condition, yet monitoring it across vast and heterogeneous dryland landscapes remains a major challenge. Although the advantages of remote sensing and unmanned aerial vehicles (UAVs) have been recognized for many years, their evolving role in rangeland vegetation assessment warrants a fresh examination, particularly in light of recent advances in sensor design, data processing, and multiscale integration. This systematic review, conducted using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) approach, synthesizes contemporary developments in UAV-based monitoring of rangeland vegetation. Key progress includes improved flight configurations, enhanced photogrammetric reconstruction for structural mapping, and the increased use of machine learning for estimating vegetation cover and above ground biomass. Emerging tools such as real-time kinematic positioning, automated image processing, and cloud-based computing are accelerating the transition toward transparent, repeatable, and scalable workflows. The integration of UAV products with satellite observations further strengthens regional vegetation assessment and supports broader ecosystem monitoring frameworks. Together, these advances highlight the growing capacity of UAV-based methods to deliver consistent, high-resolution vegetation information for sustainable rangeland management in Australia and globally.
Title: UAV-based remote sensing for rangeland monitoring, a generalized and transparent workflow with an Australian lead. Journal: Ecological Informatics, volume 94, Article 103663.
Pub Date: 2026-03-01; Epub Date: 2026-02-24; DOI: 10.1016/j.ecoinf.2026.103665
Yuhan Wu, Matteo Convertino
Coastal habitats, such as oyster reefs, are critical land–sea ecotones that support biodiversity, ecosystem services, and ecosystem resilience. However, restoration of oyster reefs in river deltas and other coastal ecosystems remains challenged by the lack of scalable tools capable of quantifying how environmental heterogeneity shapes connectivity and reef structure across fragmented ecotones.
Here, we introduce a generalizable Topologic Systemic Ecograph (TIE) model that integrates three-dimensional hydrodynamic and biogeochemical fields with habitat occurrence data to infer ecological flows, flow-defined connectivity, and basins derived from predicted habitat suitability. By adapting hydro-inspired flow-routing algorithms and network-theoretic analysis, we construct the Oyster Flow Graph (OFG) and delineate Oyster Connectivity Basins (OCBs), the ecograph and ecosheds respectively, providing spatially explicit ecological patterns, including eco-environmental feedbacks that support biogenic structures across ecosystem scales.
Application of the TIE framework to biogenic structures such as oyster reefs in the Pearl River Delta, Greater Bay Area, reveals pronounced regional differentiation, with central delta basins functioning as connectivity hubs and peripheral basins acting as flow bottlenecks. Stable, high-suitability zones emerge along sheltered deltaic and estuarine habitats, indicating conditions favorable for reef establishment and persistence. The inferred ecological flow topology is physically consistent with regional hydrodynamic patterns for 66.34% of flow directions, and Random Forest modeling highlights key hydro-biogeochemical drivers shaping network connectivity. At the delta scale, nitrate concentration, latitudinal (North–South) and vertical velocities at intermediate and deep depths, as well as chlorophyll-a, emerge as the predominant factors. Under high-flow conditions, vertical and longitudinal/cross-delta (East–West) velocities become the most important features. These flow interactions reflect the predominance of deltaic hydrology in governing nutrient transport and residence within reef habitats, thereby influencing reef morphology, ecological fitness, and cascading ecosystem services. Temperature and salinity emerge as second-order factors, given their relatively weak interactions with other environmental variables in defining ecological flows.
Overall, the proposed TIE model and framework advance precision restoration design by explicitly linking inferred eco-environmental flows derived from habitat suitability to ecological connectivity expressed as topology, and are transferable to other coastal and marine habitats where eco-environmental pressures can be structured to trigger ecological self-emergence.
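The idea of routing flows over a suitability surface and grouping cells into basins can be sketched by analogy with D8-style terrain routing. The abstract does not give the actual routing rule, so everything here is an assumption: each cell points to the 8-neighbour with the highest suitability, cells with no higher neighbour act as sinks, and the function names `route` and `basins` and the toy grid are hypothetical.

```python
def route(grid):
    """Map each cell to its highest-valued 8-neighbour, or to itself if none is higher."""
    rows, cols = len(grid), len(grid[0])
    target = {}
    for r in range(rows):
        for c in range(cols):
            best = (r, c)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nr, nc = r + dr, c + dc
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and grid[nr][nc] > grid[best[0]][best[1]]:
                        best = (nr, nc)
            target[(r, c)] = best
    return target

def basins(target):
    """Group cells by the sink reached when following the routing pointers."""
    groups = {}
    for cell in target:
        sink = cell
        # Each hop moves to a strictly higher value, so the walk always ends.
        while target[sink] != sink:
            sink = target[sink]
        groups.setdefault(sink, set()).add(cell)
    return groups

# Hypothetical habitat-suitability grid with two local maxima.
suitability = [
    [0.9, 0.5, 0.1],
    [0.4, 0.3, 0.2],
    [0.1, 0.2, 0.8],
]

flow_graph = route(suitability)           # directed edges: cell -> receiving cell
connectivity_basins = basins(flow_graph)  # sink -> set of contributing cells

for sink, cells in sorted(connectivity_basins.items()):
    print(sink, sorted(cells))
```

In this toy grid the two local maxima at (0, 0) and (2, 2) collect six and three cells respectively. The pointer dictionary is the directed graph on which network measures such as in-degree could be computed.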
Title: Suitability networks for restoration: Hydrobiogeochemical flows imprinting habitat-forming species. Journal: Ecological Informatics, volume 94, Article 103665.
Time-series forecasting faces a major challenge when input data is missing. Additionally, standard multi-site water quality models often fail to capture the spatial connections among monitoring stations. This study proposes a GAT-enhanced LSTM model (GAT-LSTM) that integrates Graph Attention Networks (GAT) with Long Short-Term Memory (LSTM) to enhance prediction robustness under data incompleteness. We established a systematic evaluation framework, using MAE, MAPE, and R² as the metrics for assessing predictive performance. In addition, we defined a Comprehensive Robustness Index (CRI) to evaluate model performance under three scenarios: spatial (missing stations), temporal (missing time steps), and random (missing indicators). Using real-world data from 13 monitoring stations on the Pearl River, the third largest river in China, we compared GAT-LSTM against a standalone LSTM. Results show that the two models achieved comparable accuracy when data were complete; however, across all missing-data scenarios, GAT-LSTM consistently demonstrated superior robustness, exhibiting 1.3–1.8 times greater tolerance to data loss than the conventional LSTM. The GAT component became critical when spatial data were missing. The performance gap was most pronounced when key monitoring stations were removed first: GAT-LSTM maintained high stability (CRI: 0.98), whereas the standalone LSTM experienced a sharp decline (CRI: 0.5). These findings confirm that incorporating the GAT architecture provides powerful compensatory capability for incomplete spatial data, rendering GAT-LSTM significantly more resilient in real-world water quality prediction tasks. When monitoring networks suffer from inconsistent spatial coverage, GAT transitions from an optional enhancement to an essential core component.
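The three missing-data scenarios named in this abstract (missing stations, missing time steps, randomly missing indicators) can be made concrete with a small masking sketch. The array layout `series[time][station][indicator]`, the function names, and the toy values are assumptions for illustration. The abstract does not define the CRI, so it is not reproduced here.

```python
import random

def apply_mask(series, is_missing):
    """Return a copy of series[t][s][k] with None wherever is_missing(t, s, k) is true."""
    return [
        [
            [None if is_missing(t, s, k) else value for k, value in enumerate(station)]
            for s, station in enumerate(step)
        ]
        for t, step in enumerate(series)
    ]

def spatial_missing(series, stations):
    """Spatial scenario: blank every reading from the given station indices."""
    return apply_mask(series, lambda t, s, k: s in stations)

def temporal_missing(series, steps):
    """Temporal scenario: blank every reading at the given time-step indices."""
    return apply_mask(series, lambda t, s, k: t in steps)

def random_missing(series, fraction, seed=0):
    """Random scenario: blank a fixed share of individual indicator readings."""
    cells = [
        (t, s, k)
        for t, step in enumerate(series)
        for s, station in enumerate(step)
        for k in range(len(station))
    ]
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    dropped = set(rng.sample(cells, round(fraction * len(cells))))
    return apply_mask(series, lambda t, s, k: (t, s, k) in dropped)

def missing_share(series):
    """Fraction of readings that are None."""
    values = [v for step in series for station in step for v in station]
    return sum(v is None for v in values) / len(values)

# Hypothetical toy record: 4 time steps x 3 stations x 2 indicators.
toy = [[[float(t + s + k) for k in range(2)] for s in range(3)] for t in range(4)]

share_spatial = missing_share(spatial_missing(toy, {1}))
share_temporal = missing_share(temporal_missing(toy, {0}))
share_random = missing_share(random_missing(toy, 0.25))
print(share_spatial, share_temporal, share_random)
```

Each masked copy would then be fed to the models under comparison, with accuracy tracked as the missing share grows.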
Title: Unveiling the river spatial topology in robust water quality prediction: A LSTM-based evaluation framework. Authors: Shuai Wang, Ying Xing, Jiahui Zhu, Yuxian Li, Feifei Dong. Journal: Ecological Informatics, volume 94, Article 103659. Pub Date: 2026-03-01; DOI: 10.1016/j.ecoinf.2026.103659
Pub Date: 2026-03-01; Epub Date: 2026-02-27; DOI: 10.1016/j.ecoinf.2026.103679
Tuyan Luo, Xiaohu Tang, Xin Lv, Baolong Bao, Xiaohui Chen, Xiu Fang, Zhixiang Liu, Jingxiang Xu
Automated fish identification plays a pivotal role in the development of intelligent aquaculture systems by enabling more effective stock assessment and behavioral monitoring. Although contemporary convolutional neural network (CNN)-based approaches have demonstrated strong recognition performance, they frequently exhibit computational inefficiency and limited robustness under the challenging conditions characteristic of underwater environments. In this study, we introduce a novel network exploration framework, grounded in the RegNet design paradigm, for deriving task-specific architectures tailored to underwater fish recognition. Using a relatively small dataset and approximately 200K training iterations, we obtain a family of high-performing models, collectively referred to as SeekNet, spanning multiple complexity regimes. Relative to state-of-the-art baselines, SeekNet consistently achieves superior performance. On our primary dataset, SeekNet attains a rank-1 accuracy of 95.97% and a True Acceptance Rate (TAR) of 88.04% at a False Acceptance Rate (FAR) of 10⁻⁶. On a separate closed-set dataset, it reaches a rank-1 accuracy of 98.78% and a TAR of 98.71% at the same FAR threshold. These results substantiate the effectiveness of the proposed methodology and underscore its practical suitability for deployment in real-world aquaculture environments.
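A TAR-at-fixed-FAR figure such as the one reported here is typically read off two lists of similarity scores: genuine pairs (same individual or class) and impostor pairs (different ones). The sketch below is one common way to compute it and is not taken from the paper. The function name, the strict "greater than" acceptance rule, and the scores are all hypothetical.

```python
import math

def tar_at_far(genuine, impostor, far):
    """True acceptance rate at the strictest threshold that keeps false accepts within far."""
    ranked = sorted(impostor, reverse=True)
    allowed = math.floor(far * len(ranked))  # impostor pairs we may wrongly accept
    if allowed >= len(ranked):
        return 1.0  # every impostor may pass, so every genuine pair passes too
    threshold = ranked[allowed]  # accept only scores strictly above this value
    accepted = sum(score > threshold for score in genuine)
    return accepted / len(genuine)

# Hypothetical similarity scores in [0, 1].
genuine_scores = [0.55, 0.70, 0.80, 0.90, 0.95]
impostor_scores = [0.10, 0.20, 0.30, 0.62]

tar_strict = tar_at_far(genuine_scores, impostor_scores, far=0.0)
tar_loose = tar_at_far(genuine_scores, impostor_scores, far=0.25)
print(tar_strict, tar_loose)
```

With these toy scores the strict setting rejects the genuine pair at 0.55 (TAR 0.8), while allowing one false accept in four recovers it (TAR 1.0). Resolving a FAR as small as 10⁻⁶ requires at least a million impostor comparisons, since fewer would leave `allowed` at zero.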
Title: A design space-based network for efficient and accurate fish recognition in aquaculture. Journal: Ecological Informatics, volume 94, Article 103679.
Unmanned underwater vehicles (UUVs) are increasingly used for non-invasive data collection, with applications ranging from supporting fisheries stock assessment to biodiversity mapping. Estimating the area surveyed is crucial to calculate densities and abundances from observations. Area estimation methods range from utilising fixed transect dimensions, to advanced approaches that account for path deviations and intra-transect variability in field of view, often integrating multiple sensors or detailed bathymetric data. However, accuracy of position data from remote vehicles is limited by environmental and operational variability. Compounding these differences, researchers rarely fully document methodologies, preventing comparability between datasets.
This paper develops a transferable methodology to process UUV position data, assessing the effect of 2 width and 12 length estimation methods on the estimated area surveyed. This analysis uniquely includes the calculation of associated error and resulting confidence intervals for each approach. Results show species density estimates can vary by up to 13% depending on the processing applied. The two equations used to calculate transect width cause significant differences between density estimates. Significant differences in transect length also occur depending on the degree and method of smoothing applied to position data. Importantly, these differences between methodologies are encompassed by the variance calculated from the position data.
To obtain representative transect areas, the recommendations are to validate width equations against laser measurements, use incremental position data, and include depth when calculating total distance travelled. Because density estimates vary across methods, it is essential to include confidence intervals and full details of pre-filtering and smoothing procedures.
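The recommendation to use incremental position data and include depth can be shown with a short worked sketch. It assumes position fixes already projected to metres as (x, y, depth) and a single fixed transect width. The paper's two width equations are not given in the abstract and are not reproduced, and all numbers are hypothetical.

```python
import math

def path_length(fixes, use_depth=True):
    """Sum straight-line distances between consecutive (x, y, z) fixes, in metres."""
    total = 0.0
    for (x0, y0, z0), (x1, y1, z1) in zip(fixes, fixes[1:]):
        dz = (z1 - z0) if use_depth else 0.0
        total += math.sqrt((x1 - x0) ** 2 + (y1 - y0) ** 2 + dz ** 2)
    return total

def density(count, length_m, width_m):
    """Individuals per square metre over a strip of the given length and width."""
    return count / (length_m * width_m)

# Hypothetical track: the vehicle turns a corner while descending a slope.
fixes = [(0.0, 0.0, -10.0), (3.0, 0.0, -14.0), (3.0, 12.0, -19.0)]
width_m = 2.0  # assumed constant field-of-view width
count = 9      # assumed number of animals observed

length_3d = path_length(fixes)                   # incremental, with depth
length_2d = path_length(fixes, use_depth=False)  # incremental, horizontal only
length_endpoints = math.dist(fixes[0][:2], fixes[-1][:2])  # start-to-end shortcut

density_3d = density(count, length_3d, width_m)
density_2d = density(count, length_2d, width_m)
print(length_3d, length_2d, round(length_endpoints, 2))
print(density_3d, density_2d)
```

Here the same nine observations give 0.25 individuals per square metre when depth is included and 0.30 when it is ignored, which illustrates how the choice of length method propagates into the density estimate.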
Title: Area estimation methods in underwater video surveys: Biases, errors and their impacts on density estimates. Authors: Georgina Vickery, Fabian Zimmermann, Fletcher Thompson, Carsten Hvingel. Journal: Ecological Informatics, volume 94, Article 103648. Pub Date: 2026-03-01; DOI: 10.1016/j.ecoinf.2026.103648
Pub Date: 2026-03-01; Epub Date: 2026-02-16; DOI: 10.1016/j.ecoinf.2026.103647
Osmar Luiz Ferreira de Carvalho, Glauber das Neves, João Paulo Sena-Souza, Alexandre Tadeu Brunello, Vinicius Vasconcelos, Daniel Guerreiro e Silva, Maria Gabriella da Silva Araújo, Deoclecio Jardim Amorim, Luiz Antonio Martinelli, Gabriela Bielefeld Nardoto, Osmar Abílio de Carvalho Júnior
Soil δ¹³C is an integrative indicator of carbon cycling, vegetation composition, and land use dynamics. Despite increasing availability of high-resolution environmental datasets, predicting soil δ¹³C remains challenging due to collinear and scale-dependent biogeochemical processes, and few studies have systematically compared feature selection strategies or regression algorithms across spatial scales. This study introduces an innovative hierarchical framework for predicting the spatial distribution of soil δ¹³C across Brazil, systematically comparing feature selection strategies and machine learning algorithms across three nested datasets: Cerrado, extended Cerrado, and national scale. Predictors included climatic variables, topography, soil properties, and vegetation indices. Feature selection combined stepwise, recursive, and exhaustive searches, followed by variance inflation factor (VIF) filtering to reduce multicollinearity. Model benchmarking compared linear, kernel-based, and ensemble regressors under nested cross-validation, with performance assessed by coefficient of determination (R²), root mean squared error (RMSE), and mean absolute error (MAE). Results show that model performance declined with increasing spatial extent, with best VIF-constrained R² decreasing from 0.77 (local) to 0.64 (regional) and 0.58 (national). Compact VIF-constrained subsets yielded similar accuracy to unconstrained sets, demonstrating that multicollinearity control improves parsimony without sacrificing predictive power. Ensemble regressors outperformed linear and kernel-based methods across all datasets. Feature importance shifted with spatial extent, with vegetation productivity and seasonal climate jointly structuring δ¹³C patterns rather than any single predictor dominating across scales. This framework advances δ¹³C isoscape modeling by combining predictive accuracy with interpretability, supporting applications in soil carbon monitoring, ecological research, and land-use planning.
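The VIF filtering step mentioned in this abstract can be sketched in a few lines. For predictor j, VIF equals 1 / (1 - R²) from regressing j on the other predictors, which is also the j-th diagonal entry of the inverse correlation matrix. The code uses that identity with a small Gauss-Jordan inverse so it needs only the standard library. The threshold of 10 is a common rule of thumb, not a value from the abstract, and the drop-the-worst-first loop, the predictor names, and the data are assumptions.

```python
from statistics import mean, pstdev

def standardize(column):
    """Centre a column on its mean and scale by its population standard deviation."""
    m, s = mean(column), pstdev(column)
    return [(v - m) / s for v in column]

def correlation_matrix(columns):
    """Pearson correlations between equal-length predictor columns."""
    z = [standardize(c) for c in columns]
    n = len(columns[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

def vif_filter(named_columns, threshold=10.0):
    """Drop the highest-VIF predictor repeatedly until all remaining are at or below threshold."""
    kept = dict(named_columns)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical predictors: two near-duplicates and one weakly related variable.
predictors = {
    "temperature": [1, 2, 3, 4, 5, 6],
    "temperature_copy": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "rainfall": [5, 1, 4, 2, 6, 3],
}

initial_vif = vif(list(predictors.values()))
kept = vif_filter(predictors)
print([round(v, 1) for v in initial_vif], kept)
```

On this toy set the two near-duplicate temperature columns have VIFs far above 10, so one of them is removed and the weakly related rainfall column is retained.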
Title: Predicting soil δ13C patterns in Brazil using nested datasets, feature selection, and machine learning. Journal: Ecological Informatics, volume 94, Article 103647. Publication date: 2026-03-01. DOI: 10.1016/j.ecoinf.2026.103647.
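The VIF filtering step named in the abstract above can be illustrated with a short sketch. This is not the authors' pipeline: the cutoff of 10 is a common convention rather than a value taken from the source, the iterative drop-the-worst rule is one of several possible strategies, and the toy predictors `a`, `b`, and `c` are synthetic.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on all remaining columns plus an intercept.
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out[j] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def vif_filter(X, names, threshold=10.0):
    """Drop the highest-VIF feature repeatedly until all VIFs are <= threshold.

    The threshold of 10 is an assumed convention, not a value from the source.
    """
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        v = vif(X)
        worst = int(np.argmax(v))
        if v[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Synthetic demo: c is almost a copy of a, b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + 0.01 * rng.normal(size=200)
kept = vif_filter(np.column_stack([a, b, c]), ["a", "b", "c"])
print(kept)  # one of the collinear pair (a, c) is removed; b is retained
```

Dropping one feature at a time and recomputing matters because removing a single collinear predictor usually lowers the VIFs of the ones that remain.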
Pub Date: 2026-03-01 Epub Date: 2026-01-08 DOI: 10.1016/j.ecoinf.2026.103605
Anuruddha Paul, Rishi Raj, Mahendra Kumar Gourisaria, Amitkumar V. Jha, Nicu Bizon
Wildlife conservation efforts increasingly depend on automated species classification for processing large-scale camera trap data, yet existing approaches struggle with accuracy and computational efficiency in resource-constrained environments. This paper introduces HARVEST (Hierarchical Attention for Robust Vision Enhancement with Shifted Tokenization), a novel hybrid architecture integrating YOLOv8 object detection with transformer-based classification. The architecture incorporates three key innovations: Shifted Patch Tokenization (SPT) for boundary information preservation, Local Information Enhancer (LIFE) for spatial feature extraction, and Locality-Enhanced Attention (LEA) for adaptive feature integration. The model is evaluated on two datasets: a challenging 45-species Ohio State University (OSU) Small Animals dataset with extreme class imbalance (6320:1 ratio) and a balanced 6-species African wildlife dataset. HARVEST achieves 85.27% accuracy on the OSU dataset and 94.74% accuracy on the African wildlife dataset with only 13.0M parameters, an 85% reduction in parameter count compared to standard Vision Transformers, while maintaining superior performance. The OSU evaluation demonstrates robust performance under highly imbalanced real-world conditions, with per-species sample sizes ranging from 1 to 6320 images, supporting practical applicability for conservation scenarios. Qualitative analysis reveals biologically meaningful attention patterns focusing on taxonomically relevant features. The efficient architecture enables real-world deployment in conservation applications, providing a practical solution for automated wildlife monitoring and biodiversity surveillance.
Title: YOLO-HARVEST: A hybrid ViT architecture with locality-enhanced attention for automated wildlife species classification. Journal: Ecological Informatics, volume 94, Article 103605. Publication date: 2026-03-01. DOI: 10.1016/j.ecoinf.2026.103605.
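The abstract above names Shifted Patch Tokenization but does not describe how it is implemented. The sketch below follows the commonly described form of the technique, which is an assumption here: the image is shifted by half a patch in four diagonal directions with zero fill, the shifted copies are concatenated with the original along the channel axis, and the result is cut into flattened patches. The layer normalization and learned linear projection that would normally follow are left out, and the function names are hypothetical.

```python
import numpy as np


def shift_zero_pad(img, dy, dx):
    """Shift an (H, W, C) image by (dy, dx) pixels, zero-filling vacated pixels."""
    h, w, _ = img.shape
    out = np.zeros_like(img)
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = img[src_y, src_x]
    return out


def shifted_patch_tokens(img, patch):
    """Return flattened patch tokens of shape (num_patches, patch*patch*5*C)."""
    h, w, c = img.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")
    s = patch // 2  # half-patch shift (assumed, as in the usual description)
    shifts = [(-s, -s), (-s, s), (s, -s), (s, s)]  # four diagonal directions
    stacked = np.concatenate(
        [img] + [shift_zero_pad(img, dy, dx) for dy, dx in shifts], axis=-1
    )  # (H, W, 5C): original plus four shifted copies
    gh, gw = h // patch, w // patch
    # Split into a (gh, gw) grid of patches, then flatten each patch.
    grid = stacked.reshape(gh, patch, gw, patch, 5 * c).transpose(0, 2, 1, 3, 4)
    return grid.reshape(gh * gw, patch * patch * 5 * c)


# Toy 8x8 RGB-like image with 4x4 patches gives 4 tokens of length 4*4*15.
img = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
tokens = shifted_patch_tokens(img, patch=4)
print(tokens.shape)
```

Because each token also carries pixels from the diagonally shifted copies, information that straddles patch borders ends up inside a single token, which is the boundary-preservation idea the abstract refers to.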
Pub Date: 2026-03-01 Epub Date: 2026-01-30 DOI: 10.1016/j.ecoinf.2026.103624
Diogo F. Oliveira, Gonçalo M. Marques, Filipe M.P. Santos, Laure Pecquerie, João M.C. Sousa, Tiago Domingos
Dynamic Energy Budget (DEB) theory is a general theory that describes how organisms use the energy in food for maintenance, growth, development, and reproduction. DEB models have been widely applied in fields such as conservation biology, aquaculture, and ecotoxicology because they can simulate how organisms respond to changing environmental conditions. Obtaining a DEB model requires solving a calibration problem: finding the parameters that minimize the deviation between observed data and model predictions. While DEB model calibration is largely automated, the selection of initial parameters remains a key unresolved step, since the only automated method, the bijection method, often fails to produce a feasible initial parameter set. Consequently, modelers resort to trial and error to find parameters to seed the estimation. To bridge this gap, we propose using machine learning to initialize the calibration. We develop two models: a neural network and a 1-nearest-neighbor model. Both are built with a focus on feasibility, directly integrating parameter constraints into their structure. We train and evaluate our methods on the 5000+ DEB models in the Add-my-Pet database. Both methods generate feasible parameter sets in 99% of cases, compared to only 40% for the bijection method. The neural network initialization leads to improved DEB model calibration, achieving a calibration loss that is, on average, three times lower than that of other methods. To support broader adoption, we have open-sourced our code, and our models are available as initialization options within DEBtool, the primary software for parameter calibration.
Title: Reliable machine learning initialization methods for the calibration of Dynamic Energy Budget models. Journal: Ecological Informatics, volume 94, Article 103624. Publication date: 2026-03-01. DOI: 10.1016/j.ecoinf.2026.103624.
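The 1-nearest-neighbor initialization mentioned in the abstract above might look roughly like the following sketch. The feature columns, parameter names, and numbers are all hypothetical placeholders, since the abstract does not say which species descriptors or distance measure the authors use. The sketch also assumes that every reference parameter set comes from an already calibrated model, so a copied set satisfies the parameter constraints by construction.

```python
import numpy as np


def one_nn_initial_parameters(query, ref_features, ref_params):
    """Return (index, parameter set) of the reference entry closest to `query`.

    Features are z-scored so that no single descriptor dominates the
    Euclidean distance (an assumed design choice, not taken from the source).
    """
    ref_features = np.asarray(ref_features, dtype=float)
    mu = ref_features.mean(axis=0)
    sigma = ref_features.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    z_ref = (ref_features - mu) / sigma
    z_query = (np.asarray(query, dtype=float) - mu) / sigma
    idx = int(np.argmin(np.linalg.norm(z_ref - z_query, axis=1)))
    return idx, dict(ref_params[idx])


# Made-up reference library: two descriptors per species and two
# placeholder parameters per calibrated model.
ref_features = [[1.0, 2.0], [3.0, 0.5], [5.0, 4.0]]
ref_params = [
    {"param_a": 0.80, "param_b": 0.02},
    {"param_a": 0.65, "param_b": 0.05},
    {"param_a": 0.90, "param_b": 0.01},
]
idx, init = one_nn_initial_parameters([2.9, 0.6], ref_features, ref_params)
print(idx, init)
```

Copying a whole parameter set, rather than averaging several neighbors, is what keeps the output feasible under this assumption, because an average of feasible sets is not guaranteed to satisfy nonlinear constraints.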
Pub Date: 2026-03-01 Epub Date: 2026-02-12 DOI: 10.1016/j.ecoinf.2026.103656
Matthias Körschens, Solveig Franziska Bucher, Christiane M. Ritz, Sebastian Gebauer, Jens Wesenberg, Christine Römermann
Herbaria contain large numbers of preserved specimens that hold a wealth of information for biodiversity research, since they offer a record of the morphology as well as the temporal and spatial distribution of plant species worldwide. Besides the dried plant itself, herbarium specimens usually carry additional information in printed or handwritten labels, such as the date of collection, the location, and the collector's name. While, for historical reasons, the specimens have been collected and labeled manually, considerable efforts are underway to digitize entire herbaria and thereby make the specimens available for analysis with automated methods. However, extracting information from handwritten labels is a considerable challenge, since the handwriting not only differs from one collector to another but is also often in old scripts (e.g., Sütterlin, an old German script). The labels are therefore often hard to decipher both manually and automatically, and barely any substantial, consistent data of this kind exists to train state-of-the-art vision models. Since the location of the labels differs from record to record, they need to be detected before the automated analysis of the writing, which has also proved challenging in the past. In this work we show that state-of-the-art Large Language and Vision Models (LLVMs) can extract such handwriting zero-shot, i.e., completely without training or fine-tuning, with a high degree of accuracy. Additionally, we show that the results can be refined and improved considerably by performing zero-shot detection of the labels beforehand. We evaluate our approach on two novel datasets, one containing handwritten and one containing printed labels, based on herbarium scans from the virtual herbarium of the flora of Germany. In our evaluations, the approaches achieve a mean similarity of 84.5% for handwritten labels and 93.1% for printed labels. We conclude that some evaluation is still needed before LLVMs can be fully applied to transcribe herbarium specimen labels, as species taxonomies and collection sites are sometimes not correctly identified. Still, these models can support the transcription process in large collections. Our code and a graphical web application are publicly available at https://github.com/Atlas8008/herbarium_label_reader.
Title: Large language vision models for zero-shot handwriting recognition of historical herbarium labels. Journal: Ecological Informatics, volume 94, Article 103656. Publication date: 2026-03-01. DOI: 10.1016/j.ecoinf.2026.103656.
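The abstract above reports a "mean similarity" between transcribed and reference labels without naming the measure. One plausible choice, assumed here purely for illustration, is a normalized Levenshtein similarity averaged over all label pairs. The strings in the demo are toy inputs, not data from the paper.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def similarity(pred, truth):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(pred), len(truth))
    return 1.0 if longest == 0 else 1.0 - levenshtein(pred, truth) / longest


def mean_similarity(pairs):
    """Average similarity over (prediction, reference) string pairs."""
    return sum(similarity(p, t) for p, t in pairs) / len(pairs)


# Toy pairs: one exact match and one with a single wrong character.
pairs = [("abc", "abc"), ("abcd", "abce")]
score = mean_similarity(pairs)
print(score)  # 0.875
```

A character-level measure like this gives partial credit for nearly correct transcriptions, which suits handwriting recognition better than exact-match accuracy would.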