An investigation of microbial groundwater contamination seasonality and extreme weather event interruptions using “big data”, time-series analyses, and unsupervised machine learning
Ioan Petculescu, Paul Hynds, R. Stephen Brown, Kevin McDermott, Anna Majury
Environmental Pollution (journal article), published 2025-02-06. DOI: 10.1016/j.envpol.2025.125790
Citations: 0
Abstract
Temporal studies of groundwater potability have historically focused on E. coli detection rates, with non-E. coli coliforms (NEC) and microbial concentrations remaining comparatively understudied. Additionally, “big data” (i.e., large, diverse datasets that grow over time) have yet to be used to assess the effects of high return-period extreme weather events on groundwater quality. The current investigation used ≈1.1 million Ontarian private well samples collected between 2010 and 2021 to address these knowledge gaps by applying time-series decomposition, interrupted time-series analysis (ITSA), and unsupervised machine learning to five microbial contamination parameters: E. coli and NEC concentrations (CFU/100 mL), E. coli and NEC detection rates (%), and the calculated NEC:E. coli ratio. Time-series decompositions identified E. coli concentrations and the NEC:E. coli ratio as complementary metrics; interpreted together, their seasonal signals indicate that localized contamination mechanisms dominate during winter months. ITSA findings highlighted the importance of hydrogeological time lags: for example, a significant increase in the E. coli detection rate (2.4% vs 1.8%, p = 0.02) was identified 12 weeks after the May 2017 flood event. Unsupervised machine learning spatially classified annual contamination cycles across Ontarian subregions (n = 27), with the highest inter-cluster variability identified among E. coli detection rates and the lowest among NEC detection rates and the NEC:E. coli ratio. Given the spatiotemporal consistency identified for NEC and the NEC:E. coli ratio, associated interpretations and recommendations are likely transferable across large, heterogeneous regions. The presented study may serve as a methodological blueprint for future temporal investigations employing “big” groundwater quality data.
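The idea of a lagged interruption in ITSA can be illustrated with a standard segmented-regression sketch. This is a minimal, hypothetical example, not the authors' implementation: it assumes weekly aggregation, a single event, and ordinary least squares via NumPy. The function name `segmented_itsa` and its `lag` parameter are illustrative assumptions; `lag` shifts the intervention point forward to represent a hydrogeological delay such as the 12 weeks reported after the May 2017 flood.

```python
import numpy as np


def segmented_itsa(y, event_index, lag=0):
    """Hypothetical sketch of a segmented-regression ITSA with a response lag.

    Fits y_t = b0 + b1*t + b2*X_t + b3*(t - T0)*X_t by ordinary least squares,
    where X_t = 1 for t >= T0 and T0 = event_index + lag.

    y           : 1-D sequence of weekly values (e.g., detection rate in %)
    event_index : index of the week in which the event occurred
    lag         : assumed delay (in weeks) before the groundwater response
    Returns (b0, b1, b2, b3): baseline level, pre-intervention slope,
    immediate level change at T0, and change in slope after T0.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)
    t0 = event_index + lag
    x = (t >= t0).astype(float)  # post-intervention indicator X_t
    design = np.column_stack([np.ones_like(t), t, x, (t - t0) * x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return tuple(coef)


# Usage on a synthetic (not real) series: a 0.6-point step appears 12 weeks
# after an event in week 20, on top of a gentle upward trend.
weeks = np.arange(60, dtype=float)
series = 1.8 + 0.01 * weeks + 0.6 * (weeks >= 32)
print(segmented_itsa(series, event_index=20, lag=12))
```

This sketch returns point estimates only. A fuller analysis would typically also account for seasonality and autocorrelated residuals (for example, with seasonal terms or ARIMA errors) and report standard errors for each coefficient before judging significance.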
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic waste, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.