An investigation of microbial groundwater contamination seasonality and extreme weather event interruptions using “big data”, time-series analyses, and unsupervised machine learning
Ioan Petculescu, Paul Hynds, R. Stephen Brown, Kevin McDermott, Anna Majury
Environmental Pollution (Journal Article), published 2025-02-06. DOI: 10.1016/j.envpol.2025.125790 (https://doi.org/10.1016/j.envpol.2025.125790). Journal impact factor: 7.3; JCR Q1 (Environmental Sciences).
Citations: 0
Abstract
Temporal studies of groundwater potability have historically focused on E. coli detection rates, with non-E. coli coliforms (NEC) and microbial concentrations remaining understudied by comparison. Additionally, “big data” (i.e., large, diverse datasets that grow over time) have yet to be employed for assessing the effects of high return-period extreme weather events on groundwater quality. The current investigation employed ≈1.1 million Ontarian private well samples collected between 2010 and 2021, seeking to address these knowledge gaps by applying time-series decomposition, interrupted time-series analysis (ITSA), and unsupervised machine learning to five microbial contamination parameters: E. coli and NEC concentrations (CFU/100 mL) and detection rates (%), and the calculated NEC:E. coli ratio. Time-series decompositions revealed E. coli concentrations and the NEC:E. coli ratio to be complementary metrics, with concurrent interpretation of their seasonal signals indicating that localized contamination mechanisms dominate during winter months. ITSA findings highlighted the importance of hydrogeological time lags: for example, a significant E. coli detection rate increase (2.4% vs 1.8%, p = 0.02) was identified 12 weeks after the May 2017 flood event. Unsupervised machine learning spatially classified annual contamination cycles across Ontarian subregions (n = 27), with the highest inter-cluster variability identified among E. coli detection rates and the lowest among NEC detection rates and the NEC:E. coli ratio. Given the spatiotemporal consistency identified for NEC and the NEC:E. coli ratio, associated interpretations and recommendations are likely transferable across large, heterogeneous regions. The presented study may serve as a methodological blueprint for future temporal investigations employing “big” groundwater quality data.
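The abstract does not state which ITSA model was used. A common textbook formulation is segmented regression, y_t = b0 + b1·t + b2·D_t + b3·(t − t0)·D_t, where D_t is 1 from the interruption onset t0 onward, b2 is the level change and b3 the slope change. The sketch below is a minimal, hypothetical illustration in NumPy on synthetic weekly data; the function names, the `lag_weeks` parameter (which simply shifts t0 to mimic a hydrogeological lag such as the 12 weeks mentioned above) and all numbers are assumptions, not the authors' model or data.

```python
import numpy as np


def itsa_design(n_weeks: int, event_week: int, lag_weeks: int = 0) -> np.ndarray:
    """Design matrix for a single-interruption segmented regression (hypothetical helper).

    Columns: intercept, time t, level indicator D_t, and (t - t0) * D_t,
    where t0 = event_week + lag_weeks is the assumed onset of the effect.
    """
    t = np.arange(n_weeks, dtype=float)
    t0 = event_week + lag_weeks                 # lagged onset of the interruption
    post = (t >= t0).astype(float)              # D_t: 0 before onset, 1 from onset on
    since = np.where(post == 1.0, t - t0, 0.0)  # weeks elapsed since onset, else 0
    return np.column_stack([np.ones(n_weeks), t, post, since])


def fit_itsa(y, event_week: int, lag_weeks: int = 0) -> np.ndarray:
    """Ordinary least-squares fit; returns coefficients (b0, b1, b2, b3)."""
    y = np.asarray(y, dtype=float)
    X = itsa_design(len(y), event_week, lag_weeks)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


if __name__ == "__main__":
    # Synthetic weekly detection rates (%): flat baseline, a level shift
    # 12 weeks after a hypothetical event at week 100, plus random noise.
    rng = np.random.default_rng(0)
    X = itsa_design(208, event_week=100, lag_weeks=12)
    y = X @ np.array([2.0, 0.0, 0.5, 0.0]) + rng.normal(0.0, 0.2, 208)
    b0, b1, b2, b3 = fit_itsa(y, event_week=100, lag_weeks=12)
    print(f"level change b2 = {b2:.3f}, slope change b3 = {b3:.4f}")
```

Plain OLS standard errors assume independent residuals; weekly water-quality series are typically autocorrelated and seasonal, so a real analysis would need to deseasonalize the series or use autocorrelation-robust errors before judging significance.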
About the journal:
Environmental Pollution is an international peer-reviewed journal that publishes high-quality research papers and review articles covering all aspects of environmental pollution and its impacts on ecosystems and human health.
Subject areas include, but are not limited to:
• Sources and occurrences of pollutants that are clearly defined and measured in environmental compartments, food and food-related items, and human bodies;
• Interlinks between contaminant exposure and biological, ecological, and human health effects, including those of climate change;
• Contaminants of emerging concern (including but not limited to antibiotic-resistant microorganisms or genes, microplastics/nanoplastics, electronic wastes, light, and noise) and/or their biological, ecological, or human health effects;
• Laboratory and field studies on the remediation/mitigation of environmental pollution via new techniques and with clear links to biological, ecological, or human health effects;
• Modeling of pollution processes, patterns, or trends that is of clear environmental and/or human health interest;
• New techniques that measure and examine environmental occurrences, transport, behavior, and effects of pollutants within the environment or the laboratory, provided that they can be clearly used to address problems within regional or global environmental compartments.