Comparing Regression Models with Count Data to Artificial Neural Network and Ensemble Models for Prediction of Generic Escherichia coli Population in Agricultural Ponds Based on Weather Station Measurements

Microbial Risk Analysis (IF 3.0; Zone 4, Environmental Science and Ecology; JCR Q2, Environmental Sciences) Pub Date: 2021-12-01 DOI: 10.1016/j.mran.2021.100171
Gonca Buyrukoğlu, Selim Buyrukoğlu, Zeynal Topalcengiz
Citations: 15

Abstract

Indicator microorganisms are monitored in agricultural waters to foster produce safety. Various prediction models are used to estimate the population of indicator microorganisms and pathogens when no observation is available. The purpose of this study was to compare the performance of regression models with count data (zero-inflated Poisson and hurdle negative binomial) to artificial neural network and ensemble models (random forest and AdaBoost) for the prediction of generic Escherichia coli population in agricultural surface waters in relation to weather station measurements. Based on the data structure, two-part count data models were built on E. coli population count frequencies in six classes: 0, [1, 10), [10, 100), [100, 1000), [1000, 10000), and >=10000. The artificial neural network, AdaBoost, and random forest models were selected from six pre-tested models based on their mean absolute error (MAE). The MAE was also used to compare the performance of the two-part count data models with the artificial neural network and ensemble models. Over-dispersed E. coli population count frequencies ranged from 2.2 to 52.2% across the ponds. Across all ponds, predicted zero E. coli population counts matched observed zero counts in 82 to 100% of cases for the zero-inflated Poisson model and in 100% of cases for the hurdle negative binomial model. Overdispersion reduced the performance of the tested models. AdaBoost with twelve estimators had the best performance, with the lowest MAE values for all ponds (from 0.87 to 46.60). The ensemble models used in this study performed better than the tested regression models with count data.
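
The count classes and the MAE criterion described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names (`count_class`, `class_frequencies`, `mean_absolute_error`) and the sample numbers are hypothetical, and the sketch only assumes that counts are non-negative integers and that MAE is the usual mean of absolute differences between observed and predicted values.

```python
from bisect import bisect_right

# Lower edges of the non-zero count classes named in the abstract:
# 0, [1,10), [10,100), [100,1000), [1000,10000), >=10000
BIN_EDGES = [1, 10, 100, 1000, 10000]
BIN_LABELS = ["0", "[1,10)", "[10,100)", "[100,1000)", "[1000,10000)", ">=10000"]


def count_class(count):
    """Return the label of the class a non-negative integer count falls into."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    # bisect_right counts how many lower edges are <= count,
    # which is exactly the index of the matching class label.
    return BIN_LABELS[bisect_right(BIN_EDGES, count)]


def class_frequencies(counts):
    """Tally how many observations fall into each count class."""
    freq = {label: 0 for label in BIN_LABELS}
    for c in counts:
        freq[count_class(c)] += 1
    return freq


def mean_absolute_error(observed, predicted):
    """Mean of |observed - predicted| over paired values."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not data from the study.
    observed = [0, 0, 3, 12, 150, 0]
    model_a = [0, 1, 2, 10, 140, 0]
    model_b = [2, 3, 8, 30, 90, 5]

    print(class_frequencies(observed))
    print("MAE, model A:", mean_absolute_error(observed, model_a))
    print("MAE, model B:", mean_absolute_error(observed, model_b))
```

Under this criterion, the model with the lower MAE on the same observations is the better performer, which is how the abstract ranks the count regression, neural network, and ensemble models against one another.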

Source journal
Microbial Risk Analysis (Medicine: Microbiology (medical))
CiteScore: 5.70
Self-citation rate: 7.10%
Articles published: 28
Review time: 52 days
About the journal: The journal Microbial Risk Analysis accepts articles dealing with the study of risk analysis applied to microbial hazards. Manuscripts should cover at least one of the components of risk assessment (risk characterization, exposure assessment, etc.), risk management and/or risk communication in any microbiology field (clinical, environmental, food, veterinary, etc.). The journal also accepts articles dealing with predictive microbiology, quantitative microbial ecology, mathematical modeling, risk studies applied to microbial ecology, quantitative microbiology for epidemiological studies, statistical methods applied to microbiology, and laws and regulatory policies aimed at lessening the risk of microbial hazards. Work focusing on risk studies of viruses, parasites, microbial toxins, antimicrobial resistant organisms, genetically modified organisms (GMOs), and recombinant DNA products is also acceptable.