Multi-class random forest model to classify wastewater treatment imbalanced data

IF 6.2, Zone 2 (Economics), Q1 ECONOMICS. Socio-economic Planning Sciences. Pub Date: 2024-07-20. DOI: 10.1016/j.seps.2024.102021
Veronica Distefano, Monica Palma, Sandra De Iaco
Socio-economic Planning Sciences, volume 95, Article 102021. Publication type: Journal Article.

Abstract

The odor emissions generated by treatment plants raise complex environmental and economic issues. Modern instrumental odor monitoring systems, based on an array of several sensors, continuously record the gaseous compounds. However, they are characterized by poor selectivity, which compromises the ability to discriminate and identify the emission sources. In this paper, the ability of odor sensors to distinguish between the treatment plant sections generating the gaseous compounds is evaluated using the random forest classifier and compared with the performance of discriminant analysis. Because a multi-parametric system of sensors can be affected by small sample sizes with imbalanced classes, several strategies for data balancing are proposed and analyzed. The findings show that the random forest classifier distinguishes the emission sources better than classical multiple discriminant analysis on all evaluation metrics. This is also confirmed for different resampling techniques, especially in the over-sampling case. The analysis uses measurements from 10 sensors of multi-parametric odor monitoring systems, collected from a company specialized in environmental assistance.
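
The abstract mentions over-sampling as one of the data-balancing strategies but does not say which variant was used. As a rough illustration only, the sketch below shows plain random over-sampling, which duplicates minority-class rows until every class matches the largest one. The function name `random_oversample`, the section labels, the class sizes, and the Gaussian toy readings are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (random over-sampling)."""
    rng = random.Random(seed)

    # Group the rows by class label.
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(rows) for rows in by_class.values())

    X_bal, y_bal = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for row in rows + extra:
            X_bal.append(row)
            y_bal.append(label)
    return X_bal, y_bal


# Hypothetical toy data: 10 sensor readings per row, three plant sections
# with deliberately imbalanced class sizes (30, 8 and 4 rows).
data_rng = random.Random(42)
labels = ["section_A"] * 30 + ["section_B"] * 8 + ["section_C"] * 4
X = [[data_rng.gauss(0.0, 1.0) for _ in range(10)] for _ in labels]

X_bal, y_bal = random_oversample(X, labels)
counts = Counter(y_bal)
print(counts)
```

In practice, balancing of this kind is usually applied to the training split only, so that duplicated rows do not leak into the data used to evaluate the classifier.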

Source journal
Socio-economic Planning Sciences (OPERATIONS RESEARCH & MANAGEMENT SCIENCE)
CiteScore: 9.40
Self-citation rate: 13.10%
Articles published: 294
Review time: 58 days
Journal description: Studies directed toward the more effective utilization of existing resources, e.g. mathematical programming models of health care delivery systems with relevance to more effective program design; systems analysis of fire outbreaks and its relevance to the location of fire stations; statistical analysis of the efficiency of a developing country economy or industry. Studies relating to the interaction of various segments of society and technology, e.g. the effects of government health policies on the utilization and design of hospital facilities; the relationship between housing density and the demands on public transportation or other service facilities; patterns and implications of urban development and air or water pollution. Studies devoted to the anticipation of and response to future needs for social, health and other human services, e.g. the relationship between industrial growth and the development of educational resources in affected areas; investigation of future demands for maternal and child health resources in a developing country; design of effective recycling in an urban setting.