Veronica Distefano, Monica Palma, Sandra De Iaco
Title: Multi-class random forest model to classify wastewater treatment imbalanced data
Journal: Socio-economic Planning Sciences, volume 95, Article 102021 (impact factor 6.2, JCR Q1, Economics)
Published: 2024-07-20
DOI: 10.1016/j.seps.2024.102021
URL: https://www.sciencedirect.com/science/article/pii/S0038012124002209
Citations: 0
Abstract
Odor emissions generated by treatment plants raise complex environmental and economic issues. Modern instrumental odor monitoring systems, based on an array of several sensors, continuously record the gaseous compounds. However, they are characterized by poor selectivity, which compromises the ability to discriminate and identify the emission sources. In this paper, the ability of odor sensors to distinguish between the treatment plant sections generating the gaseous compounds is evaluated using the random forest classifier and compared with the performance of discriminant analysis. Because a multi-parametric system of sensors can be affected by a small sample size with imbalanced classes, several strategies for data balancing are proposed and analyzed. The findings show that the random forest classifier distinguishes the emission sources better than classical multiple discriminant analysis on all evaluation metrics. This is also confirmed for different resampling techniques, especially in the over-sampling case. The analysis uses measurements from 10 sensors of multi-parametric odor monitoring systems, obtained from a company specialized in environmental assistance.
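The abstract does not say which over-sampling technique was used, so the following is only a minimal sketch of one common option: random over-sampling with replacement, which duplicates minority-class rows until every class matches the largest one. The function name `random_oversample`, the section labels, and the toy sensor readings are all hypothetical and are not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    matches the size of the largest class (illustrative helper)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Keep every original row, then draw the extras with replacement.
        extras = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extras:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (assumption): readings from 10 sensors for three plant sections.
rng = random.Random(42)
labels = ["section_A"] * 12 + ["section_B"] * 5 + ["section_C"] * 3
rows = [[rng.random() for _ in range(10)] for _ in labels]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels))           # class sizes before balancing
print(Counter(balanced_labels))  # class sizes after balancing
```

After balancing, each of the three classes has 12 rows. In practice, over-sampling of this kind is usually applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.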
About the journal:
Studies directed toward the more effective utilization of existing resources, e.g. mathematical programming models of health care delivery systems with relevance to more effective program design; systems analysis of fire outbreaks and its relevance to the location of fire stations; statistical analysis of the efficiency of a developing country economy or industry.
Studies relating to the interaction of various segments of society and technology, e.g. the effects of government health policies on the utilization and design of hospital facilities; the relationship between housing density and the demands on public transportation or other service facilities; patterns and implications of urban development and air or water pollution.
Studies devoted to the anticipation of and response to future needs for social, health and other human services, e.g. the relationship between industrial growth and the development of educational resources in affected areas; investigation of future demands for maternal and child health resources in a developing country; design of effective recycling in an urban setting.