Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00043
Elizabeth D. Hathaway, R. Hathaway
The iVAT (improved Visual Assessment of cluster Tendency) image is a useful tool for assessing possible cluster structure in an unlabeled numerical data set. If labeled data are available, it is sometimes helpful to determine how closely the clusters found in the (unlabeled) data agree with the partition induced by the labels. In this note, the DCiVAT (Diagonally Colorized iVAT) image is introduced for the case of labeled data. It incorporates all available data and label information into a single colorized iVAT image, making it possible to assess visually the degree to which data clusters are aligned with label categories. The new approach is illustrated with several examples.
Title: Diagonally Colorized iVAT Images for Labeled Data | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
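To make the iVAT idea concrete, here is a minimal sketch of the classic VAT reordering (a Prim-style ordering of the objects) followed by the iVAT transform, which replaces each dissimilarity with the largest edge on the minimax path between the two objects. The `diagonal_label_runs` helper is hypothetical and is not the DCiVAT colorization from the note: it only lists the labels met along the diagonal in VAT order, so one can see whether label runs line up with the dark diagonal blocks. Pure Python lists are used to keep it self-contained.

```python
def vat_order(d):
    """VAT reordering of a symmetric n x n dissimilarity matrix `d` (list of lists).

    Prim-style: start at an object involved in the largest dissimilarity, then
    repeatedly append the unvisited object nearest to the already-ordered set.
    """
    n = len(d)
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        nxt = min(sorted(remaining), key=lambda k: min(d[i][k] for i in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def ivat(d):
    """Return (order, D') where D' holds minimax path distances in VAT order."""
    n = len(d)
    order = vat_order(d)
    rd = [[d[order[i]][order[j]] for j in range(n)] for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for r in range(1, n):
        # j: the already-ordered object nearest to object r.
        j = min(range(r), key=lambda c: rd[r][c])
        out[r][j] = rd[r][j]
        for c in range(r):
            if c != j:
                # Largest edge on the path r -> j -> ... -> c.
                out[r][c] = max(rd[r][j], out[j][c])
        for c in range(r):
            out[c][r] = out[r][c]  # keep the image symmetric
    return order, out


def diagonal_label_runs(order, labels):
    """Hypothetical helper: run-length encode the labels met along the diagonal."""
    runs = []
    for idx in order:
        lab = labels[idx]
        if runs and runs[-1][0] == lab:
            runs[-1] = (lab, runs[-1][1] + 1)
        else:
            runs.append((lab, 1))
    return runs


points = [0.0, 5.0, 0.1, 5.1, 0.2, 5.2]
labels = ["a", "b", "a", "b", "a", "b"]
d = [[abs(p - q) for q in points] for p in points]
order, img = ivat(d)
print(order)                               # [0, 2, 4, 1, 3, 5]
print(diagonal_label_runs(order, labels))  # [('a', 3), ('b', 3)]
```

In this toy case every between-cluster entry of the iVAT image equals the single gap of 4.8, while within-cluster entries stay at about 0.1, which is what produces crisp dark diagonal blocks.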
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00010
Niklas Strauß, Max Berrendorf, Tom Haider, M. Schubert
Modern Emergency Medical Services (EMS) benefit from real-time sensor information in various ways, as such sensors provide up-to-date location information and help assess current local emergency risks. A critical part of EMS is dynamic ambulance redeployment, i.e., the task of assigning idle ambulances to base stations throughout a community. Although considerable effort has gone into methods for optimizing emergency response systems, comparing the proposed methods is generally difficult because reported results are mostly based on artificial and proprietary test beds. In this paper, we present a benchmark simulation environment for dynamic ambulance redeployment based on real emergency data from the city of San Francisco. Our simulation environment is highly scalable and compatible with modern reinforcement learning frameworks. We provide a comparative study of several state-of-the-art methods on various metrics. The results indicate that even simple baseline algorithms can perform well in close-to-realistic settings. The code of our simulator is openly available at https://github.com/niklasdbs/ambusim.
Title: A Comparison of Ambulance Redeployment Systems on Real-World Data | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
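The abstract notes that simple baselines do surprisingly well. As an illustration only, here is a hypothetical greedy redeployment rule, not one of the methods benchmarked in the paper and not the ambusim API: send each newly idle ambulance to the base whose ambulance count falls furthest below its demand-proportional share, breaking ties by straight-line distance. The `Base` fields, planar coordinates, and demand values are all assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Base:
    name: str
    x: float          # planar coordinates (km), assumed for simplicity
    y: float
    demand: float     # hypothetical expected call rate in the base's area
    ambulances: int   # ambulances stationed at or heading to this base


def redeploy(bases, amb_x, amb_y):
    """Greedy baseline: send the idle ambulance to the most under-supplied base.

    Each base's target share is proportional to its demand; ties on the supply
    deficit are broken by straight-line distance from the ambulance.
    """
    total_demand = sum(b.demand for b in bases)
    total_amb = sum(b.ambulances for b in bases) + 1  # count the idle ambulance

    def priority(b):
        deficit = b.demand / total_demand * total_amb - b.ambulances
        distance = math.hypot(b.x - amb_x, b.y - amb_y)
        return (-deficit, distance)

    best = min(bases, key=priority)
    best.ambulances += 1  # the ambulance is now heading to this base
    return best.name


bases = [Base("A", 0, 0, 3.0, 1), Base("B", 10, 0, 1.0, 0), Base("C", 5, 5, 1.0, 1)]
print(redeploy(bases, 1.0, 1.0))  # A
print(redeploy(bases, 9.0, 0.0))  # B
```

A real simulator would use road-network travel times and time-varying demand; this rule ignores both, which is exactly what makes it a cheap baseline.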
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00061
A. Choudhary, E. Cambria
With social media pervading all aspects of our lives, the opinions expressed by netizens are a gold mine that can be exploited meaningfully to influence all major public domains. Sentiment analysis interprets this unstructured data using AI tools. It is widely reported that there has been a 'Zoom Boom' in aesthetic plastic surgery due to the COVID-19 pandemic, which has put the focus of attention sharply on our appearance. Polarity detection of tweets about popular aesthetic plastic surgery procedures, published before and after the onset of COVID-19, can provide valuable insights for aesthetic plastic surgeons and the health industry at large. In this work, we develop an end-to-end system for the sentiment analysis of such tweets that incorporates a state-of-the-art fine-tuned deep learning model, a 'keyword search and filter' approach, and SenticNet. Our system was tested on a database of 196,900 tweets; the results were visualized using affectively correct word clouds and subjected to rigorous statistical hypothesis testing to draw meaningful inferences. The results showed a high level of statistical significance.
Title: Making Sense of Sentiments for Aesthetic Plastic Surgery | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
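The abstract combines a keyword filter with hypothesis testing on before/after sentiment. Below is a minimal sketch of those two generic steps, assuming simple case-insensitive substring matching and a pooled two-proportion z-test. The paper does not say which test it used, the polarity labels would come from its fine-tuned model and SenticNet (not shown here), and the counts are invented for illustration.

```python
import math


def keyword_filter(tweets, keywords):
    """Keep tweets mentioning at least one keyword (case-insensitive substring match)."""
    kws = [k.lower() for k in keywords]
    return [t for t in tweets if any(k in t.lower() for k in kws)]


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for H0: p1 == p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


tweets = ["Thinking about Botox again", "nice weather today", "Rhinoplasty done!"]
print(keyword_filter(tweets, ["botox", "rhinoplasty"]))
# Hypothetical counts: 60/100 positive tweets after onset vs 40/100 before.
z, p = two_proportion_z_test(60, 100, 40, 100)
print(round(z, 3), round(p, 4))  # 2.828 0.0047
```

Substring matching is deliberately crude (it would also match "botoxed"); a production filter would tokenize and handle hashtags.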
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00077
J. Wu, Shuo Liu, Jerry Chun‐wei Lin
High utility sequential pattern mining (HUSPM) considers timestamps, internal utility (quantity), and external utility factors to mine high utility sequential patterns (HUSPs), and has become an important task in data mining. In real life, collected data may be uncertain due to environmental factors, equipment limitations, privacy issues, etc. As the volume of uncertain data grows rapidly, the efficiency of traditional mining algorithms degrades severely: on large data volumes, a conventional stand-alone algorithm generates more candidate sequences, occupies a lot of memory, and runs significantly slower. This paper designs a high utility probabilistic sequential pattern mining algorithm based on MapReduce, which uses the MapReduce framework to overcome the single-machine bottleneck when the data volume is too large. The algorithm adopts an effective pruning strategy that reduces the number of candidate itemsets generated, substantially improving the performance of the model. The performance of the proposed algorithm is verified experimentally, and its correctness and completeness are demonstrated and discussed.
Title: Large-Scale Sequential Utility Pattern Mining in Uncertain Environments | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
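The abstract does not spell out its pruning strategy. One standard way to prune candidates in HUSPM is the sequence-weighted utility (SWU) upper bound, which is anti-monotone and therefore safe to use for discarding items. The sketch below computes SWU and expected support (sum of sequence existence probabilities) per item with a map step per partition and a reduce step that sums by key, imitating MapReduce in plain Python. It is a generic illustration under these assumptions, not the paper's algorithm, and the data and thresholds are invented.

```python
from collections import defaultdict


def sequence_utility(seq, ext_util):
    """Utility of a sequence: sum of quantity * external utility over all items."""
    return sum(q * ext_util[item] for itemset in seq for item, q in itemset.items())


def map_partition(partition, ext_util):
    """Map phase: emit (item, (sequence utility, existence probability)) once per sequence."""
    for prob, seq in partition:
        su = sequence_utility(seq, ext_util)
        for item in {i for itemset in seq for i in itemset}:
            yield item, (su, prob)


def reduce_pairs(pairs):
    """Reduce phase: per item, sum SWU and expected support (probability mass)."""
    acc = defaultdict(lambda: [0, 0.0])
    for item, (su, prob) in pairs:
        acc[item][0] += su
        acc[item][1] += prob
    return {item: tuple(v) for item, v in acc.items()}


def promising_items(partitions, ext_util, min_util, min_exp_sup):
    """Keep items whose SWU and expected support both reach their thresholds.

    Both measures are anti-monotone upper bounds, so any pattern containing a
    pruned item can be skipped without losing a high utility probabilistic pattern.
    """
    pairs = (p for part in partitions for p in map_partition(part, ext_util))
    stats = reduce_pairs(pairs)
    return {i for i, (swu, exp_sup) in stats.items()
            if swu >= min_util and exp_sup >= min_exp_sup}


ext_util = {"a": 2, "b": 1, "c": 5}
partitions = [
    [(0.9, [{"a": 1}, {"b": 2}]), (0.5, [{"c": 1}])],  # worker 1
    [(0.8, [{"a": 2, "c": 1}])],                       # worker 2
]
print(sorted(promising_items(partitions, ext_util, min_util=10, min_exp_sup=1.0)))  # ['a', 'c']
```

Here item `b` has SWU 4 and is pruned before any longer candidate containing it is generated.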
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00139
Timon Sachweh, Daniel Boiar, T. Liebig
Data privacy and decentralized data collection have become increasingly popular in recent years. To address issues of privacy, communication bandwidth, and learning from spatio-temporal data, we propose two efficient models that use Differential Privacy and decentralized LSTM learning. In the first (LabelProportionToLocal), a Long Short-Term Memory (LSTM) model is learned to extract local temporal node constraints, which are fed into a dense layer. The second (LabelProportionToDense) extends the first by fetching histogram data from neighboring nodes and joining this information with the LSTM output. Two popular datasets are used for evaluation: PEMS-BAY and METR-LA. Additionally, we provide our own dataset based on LuST. The evaluation shows the trade-off between performance and data privacy.
Title: Distributed LSTM-Learning from Differentially Private Label Proportions | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
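A common way for a node to share differentially private label proportions is the Laplace mechanism applied to its label histogram, followed by clipping and normalization. The sketch below shows that generic mechanism under an add/remove-one-record neighbor definition (sensitivity 1); it is an assumption about how such proportions can be produced, not the paper's exact procedure, and the class names are hypothetical.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_label_proportions(labels, classes, epsilon, rng=None):
    """Release epsilon-DP label proportions for one node via the Laplace mechanism.

    Adding or removing one record changes one histogram bin by 1 (L1 sensitivity 1),
    so Laplace noise with scale 1/epsilon per bin suffices. Clipping and normalizing
    are post-processing and do not weaken the guarantee.
    """
    rng = rng or random.Random()
    counts = Counter(labels)
    noisy = [max(0.0, counts[c] + laplace_noise(1 / epsilon, rng)) for c in classes]
    total = sum(noisy)
    if total == 0:
        return [1 / len(classes)] * len(classes)  # fall back to uniform
    return [v / total for v in noisy]


labels = ["congested"] * 30 + ["free-flow"] * 70  # hypothetical label names
print(private_label_proportions(labels, ["congested", "free-flow"], epsilon=0.5,
                                rng=random.Random(0)))
```

Smaller `epsilon` means larger noise and stronger privacy, which is the performance/privacy trade-off the evaluation explores. If neighbors are defined by replacing one record instead, the sensitivity doubles to 2.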
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00038
Rodrigo Fay Verqara, Paulo Henrique dos Santos, Guilherme Fay Verqara, Fábio L. L. Mendonça, C. E. L. Veiga, B. Praciano, Daniel Alves da Silva, Rafael Timóteo de Sousa Júnior
This article presents a study of an automatic speech recognition (ASR) system for Portuguese applied to videos of the General Attorney of the Union of Brazil. Because these videos are confidential, using proprietary software from large companies is not allowed for security reasons. Thus, building an artificial intelligence model capable of performing ASR in Portuguese in the judicial context, and making this model available for large-scale inference, is critical to maintaining data security. For this purpose, a Brazilian Portuguese dataset built by combining three existing datasets was used. The system used the TDNN Jasper and QuartzNet architectures for network training, obtaining promising preliminary results with a word error rate (WER) of 56% without a language model.
Title: A study of automatic speech recognition in Portuguese by the Brazilian General Attorney of the Union | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
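The reported metric, word error rate, is the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. A minimal sketch follows, assuming whitespace tokenization and no text normalization (real evaluations usually lowercase and strip punctuation first); the example sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance, using a rolling DP row.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distance from an empty ref prefix to hyp[:j]
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("o réu foi citado ontem", "o réu foi citada"))  # 0.4
```

One substitution plus one deletion over five reference words gives 0.4. Because insertions count too, WER can exceed 1.0, so a 56% WER means roughly one error for every two reference words.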
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00145
Siyan Liu, Dawei Lu, D. Ricciuto, A. Walker
Terrestrial ecosystems play a central role in the global carbon cycle and affect climate change. However, our predictive understanding of these systems is still limited because of their complexity and of uncertainty about how key drivers and their legacy effects influence carbon fluxes. Here, we propose an interpretable Long Short-Term Memory (iLSTM) network for predicting net ecosystem CO2 exchange (NEE) and for interpreting how environmental drivers and their memory effects influence the NEE prediction. We consider five drivers and apply the method to three forest sites in the United States. Besides performing the prediction at each site, we also conduct transfer learning by using the iLSTM model trained at one site to predict at the other sites. Results show that the iLSTM model produces good NEE predictions for all three sites and, more importantly, provides reasonable interpretations of the importance of each input driver, as well as of its temporal importance, for the NEE prediction. Additionally, the iLSTM model demonstrates good cross-site transferability in terms of both prediction accuracy and interpretability. This transferability can improve NEE prediction at unobserved forest sites, and the interpretability advances our predictive understanding and guides process-based model development.
Title: Improving net ecosystem CO2 flux prediction using memory-based interpretable machine learning | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
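"Memory effects" here means that past driver values influence current NEE; an LSTM sees that history through a lookback window of recent time steps. The sketch below shows only this generic windowing step that precedes training, not the iLSTM architecture or its interpretation mechanism. The two-feature drivers, the values, and the lookback length are hypothetical.

```python
def make_windows(drivers, target, lookback):
    """Pair each NEE value with the driver history an LSTM would read.

    drivers: list of per-time-step feature vectors (e.g. one value per driver).
    target:  list of NEE values aligned in time with `drivers`.
    Returns (window, y) pairs with window = drivers[t-lookback+1 .. t], oldest first.
    """
    if len(drivers) != len(target):
        raise ValueError("drivers and target must be aligned in time")
    return [(drivers[t - lookback + 1 : t + 1], target[t])
            for t in range(lookback - 1, len(target))]


drivers = [[0.1, 20.0], [0.2, 21.0], [0.0, 19.5], [0.3, 22.0]]  # hypothetical [precip, temp]
nee = [1.0, 1.2, 0.8, 1.5]
for window, y in make_windows(drivers, nee, lookback=2):
    print(window, y)
```

A longer `lookback` lets the model capture longer legacy effects, at the cost of fewer training samples per site.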
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00067
Penghao Jiang, Ke Xin, Chunxi Li
This paper studies the data sparsity problem in multi-view learning. To address data sparsity in multi-view ratings, we propose a generic deep transfer tensor factorization (DTTF) architecture that integrates deep learning and cross-domain tensor factorization, in which side information is embedded to compensate effectively for tensor sparsity. We then present an instantiation of the architecture that combines a stacked denoising autoencoder (SDAE) with CANDECOMP/PARAFAC (CP) tensor factorization in both the source and target domains; the side information of both users and items is tightly coupled with the sparse multi-view ratings, and the latent factors are learned through joint optimization. This tight coupling of multi-view ratings and side information improves cross-domain tensor factorization based recommendations. Experimental results on real-world datasets demonstrate that our DTTF schemes outperform state-of-the-art methods on multi-view rating prediction.
Title: Deep Transfer Tensor Factorization for Multi-View Learning | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
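The CP part of DTTF models a user x item x view rating tensor as a sum of rank-one components, so each entry is an inner product across three factor matrices. The sketch below shows that reconstruction and a loss that counts only observed entries, which is how sparsity enters the objective. It omits the SDAE side-information coupling and the cross-domain transfer entirely; the rank-1 factors and ratings are toy values.

```python
def cp_entry(A, B, C, i, j, k):
    """Rank-R CP (CANDECOMP/PARAFAC) model: x_ijk ~ sum_r A[i][r] * B[j][r] * C[k][r]."""
    return sum(a * b * c for a, b, c in zip(A[i], B[j], C[k]))


def masked_squared_error(observed, A, B, C):
    """Squared error over observed entries only; missing ratings contribute nothing."""
    return sum((v - cp_entry(A, B, C, i, j, k)) ** 2 for (i, j, k), v in observed.items())


# Rank-1 toy factors: 2 users, 2 items, 2 rating views (e.g. overall, service).
A = [[1.0], [2.0]]
B = [[3.0], [1.0]]
C = [[1.0], [0.5]]
observed = {(0, 0, 1): 1.5, (1, 1, 0): 3.0}  # only 2 of 8 cells are observed
print(cp_entry(A, B, C, 0, 0, 1))               # 1.5
print(masked_squared_error(observed, A, B, C))  # 1.0
```

With only a quarter of the cells observed, many factor settings fit equally well; extra information such as side features is what constrains the latent factors in a sparse setting like this.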
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00085
Sajib K. Biswas, Pranab K. Muhuri, Uttam K. Roy
For many machine learning methods dealing with problems such as classification, clustering, prediction, and association rule mining, counting the occurrences of given queries plays a crucial role. However, these methods, which usually operate in two steps (learning and sampling), become impractical for large datasets because of computational cost or excessive memory consumption. This paper therefore proposes a novel approach for handling counting queries: an adaptive archive-based method that offers efficient archiving with reduced computational time and moderate memory requirements. We conduct numerous experiments to show the performance and scalability of the proposed approach on random queries, probabilistic network learning, and association rule mining. The experimental results show that our method outperforms the previously proposed ADtree, Bitmap, and Radix strategies on higher-dimensional datasets with large numbers of observations.
Title: AARS: A novel adaptive archive-based efficient counting method for machine learning applications | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
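To show what a counting query is, here is a sketch of the Bitmap strategy named as a baseline, not of AARS itself: each (attribute, value) pair maps to a bitmask over records, and a conjunctive query is answered by AND-ing masks and counting set bits. Python integers serve as arbitrary-length bitsets; the records are toy data.

```python
def build_bitmaps(records):
    """Map each (attribute, value) pair to an int whose bit r is set if record r matches."""
    bitmaps = {}
    for row, rec in enumerate(records):
        for attr, val in rec.items():
            bitmaps[(attr, val)] = bitmaps.get((attr, val), 0) | (1 << row)
    return bitmaps


def count_query(bitmaps, n_records, query):
    """Count records matching every attribute=value condition in `query`."""
    mask = (1 << n_records) - 1  # an empty query matches all records
    for attr, val in query.items():
        mask &= bitmaps.get((attr, val), 0)
    return bin(mask).count("1")


records = [{"a": 1, "b": 0}, {"a": 1, "b": 1}, {"a": 0, "b": 1}]
bm = build_bitmaps(records)
print(count_query(bm, len(records), {"a": 1}))          # 2
print(count_query(bm, len(records), {"a": 1, "b": 1}))  # 1
```

Memory grows with (number of distinct attribute values) x (number of records), which illustrates why such strategies become costly on high-dimensional data with many observations.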
Pub Date: 2022-11-01 | DOI: 10.1109/ICDMW58026.2022.00022
José Alberto Sousa Torres, Paulo Henrique dos Santos, Daniel Alves da Silva, C. E. L. Veiga, Márcio Bastos Medeiros, Guilherme Fay Verqara, Fábio L. L. Mendonça, Rafael Timóteo de Sousa Júnior
The Amazon Rainforest is the most significant biodiversity reserve on the planet and plays a central role in combating global warming and climate change. Despite its importance, in 2021 illegal deforestation in the Brazilian Amazon rainforest had its worst year in a decade: the data show that more than 10,000 square kilometers of native forest were destroyed that year, an increase of 29% compared to 2020. To fight deforesters, Brazilian environmental inspection agencies have imposed more than 14 billion dollars in environmental fines in recent decades. However, the fines have not effectively reduced deforestation, as only 4% of this amount was actually collected, which does not deter lawbreakers from deforesting. This is due to the difficulty of identifying the real transgressors, who use scapegoats to hide their crimes. The main objective of this paper is to propose an approach for finding the real environmental transgressors by analyzing data on the fines imposed by Brazilian governmental agencies over the last three decades. We propose a method that applies clustering techniques to geographic and temporal data extracted from the fines to identify non-trivial correlations between scapegoats and large landowners. The automatically identified links were loaded into a graph analysis database for accuracy assessment. The observed results were positive and indicated that this strategy could effectively identify the real culprits.
Title: Using spatial data and cluster analysis to automatically detect non-trivial relationships between environmental transgressors | Venue: 2022 IEEE International Conference on Data Mining Workshops (ICDMW)
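The paper does not name its clustering algorithm. As a hypothetical stand-in, the sketch below links any two fines that lie within a radius (great-circle distance) and a time window of each other, takes connected components with union-find as clusters, and reports pairs of distinct offenders sharing a cluster as candidate relationships, similar in spirit to the links the authors loaded into a graph database. The field layout `(offender, lat, lon, day)`, the thresholds, and the data are all assumptions.

```python
import math
from itertools import combinations


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def cluster_fines(fines, radius_km, max_days):
    """Group fines that are chained by spatial and temporal proximity (union-find)."""
    parent = list(range(len(fines)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(fines)), 2):  # O(n^2): fine for a sketch
        _, la1, lo1, d1 = fines[i]
        _, la2, lo2, d2 = fines[j]
        if abs(d1 - d2) <= max_days and haversine_km(la1, lo1, la2, lo2) <= radius_km:
            parent[find(i)] = find(j)
    clusters = {}
    for i, fine in enumerate(fines):
        clusters.setdefault(find(i), set()).add(fine[0])
    return list(clusters.values())


def candidate_links(clusters):
    """Pairs of distinct offenders that appear in the same spatio-temporal cluster."""
    return {pair for offenders in clusters for pair in combinations(sorted(offenders), 2)}


fines = [("X", -3.00, -60.0, 100), ("Y", -3.01, -60.0, 105), ("Z", -10.0, -50.0, 100)]
print(candidate_links(cluster_fines(fines, radius_km=5.0, max_days=30)))  # {('X', 'Y')}
```

Such links are only leads: co-location in space and time suggests, but does not prove, a relationship, which is why a separate accuracy assessment is needed.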