Feature engineering vs. extraction: clustering Brazilian municipalities through spatial panel agricultural data via autoencoders

Anais do XIX Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2022). Pub Date: 2022-11-28. DOI: 10.5753/eniac.2022.227622

M. A. S. D. Silva, L. N. Matos, F. E. O. Santos, M. H. Dompieri, F. Moura

Abstract

This article compares two approaches to clustering Brazilian municipalities according to their agricultural diversity: one based on feature engineering and the other on feature extraction with autoencoder-based Deep Learning, with cluster analysis performed by k-means and Self-Organizing Maps. The analyses were conducted on panel data from IBGE's annual estimates of Brazilian agricultural production between 1999 and 2018. Several architectures of simple stacked undercomplete autoencoders were analyzed, varying the number of layers and the number of neurons in each layer, including the latent layer. The asymmetric exponential linear loss function was also evaluated to cope with the sparsity of the data. The results show that, compared with the adopted ground truth, the autoencoder combined with k-means outperformed k-means applied to the raw data, demonstrating that simple autoencoders can represent important features of the data in their latent layer. Although the overall accuracy is low, the results are promising, considering that we evaluated the simplest strategy for Deep Clustering.
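The comparison described in the abstract (k-means on the raw features vs. k-means on the latent codes of an undercomplete autoencoder, both scored against a ground truth) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the panel data are synthetic, the autoencoder has a single tanh latent layer instead of the stacked architectures the paper varies, it is trained with plain squared error rather than the asymmetric exponential linear loss, Self-Organizing Maps are omitted, and "accuracy" is assumed to be the common best-permutation clustering accuracy. The function names `kmeans`, `train_autoencoder` and `cluster_accuracy` are hypothetical.

```python
import itertools

import numpy as np


def kmeans(x, k, n_iter=100, seed=0):
    """Lloyd's k-means; returns one cluster label (0..k-1) per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every row to its nearest center (squared Euclidean distance).
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster becomes empty.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def train_autoencoder(x, latent_dim, lr=0.05, epochs=4000, seed=0):
    """Undercomplete autoencoder (tanh encoder, linear decoder) trained by
    full-batch gradient descent on half the mean squared error.
    Assumption: one latent layer only. Returns the encoder function."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    w1 = rng.normal(0.0, 0.1, (d, latent_dim))
    b1 = np.zeros(latent_dim)
    w2 = rng.normal(0.0, 0.1, (latent_dim, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        z = np.tanh(x @ w1 + b1)              # latent code (latent_dim < d)
        x_hat = z @ w2 + b2                   # reconstruction
        g_out = (x_hat - x) / (n * d)         # dLoss/dx_hat, loss = sum(err**2) / (2*n*d)
        g_w2, g_b2 = z.T @ g_out, g_out.sum(axis=0)
        g_z = (g_out @ w2.T) * (1.0 - z ** 2)  # backpropagate through tanh
        g_w1, g_b1 = x.T @ g_z, g_z.sum(axis=0)
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return lambda data: np.tanh(data @ w1 + b1)


def cluster_accuracy(y_true, y_pred, k):
    """Fraction of points correctly labelled under the best one-to-one
    mapping of cluster ids to classes (brute force over k! permutations)."""
    best = 0.0
    for perm in itertools.permutations(range(k)):
        mapped = np.array([perm[c] for c in y_pred])
        best = max(best, float((mapped == y_true).mean()))
    return best


if __name__ == "__main__":
    # Hypothetical synthetic stand-in for a municipality x year x crop panel.
    rng = np.random.default_rng(7)
    n_groups, per_group, years, crops = 3, 30, 5, 3
    blocks, y_true = [], []
    for g in range(n_groups):
        lam = np.full(crops, 0.5)  # most crops rarely produced, so data are sparse
        lam[g] = 50.0              # group g specializes in crop g
        panel = rng.poisson(lam, size=(per_group, years, crops))
        blocks.append(panel.reshape(per_group, years * crops))  # flatten the panel
        y_true += [g] * per_group
    x = np.log1p(np.vstack(blocks).astype(float))
    x = (x - x.mean(axis=0)) / (x.std(axis=0) + 1e-8)
    y_true = np.array(y_true)

    raw_acc = cluster_accuracy(y_true, kmeans(x, n_groups), n_groups)
    encode = train_autoencoder(x, latent_dim=2)
    ae_acc = cluster_accuracy(y_true, kmeans(encode(x), n_groups), n_groups)
    print(f"k-means on raw features:      accuracy {raw_acc:.2f}")
    print(f"autoencoder latent + k-means: accuracy {ae_acc:.2f}")
```

On toy data this well separated both pipelines may score the same; the sketch only shows the mechanics of the comparison. For larger numbers of clusters, `scipy.optimize.linear_sum_assignment` (Hungarian algorithm) is the usual replacement for the brute-force permutation search.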
