Feature engineering vs. extraction: clustering Brazilian municipalities through spatial panel agricultural data via autoencoders

M. A. S. D. Silva, L. N. Matos, F. E. O. Santos, M. H. Dompieri, F. Moura
Anais do XIX Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2022), published 2022-11-28
DOI: 10.5753/eniac.2022.227622
Citations: 0

Abstract

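This article compares two approaches to clustering Brazilian municipalities by agricultural diversity. The first is based on feature engineering. The second uses feature extraction with autoencoder-based Deep Learning. In both cases, cluster analysis is performed with k-means and Self-Organizing Maps. The analyses were conducted on panel data from IBGE's annual estimates of Brazilian agricultural production between 1999 and 2018. Several structures of simple stacked undercomplete autoencoders were analyzed, varying the number of layers and the number of neurons in each layer, including the latent layer. The asymmetric exponential linear loss function was also evaluated to cope with the sparse data. Compared with the adopted ground truth, the autoencoder model combined with k-means outperformed k-means clustering of the raw data. This demonstrates the ability of simple autoencoders to represent important features of the data in their latent layer. Although the overall accuracy is low, the results are promising, considering that we evaluated the simplest strategy for Deep Clustering.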
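The abstract describes a deep-clustering pipeline. An undercomplete autoencoder compresses each municipality's production panel into a small latent vector, and k-means then clusters those latent codes instead of the raw data. Below is a minimal NumPy sketch of that idea under our own assumptions, not the authors' implementation:

- It uses a single tanh encoder layer rather than the stacked architectures compared in the paper.
- It runs on synthetic data standing in for the IBGE panel.
- It gives the "asymmetric exponential linear loss" a LINEX-style form, since the abstract names that loss but does not define it.

All function names (`elementwise_loss`, `toy_panel`, `train_autoencoder`, `kmeans`) are hypothetical.

```python
import numpy as np


def elementwise_loss(E, kind="mse", a=-1.0):
    """Mean loss over all entries of E = X_hat - X, plus its gradient dLoss/dE.

    kind="linex" is a LINEX-style asymmetric loss, (exp(a*E) - a*E - 1) / a**2.
    ASSUMPTION: stands in for the paper's asymmetric loss, whose form is not given.
    """
    m = E.size
    if kind == "mse":
        return float((E ** 2).mean()), 2.0 * E / m
    if kind == "linex":
        ex = np.exp(a * E)
        return float(((ex - a * E - 1.0) / a ** 2).mean()), (ex - 1.0) / (a * m)
    raise ValueError(f"unknown loss kind: {kind!r}")


def toy_panel(n_per_group=20, n_years=3, seed=1):
    """HYPOTHETICAL municipality x (year, crop) panel; not IBGE data.

    Group 0 produces crops 0-2, group 1 produces crops 3-5; other entries are zero.
    """
    rng = np.random.default_rng(seed)
    n_crops = 6
    rows, labels = [], []
    for group, crops in enumerate([[0, 1, 2], [3, 4, 5]]):
        for _ in range(n_per_group):
            panel = np.zeros((n_years, n_crops))
            panel[:, crops] = 1.0 + 0.05 * rng.standard_normal((n_years, len(crops)))
            rows.append(panel.ravel())  # flatten years x crops into one feature vector
            labels.append(group)
    return np.array(rows), np.array(labels)


def train_autoencoder(X, latent_dim=2, epochs=500, lr=0.5, loss="mse", seed=0):
    """Undercomplete autoencoder d -> latent_dim (tanh) -> d (linear), full-batch GD."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(d, latent_dim))
    b1 = np.zeros(latent_dim)
    W2 = rng.normal(0.0, 0.1, size=(latent_dim, d))
    b2 = np.zeros(d)
    history = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)  # encoder: latent codes
        X_hat = H @ W2 + b2       # decoder: reconstruction
        value, G = elementwise_loss(X_hat - X, kind=loss)
        history.append(value)
        # Backpropagate: decoder weights first, then through tanh to the encoder.
        dW2, db2 = H.T @ G, G.sum(axis=0)
        dPre = (G @ W2.T) * (1.0 - H ** 2)
        dW1, db1 = X.T @ dPre, dPre.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def encode(Z):
        return np.tanh(Z @ W1 + b1)

    return encode, history


def kmeans(Z, k, n_init=10, iters=100, seed=0):
    """Lloyd's k-means with random restarts; keeps the lowest-inertia run."""
    rng = np.random.default_rng(seed)
    best_labels, best_inertia = None, np.inf
    for _ in range(n_init):
        centers = Z[rng.choice(len(Z), size=k, replace=False)]
        for _ in range(iters):
            dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            new_centers = np.array([
                Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                for j in range(k)
            ])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        inertia = float(((Z - centers[labels]) ** 2).sum())
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels


if __name__ == "__main__":
    X, y = toy_panel()
    encode, history = train_autoencoder(X, latent_dim=2)
    for name, feats in [("raw", X), ("latent", encode(X))]:
        labels = kmeans(feats, k=2)
        # With k=2, accuracy is taken up to swapping the two cluster ids.
        acc = max(np.mean(labels == y), np.mean(labels != y))
        print(f"k-means on {name} features: accuracy {acc:.2f}")
```

With `a < 0`, the LINEX-style loss penalizes under-reconstruction (x̂ < x) exponentially and over-reconstruction roughly linearly. That asymmetry is one plausible reason to prefer such a loss on sparse inputs, where a model can otherwise lean toward predicting zeros. The toy data is deliberately easy, so k-means on the raw features is likely to separate the two groups as well. The sketch illustrates the mechanics of the pipeline, not the paper's comparison.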