
Anais do XVIII Encontro Nacional de Inteligência Artificial e Computacional (ENIAC 2021): Latest Papers

Applying DevOps to Machine Learning Processes: A Systematic Mapping
B. M. A. Matsui, D. Goya
DevOps practices are increasingly used by software engineering teams to improve the stages of development. In processes involving machine learning (ML), DevOps can also be applied to deploy machine learning models to production, a practice also known as MLOps. This systematic mapping aims to understand how DevOps has been applied to machine learning processes and what challenges are faced. Fifteen articles were selected, and most were found to use CI/CD practices and to propose architectures for deploying ML models. The main challenges are the inherent characteristics of ML models and resistance to change.
DOI: https://doi.org/10.5753/eniac.2021.18284. Published: 2021-11-29. Citations: 4.
Handling uncertainty through Bayesian inference for Species Distribution Modelling in the Amazon Basin region
Renato O. Miyaji, Pedro L. P. Corrêa
Species distribution modelling is one of the most widely used tools for monitoring biodiversity. Applying it requires a large, reliable database of species occurrences, a condition that is not met when few occurrence records exist. In such cases, uncertainty-handling techniques can be applied. This work therefore uses a Bayesian approach to enable species distribution modelling in the Amazon Basin region near Manaus (AM), based on data collected by the GoAmazon 2014/15 project. The results were compared with those of classical techniques and showed similar performance.
DOI: https://doi.org/10.5753/eniac.2021.18243. Published: 2021-11-29. Citations: 0.
Comparative Analysis of Collaborative Filtering-Based Predictors of Scores in Surveys of a Large Company
M. F. Oliveira, M. Delgado, R. Lüders
Collaborative Filtering (CF) can be understood as the process of predicting users' preferences and deriving useful patterns by studying their activities. In the survey context, it can be used to predict answers to questions as combinations of other available answers. In this paper, we test five CF-based algorithms (item-item, iterative matrix factorization, neural collaborative filtering, logistic matrix factorization, and an ensemble of them) to estimate scores in four survey applications (checkpoints) comprising 700,000 employee ratings. The data were collected from 2019 to 2020 by a large Brazilian tech company with more than 10,000 employees. The results show that collaborative filtering approaches are a relevant alternative for scoring survey questions and provide good-quality estimates. This result can be explored further to reduce questionnaire size, easing the burden respondents face in large surveys.
DOI: https://doi.org/10.5753/eniac.2021.18299. Published: 2021-11-29. Citations: 0.
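To make the idea of predicting one survey answer from the other available answers concrete, here is a minimal sketch of an item-item predictor. It is not the authors' implementation: the cosine similarity, the weighting scheme, the function names, and the tiny `ratings` table are all assumptions chosen for illustration.

```python
from math import sqrt

def item_similarity(ratings, a, b):
    """Cosine similarity between questions a and b over respondents who answered both."""
    common = [(r[a], r[b]) for r in ratings if a in r and b in r]
    if not common:
        return 0.0
    dot = sum(x * y for x, y in common)
    norm_a = sqrt(sum(x * x for x, _ in common))
    norm_b = sqrt(sum(y * y for _, y in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict(ratings, respondent, question):
    """Estimate a missing answer as a similarity-weighted average of the other answers."""
    num = den = 0.0
    for other, score in respondent.items():
        if other == question:
            continue
        sim = item_similarity(ratings, question, other)
        num += sim * score
        den += abs(sim)
    return num / den if den else None

# Hypothetical survey data: one dict per respondent, question id -> score.
ratings = [
    {"q1": 5, "q2": 4, "q3": 5},
    {"q1": 2, "q2": 1, "q3": 2},
    {"q1": 4, "q2": 4, "q3": 5},
]
new_respondent = {"q1": 5, "q2": 4}  # q3 was not asked
estimate = predict(ratings, new_respondent, "q3")
print(estimate)
```

If estimates like this are accurate enough, a question such as `q3` could in principle be dropped from the questionnaire, which is the size-reduction use the abstract points to.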
Analysis of a Brazilian Indigenous corpus using machine learning methods
T. Lima, André C. A. Nascimento, P. Miranda, R. F. Mello
In Brazil, several minority languages are at serious risk of extinction. Appropriate documentation of these languages is a fundamental step toward preventing that. However, for some of them, only a small amount of text is digitally accessible. Meanwhile, there are many open issues related to the identification of indigenous languages, which may help to identify key similarities among them and to connect related languages and dialects. This paper therefore proposes to study and automatically classify 26 neglected Brazilian native languages, given a small amount of training data, under supervised and unsupervised settings. Our findings indicate that applying machine learning models to the analysis of Brazilian Indigenous corpora is very promising, and we hope this work encourages more research on the topic in the coming years.
DOI: https://doi.org/10.5753/eniac.2021.18246. Published: 2021-11-29. Citations: 1.
A Comparative Analysis of Countries' Performance According to SDG Indicators based on Machine Learning
Guilherme Souza, J. Santos, Gabriel SantClair, Janaína Gomide, Luan Santos
The Sustainable Development Goals (SDGs) are part of a global effort to reduce the impacts of climate change while promoting social justice and economic growth. The United Nations provides a database with hundreds of indicators that track the SDGs since 2016 for a total of 302 regions. This work aims to assess which countries are in a similar situation regarding sustainable development. Principal Component Analysis was used to reduce the dimensionality of the dataset, and the k-means algorithm was used to cluster countries according to their SDG indicators. For 2016, 2017, and 2018, 11, 13, and 11 groups were obtained, respectively. This paper also analyses how the clusters change over the years.
DOI: https://doi.org/10.5753/eniac.2021.18248. Published: 2021-11-29. Citations: 0.
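The clustering step named in the abstract above can be sketched with a plain k-means (Lloyd's algorithm). This is only an illustration under stated assumptions: the PCA step is omitted for brevity, the initial centroids are simply the first `k` points, and the two-indicator `countries` table is made up rather than taken from the UN database.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm, using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical, already-normalized indicator vectors (not real SDG data).
countries = {
    "A": (0.90, 0.80), "B": (0.85, 0.90),
    "C": (0.20, 0.10), "D": (0.15, 0.25),
}
names = list(countries)
labels, centroids = kmeans([countries[n] for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

In the paper's setting the number of groups differs per year (11, 13, and 11), so `k` would have to be chosen separately for each year's data; how the authors selected it is not stated in the abstract.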
The 5th Brazilian Competition on Knowledge Discovery in Databases (KDD-BR 2021)
A. C. Lorena, F. Verri, Tiago A. Almeida
This editorial article describes the Brazilian Competition on Knowledge Discovery in Databases (KDD-BR 2021) and summarizes the contributions of the three best solutions from its fifth edition. The 2021 competition involved solving instances of the Travelling Salesman Problem of different sizes using an edge-prediction approach.
DOI: https://doi.org/10.5753/eniac.2021.18425. Published: 2021-11-29. Citations: 0.
Fake News Detection about Covid-19 in the Portuguese Language
Anísio Pereira Batista Filho, Débora da Conceição Araújo, Maverick Andre Dionisio Ferreira, P. M. Mattos Neto
The spread of fake news has been a problem noted in many sectors of society and has been hindering the fight against the pandemic caused by the novel coronavirus (Sars-Cov-2). Combating misinformation about Sars-Cov-2, especially on social networks, is of fundamental importance for controlling the spread of the virus and, consequently, the pandemic. In this work, supervised learning models are built to identify fake news about the novel coronavirus. In total, 18 models were built and evaluated, reaching f-scores of up to 0.62%, 0.82%, and 0.47% for the classes considered (news, opinion, and fake).
DOI: https://doi.org/10.5753/eniac.2021.18278. Published: 2021-11-29. Citations: 1.
Iterative machine learning applied to annotation of text datasets
Thiago Abdo, Fabiano Silva
This paper analyzes different machine learning approaches and algorithms for integration as automated assistance in a tool that aids the creation of new annotated datasets. We evaluate how they scale in an environment without dedicated machine learning hardware. In particular, we study their impact on a dataset with few examples and on one that is still being constructed. We experiment with deep learning algorithms (Bert) and classical learning algorithms with a lower computational cost (W2V and Glove combined with RF and SVM). Our experiments show that deep learning algorithms have a performance advantage over classical techniques. However, their high computational cost makes them inadequate for an environment with reduced hardware resources. We conduct simulations that use Active and Iterative machine learning techniques to assist the creation of new datasets. For these simulations, we use the classical learning algorithms because of their lower computational cost. The knowledge gathered in our experimental evaluation aims to support the creation of a tool for building new text datasets.
DOI: https://doi.org/10.5753/eniac.2021.18268. Published: 2021-11-29. Citations: 0.
Vector space models for trace clustering: a comparative study
Mateus Alex dos Santos Luna, Andre Paulino de Lima, T. Neubauer, M. Fantinato, S. M. Peres
Process mining explores event logs to offer valuable insights to business process managers. Some types of business processes are hard to mine, including unstructured and knowledge-intensive processes. Trace clustering is therefore usually applied to an event log to break it into sublogs, making it more amenable to typical process mining tasks. However, applying clustering algorithms involves decisions, such as how traces are represented, that can lead to better results. In this paper, we compare four vector space models for trace clustering, using them with an agglomerative clustering algorithm on synthetic and real-world event logs. Our analyses suggest that the embeddings-based vector space model can properly handle trace clustering in unstructured processes.
DOI: https://doi.org/10.5753/eniac.2021.18274. Published: 2021-11-29. Citations: 2.
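As a rough illustration of what representing traces in a vector space and then clustering them can look like, the sketch below encodes each trace as a bag of activity counts and merges clusters bottom-up. The abstract does not say which four vector space models or which linkage the authors used, so the count-based encoding, the single-linkage rule, and the toy `log` are assumptions.

```python
from collections import Counter
from math import sqrt

def to_vector(trace, vocabulary):
    """Bag-of-activities encoding: one count per known activity."""
    counts = Counter(trace)
    return [counts[activity] for activity in vocabulary]

def distance(u, v):
    """Euclidean distance between two trace vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def agglomerative(vectors, n_clusters):
    """Single-linkage agglomerative clustering; returns clusters as lists of indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(distance(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair (j > i)
    return clusters

# Hypothetical event log: each trace is the activity sequence of one case.
log = [
    ["register", "check", "approve"],
    ["register", "check", "check", "approve"],
    ["register", "reject"],
    ["register", "reject", "notify"],
]
vocabulary = sorted({activity for trace in log for activity in trace})
vectors = [to_vector(trace, vocabulary) for trace in log]
sublogs = agglomerative(vectors, n_clusters=2)
print(sublogs)
```

Each resulting index list corresponds to one sublog that could then be mined separately. Swapping `to_vector` for a different encoding (for example, learned embeddings) is the kind of representation decision the paper compares.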
Uma Abordagem de Agrupamento Automático de Dados Baseada na Otimização por Busca em Grupo Memética (An Automatic Data Clustering Approach Based on Memetic Group Search Optimization)
Luciano D. S. Pacífico, Teresa B Ludermir
Cluster Analysis, one of the most primitive tasks in pattern organization, is a hard problem in exploratory data analysis. Many clustering algorithms are easily trapped in local minima because they lack good global search operators. This work presents a memetic Swarm Intelligence (SI) algorithm based on Group Search Optimization and K-Means, called MGSO, which tries to find the best number of clusters and the best distribution of the data across those clusters simultaneously. MGSO proved capable of finding good global solutions when tested on nine real-world problems, in comparison with other SIs and Evolutionary Algorithms from the literature.
DOI: https://doi.org/10.5753/eniac.2021.18262. Published: 2021-11-29. Citations: 0.