
Research initiative, treatment action : RITA, latest articles

Unsupervised Feature Selection Methodology for Clustering in High Dimensionality Datasets
Pub Date : 2020-04-27 DOI: 10.22456/2175-2745.96081
Marcos de Souza Oliveira, Sergio Queiroz
Feature selection is an important research area that seeks to eliminate unwanted features from datasets. Many feature selection methods have been proposed in the literature, but the best set of features is usually evaluated with supervised metrics, which require labels. In this work we propose a methodology that aims to help data specialists answer simple but important questions, such as: (1) do current feature selection methods give similar results? (2) is there a consistently better method? (3) how should the m best features be selected? (4) since the methods are not parameter-free, how should the best parameters be chosen in the unsupervised scenario? and (5) given different selection options, could we get better results by fusing the results of the methods? If so, how can the results be combined? We analyze these issues and propose a methodology that, building on existing unsupervised methods, performs feature selection on high-dimensional datasets using strategies that make the process fully automatic and unsupervised. We then evaluate the results and find that they are better than those obtained with the selection methods in their standard configurations. Finally, we list some improvements that can be made in future work.
Citations: 4
Scrumie: Scrum Teaching Agent Oriented Game
Pub Date : 2020-04-27 DOI: 10.22456/2175-2745.98203
L. Marinho, Suelen Regina C. dos Santos, Leonardo Andrade, Bruna Costa Cons, Marcelo Schots, Vera Werneck
The use of agile methods has become essential in present-day software development. Among the existing methods, Scrum is one of the most prominent and is used to manage projects in companies, even outside the scope of software systems development. Considering the relevance of this subject and the success usually obtained in learning through educational games, Scrumie was proposed to teach the management of Scrum projects. Scrumie applies intelligence through a multi-agent architecture and is developed with Agile PASSI, an agent-oriented methodology. This paper presents the proposal, modeling, implementation, and evaluation of the Scrumie game.
Citations: 1
Effects of reward distribution strategies and perseverance profiles on agent-based coalitions dynamics
Pub Date : 2020-04-27 DOI: 10.22456/2175-2745.94845
Luis Gustavo Ludescher, Jaime Simão Sichman
In a conventional political system, leaders decide how to distribute benefits to the population and coalitions can emerge when other individuals support the candidates. This work intends to analyze how different leader strategies and individual profiles affect the way coalitions are formed and rewards are shared. Using agent-based simulation, we simulated a model in which individuals of three different perseverance profiles (patient, intermediate and impatient) eventually decide to be part of coalitions by supporting certain leaders when aiming to maximize their own earnings. Leaders can follow one of three different strategies to share rewards: altruistic, intermediate and egoistic. The results show that egoistic leaders stimulate the competition for rewards and the formation of coalitions, causing greater inequalities, while impatient individuals also promote more instability and lead to a higher concentration of rewards.
Citations: 0
Prize Collecting Traveling Salesman Problem with Ridesharing
Pub Date : 2020-04-27 DOI: 10.22456/2175-2745.94082
Ygor Alcântara de Medeiros, M. Goldbarg, E. Goldbarg
The Prize Collecting Traveling Salesman Problem with Ridesharing is a model that joins elements of the Prize Collecting Traveling Salesman Problem and of collaborative transport. The salesman is the driver of a capacitated vehicle and uses a ridesharing system to minimize travel costs. A penalty and a bonus are associated with each vertex of the graph G that represents the problem, and a cost is associated with each edge of G. The salesman must choose a subset of vertices to visit so that the total bonus collected is at least a given parameter and the length of the tour plus the sum of the penalties of all unvisited vertices is as small as possible. There is a set of persons requesting rides. Each ride request consists of a pickup and a drop-off location, a maximum travel duration, and the maximum amount the person agrees to pay. The driver shares the cost associated with each arc of the tour with the passengers in the vehicle. The constraints from the ride requests, as well as the capacity of the car, must be satisfied. We present a mathematical formulation for the problem investigated in this study and solve it with an optimization tool. We also present three heuristics that hybridize exact and heuristic methods. These algorithms use a decomposition strategy that other enriched vehicle routing problems can utilize.
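The prize-collecting part of the objective described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' formulation: it leaves out the ridesharing cost split, ride-request constraints, and vehicle capacity, and it assumes Euclidean edge costs. The function name `pctsp_cost` and all parameter names are hypothetical.

```python
from math import dist

def pctsp_cost(tour, coords, penalty, bonus, min_bonus):
    """Objective of a closed tour, or None if the bonus requirement is not met.

    tour      -- list of visited vertices, in visiting order
    coords    -- vertex -> (x, y); Euclidean edge cost is an assumption
    penalty   -- vertex -> penalty paid when the vertex is NOT visited
    bonus     -- vertex -> bonus collected when the vertex is visited
    min_bonus -- minimum total bonus the tour must collect
    """
    visited = set(tour)
    if sum(bonus[v] for v in visited) < min_bonus:
        return None  # infeasible: not enough bonus collected
    # Tour length, including the edge that closes the cycle.
    length = sum(dist(coords[a], coords[b])
                 for a, b in zip(tour, tour[1:] + tour[:1]))
    # Penalties of every vertex left out of the tour.
    skipped = sum(p for v, p in penalty.items() if v not in visited)
    return length + skipped
```

A solver would search over vertex subsets and orderings to minimize this value; the sketch only evaluates one candidate tour.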
Citations: 3
Studying the Performance of Cognitive Models in Time Series Forecasting
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.96181
A. B. S. Neto, T. Ferreira, M. D. C. M. Batista, P. Firmino
Cognitive models have been paramount for modeling phenomena for which empirical data are unavailable, scarce, or only partially relevant. These approaches rely on methods for preparing experts and then eliciting their opinions about the variables that describe the phenomena under study. In time series forecasting exercises, elicitation processes seek accurate estimates that overcome human heuristic biases while being less time-consuming. This paper compares the accuracy of cognitive and mathematical time series predictors. The results are based on a comparison of the predictors of the cognitive and mathematical models for several time series from the M3-Competition. The results show that cognitive models are at least as accurate as ARIMA model predictions.
Citations: 0
A Computational Strategy for Classification of Enem Issues Based on Item Response Theory
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.92406
G. H. Nunes, B. A. Oliveira, C. Nametala
The National High School Examination (ENEM) gains importance each year as it gradually replaces the traditional vestibular. Many simulations are assembled almost at random by teachers or systems, with questions chosen without clear criteria. With this approach, if a test needs to be reapplied, it is not possible to recreate it with questions of the same difficulty as those used in the first evaluation. In this context, the present work presents the development of an ENEM Intelligent Simulation Generation System that calculates the Item Response Theory (TRI) parameters of questions that have already been applied in ENEM and, based on them, classifies the questions into difficulty groups, thus enabling the generation of balanced tests. For this, the K-means algorithm was used to group the questions into three difficulty groups: easy, medium and difficult. To verify the functioning of the system, a simulation with 180 questions was generated along the ENEM model. The observed hit rate was 37.7%; it was not higher because the algorithm confused the difficulty of questions that are in close classes. However, the system has a hit rate of 92.8% in the classification of questions that are in distant groups.
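The K-means grouping into three difficulty classes mentioned in this abstract can be illustrated with a small, dependency-free sketch. It assumes each question is summarized by a single difficulty value (the abstract does not say which Item Response Theory parameters were clustered), and the helper names `kmeans_1d` and `label` are hypothetical.

```python
def kmeans_1d(values, k=3, iters=100):
    """Plain Lloyd's algorithm on one-dimensional values (k >= 2)."""
    srt = sorted(values)
    # Deterministic start: spread the initial centroids over the sorted data.
    centroids = [srt[(len(srt) - 1) * i // (k - 1)] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Recompute each centroid as the mean of its members.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated
    return centroids

def label(value, centroids, names=("easy", "medium", "difficult")):
    """Name the cluster of `value`, ordering clusters by centroid."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    nearest = min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))
    return names[order.index(nearest)]
```

Questions whose values lie near a boundary between two centroids are the ones most likely to be assigned to a neighbouring class, which is consistent with the lower hit rate the abstract reports for close classes.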
Citations: 0
Evaluating the impact of maintenance policies associated to SLA contracts on the dependability of data centers electrical infrastructures
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.88822
Felipe Fernandes Lima Melo, J. F. D. S. Junior, G. Callou
Due to the growth of cloud computing, the data center environment has grown in importance and use. Data centers are responsible for maintaining and processing several critical-value applications. Therefore, data center infrastructures must be evaluated in order to improve the high availability and reliability demanded of such environments. This work adopts Stochastic Petri Nets (SPN) to evaluate the impact of maintenance policies on data center dependability. The main goal is to analyze maintenance policies associated with SLA contracts and to propose improvements. To accomplish this, an optimization strategy based on Euclidean distance is adopted to indicate the most appropriate solution under conflicting requirements (e.g., cost and availability). To illustrate the applicability of the proposed models and approach, this work presents case studies comparing different SLA contracts and maintenance policies (preventive and corrective) applied to data center electrical infrastructures.
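One way a Euclidean-distance strategy can arbitrate between conflicting requirements such as cost and availability is sketched below. The abstract does not give the details, so the min-max normalization, the choice of ideal point (lowest cost, highest availability), and the name `closest_to_ideal` are assumptions made for illustration only.

```python
from math import hypot

def closest_to_ideal(options):
    """Pick the option nearest to the ideal point in normalized space.

    options -- name -> (cost, availability); lower cost and higher
               availability are better. All values here are hypothetical.
    """
    costs = [c for c, _ in options.values()]
    avails = [a for _, a in options.values()]

    def scale(x, lo, hi):
        # Min-max normalization to [0, 1]; constant columns map to 0.
        return 0.0 if hi == lo else (x - lo) / (hi - lo)

    def distance(item):
        cost, avail = item[1]
        c = scale(cost, min(costs), max(costs))      # 0 means cheapest
        a = scale(avail, min(avails), max(avails))   # 1 means most available
        return hypot(c - 0.0, a - 1.0)               # distance to ideal (0, 1)

    return min(options.items(), key=distance)[0]
```

For example, given three hypothetical maintenance policies with (cost, availability) of (100, 0.990), (150, 0.999), and (300, 0.9995), the middle policy is selected because it gives up little availability for a much lower cost.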
Citations: 0
Group Labeling Methodology Using Distance-based Data Grouping Algorithms
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.91414
Francisco das Chagas Imperes Filho, V. Machado, R. Veras, K. Aires, Aline Montenegro Leal Silva
Clustering algorithms are often used to form groups based on the similarity of their members. In this context, understanding a group is just as important as its composition. Identifying, or labeling, groups can assist with their interpretation and, consequently, guide decision-making by taking into account the features of each group. Interpreting groups is beneficial when it is necessary to know what makes an element part of a given group, what the main features of a group are, and what the differences and similarities among groups are. This work describes a method for finding relevant features and generating labels for the elements of each group, uniquely identifying them. In this way, our approach solves the problem of finding relevant definitions that can identify groups. The proposed method transforms the standard output of an unsupervised distance-based clustering algorithm into a Pertinence Degree (GP), where each element of the database receives a GP with respect to each formed group. The elements, together with their GPs, are used to formulate ranges of values for their attributes. Such ranges can identify the groups uniquely. The labels produced by this approach averaged 94.83% correct answers for the analyzed databases, allowing a natural interpretation of the generated definitions.
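The idea of labeling a group by attribute value ranges that identify it uniquely can be sketched in a simplified form. The sketch below skips the Pertinence Degree (GP) step entirely, since the abstract does not define how it is computed, and simply keeps the per-cluster [min, max] ranges that do not overlap the same attribute's range in any other cluster. The function name `range_labels` and the data layout are hypothetical.

```python
def range_labels(rows, assignments):
    """Return, per cluster, the attribute ranges that overlap no other cluster.

    rows        -- list of dicts (attribute -> numeric value), same keys in each
    assignments -- cluster id of each row, as produced by any clustering step
    """
    ranges = {}
    for row, cluster in zip(rows, assignments):
        per_cluster = ranges.setdefault(cluster, {})
        for attr, value in row.items():
            lo, hi = per_cluster.get(attr, (value, value))
            per_cluster[attr] = (min(lo, value), max(hi, value))

    labels = {}
    for cluster, attrs in ranges.items():
        labels[cluster] = {
            attr: (lo, hi)
            for attr, (lo, hi) in attrs.items()
            # Keep the range only if it is disjoint from every other cluster's.
            if all(hi < other[attr][0] or lo > other[attr][1]
                   for c, other in ranges.items() if c != cluster)
        }
    return labels
```

With two clusters whose `len` values fall in 1.0 to 1.5 and 4.0 to 4.5 but whose `w` values overlap, only the `len` ranges are kept as labels.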
Citations: 3
PHOC Descriptor Applied for Mammography Classification
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.89115
G. B. Santos, André Tragancin Filho
This paper describes experiments with the PHOC (Pyramid Histogram of Color) feature descriptor, assessing its capacity to represent features present in breast radiographs (mammograms). Patches were taken from regions of digital mammograms, representing benign tissue, cancerous tissue, normal tissue, and image background. The motivation is to evaluate the proposal with a view to running it on an inexpensive ordinary desktop computer in places located far from medical experts. The images were obtained from the DDSM database and processed to produce the feature dataset used to train an Artificial Neural Network. The results were evaluated by analyzing the learning rate curve and ROC curves; besides these graphical tools, the confusion matrix and other quantitative metrics (TPR, FPR and accuracy) were also extracted and analyzed. The average accuracy of ≈ 0.8 and the other metrics extracted from the results indicate that the proposal has potential for further development. To the best of our knowledge, PHOC has not previously been applied to mammography in the literature as proposed here.
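The quantitative metrics named in this abstract (TPR, FPR and accuracy) follow directly from confusion-matrix counts. The sketch below uses the standard binary definitions; the paper works with several tissue classes, so a one-versus-rest reduction is assumed here, and the function name `binary_metrics` and the example counts are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard rates from binary confusion-matrix counts."""
    return {
        "tpr": tp / (tp + fn),                        # sensitivity / recall
        "fpr": fp / (fp + tn),                        # false alarm rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),  # overall hit rate
    }
```

For instance, counts of 40 true positives, 10 false positives, 40 true negatives and 10 false negatives give a TPR of 0.8, an FPR of 0.2 and an accuracy of 0.8.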
Citations: 3
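The abstract above reports TPR, FPR, and accuracy extracted from a confusion matrix. As a minimal sketch of how such figures are usually derived for a multi-class problem, the following computes overall accuracy and one-vs-rest TPR/FPR. The function names and the 4x4 matrix of counts are hypothetical, chosen only for illustration; they are not the paper's code or results.

```python
from typing import Dict, List

Matrix = List[List[int]]  # rows = true class, columns = predicted class


def accuracy(cm: Matrix) -> float:
    """Fraction of all samples that lie on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def one_vs_rest_rates(cm: Matrix, cls: int) -> Dict[str, float]:
    """TPR and FPR for class `cls`, treating every other class as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[cls][cls]
    fn = sum(cm[cls]) - tp                  # true `cls`, predicted as something else
    fp = sum(row[cls] for row in cm) - tp   # other classes predicted as `cls`
    tn = total - tp - fn - fp
    return {"tpr": tp / (tp + fn), "fpr": fp / (fp + tn)}


# Hypothetical counts (NOT from the paper); class order is an assumption:
# 0 = benign, 1 = cancerous, 2 = normal, 3 = background
cm = [
    [40, 5, 3, 2],
    [6, 38, 4, 2],
    [4, 3, 42, 1],
    [0, 1, 1, 48],
]

overall = accuracy(cm)
cancerous = one_vs_rest_rates(cm, 1)
```

Reporting the rates per class this way makes it visible when a high overall accuracy hides a weak true-positive rate on the clinically important class.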
An Online Tree-Based Approach for Mining Non-Stationary High-Speed Data Streams
Pub Date : 2020-01-15 DOI: 10.22456/2175-2745.90822
Agustín Alejandro Ortiz Díaz, Isvani Inocencio Frías Blanco, L. M. Mariño, F. Baldo
This paper presents a new learning algorithm for inducing decision trees from data streams. In these domains, large amounts of data arrive continuously over time, possibly at high speed. The proposed algorithm builds trees by top-down induction, recursively splitting leaf nodes until none of them can be expanded. It combines two split methods. The first guarantees, with statistical significance, that each split chosen is the same as the one that would be chosen given infinitely many examples, which aims to keep the tree induced online close to the optimal model. However, this method often needs too many examples to decide on the best split, which delays the accuracy improvement of the online predictive model. The second method is therefore used to split nodes more quickly, speeding up tree growth; it is based on the observation that larger trees can store more information about the training examples and represent more complex concepts. The first split method is also used to correct splits previously suggested by the second one, once it has sufficient evidence. Finally, an additional procedure rebuilds the tree model according to the suggestions made with an adequate level of statistical significance. The proposed algorithm is compared empirically with several well-known induction algorithms for learning decision trees from data streams. In tests on various synthetic and real-world datasets, the proposed algorithm is more competitive in terms of accuracy and model size.
Citations: 0
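The abstract above describes a split method that guarantees, with statistical significance, that the chosen split matches the one an infinite sample would select, and notes that this can require many examples. The abstract does not say which statistical bound the authors use. As an illustrative stand-in only, the sketch below uses the Hoeffding bound familiar from VFDT-style stream learners; the function names and the numeric values are assumptions, not the paper's method.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: with probability 1 - delta, the observed mean of n
    samples of a variable with the given range is within this epsilon of
    the true mean."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split only when the gap between the two best candidate splits exceeds
    the bound, so the winner is unlikely to change with more examples."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical gains for the two best attributes at a leaf.
decision_after_100 = can_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
decision_after_1000 = can_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

With these made-up values the same gap is not enough to split after 100 examples but is after 1000, which illustrates why a purely guarantee-based split can delay tree growth and why the abstract pairs it with a faster second method.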