Reducing the Number of Training Cases in Genetic Programming

2022 IEEE Congress on Evolutionary Computation (CEC) | Pub Date: 2022-07-18 | DOI: 10.1109/CEC55065.2022.9870327

Giacomo Zoppi, L. Vanneschi, M. Giacobini


Citations: 0

Abstract


In the field of Machine Learning, one of the most commonly discussed questions is how to choose an adequate number of data observations to train our models satisfactorily. In other words, the goal is to find the amount of data needed to create a model that is neither underfitted nor overfitted, but instead achieves a reasonable generalization ability. The problem grows in importance when we consider Genetic Programming, where fitness evaluation is often rather slow. Therefore, finding the minimum amount of data that enables us to discover the solution to a given problem could bring significant benefits. Using the notion of entropy in a dataset, we seek to understand the information gain obtainable from each additional data point. We then look for the smallest percentage of data that carries enough information to yield satisfactory results. As a first step, we present an example derived from the state of the art. Then, we question a relevant part of our procedure and introduce two case studies to experimentally validate our theoretical hypothesis.
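The abstract does not spell out how entropy is computed or how "enough information" is decided, so the following is only an illustrative sketch under stated assumptions, not the authors' procedure. It assumes that target values are discretized into equal-width bins, that Shannon entropy is measured on growing prefixes of a shuffled dataset, and that a subset is "sufficient" once its entropy reaches a chosen share of the full-dataset entropy. The names `entropy_bits`, `discretize`, `smallest_sufficient_fraction`, and the parameters `n_bins`, `threshold`, and `n_steps` are all hypothetical.

```python
import math
import random
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy, in bits, of a sequence of discrete labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def discretize(values, n_bins):
    """Map real values to equal-width bin indices (an assumed binning scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid a zero width for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def smallest_sufficient_fraction(targets, n_bins=10, threshold=0.95,
                                 n_steps=20, seed=0):
    """Smallest tested fraction whose entropy reaches `threshold` of the full entropy.

    Assumption: "enough information" means the subset entropy is at least
    `threshold` times the entropy of the whole (binned) dataset.
    """
    labels = discretize(targets, n_bins)
    random.Random(seed).shuffle(labels)  # random order of training cases
    full = entropy_bits(labels)
    n = len(labels)
    for k in range(1, n_steps + 1):
        size = max(1, (k * n + n_steps - 1) // n_steps)  # integer ceiling
        if entropy_bits(labels[:size]) >= threshold * full:
            return k / n_steps
    return 1.0


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic regression targets, purely for illustration.
    targets = [rng.gauss(0.0, 1.0) for _ in range(500)]
    print(smallest_sufficient_fraction(targets))
```

The fraction returned depends heavily on the binning and on the threshold, so it is best read as a starting point for experiments on how many training cases to keep, not as a guarantee of generalization.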
