Training with Input Selection and Testing (TWIST) Algorithm: A Significant Advance in Pattern Recognition Performance of Machine Learning

Journal of Intelligent Learning Systems and Applications | Pub Date: 2013-02-22 | DOI: 10.4236/JILSA.2013.51004

M. Buscema, Marco Breda, W. Lodwick

Volume 5, Issue 1, pages 29-38. https://doi.org/10.4236/JILSA.2013.51004

Citations: 40

Abstract

This article demonstrates the efficacy of TWIST, a methodology for designing the training and testing subsets extracted from a given dataset associated with a problem to be solved via artificial neural networks (ANNs). The methodology is embedded in algorithms and implemented in computer software. This implementation is compared with the current standard methods of random cross-validation: 10-fold CV, a random split into two subsets, and the more advanced T&T. For each strategy, 13 learning machines, representing different families of the main algorithms, were trained and tested. All algorithms were implemented using the well-known WEKA software package. On the one hand, a falsification test with a randomly distributed dependent variable shows that T&T and TWIST behave like the other two strategies: when the dataset contains no information, the strategies are equivalent. On the other hand, on the real Statlog (Heart) dataset, a strong difference in accuracy is demonstrated experimentally. Our results show that TWIST is superior to current methods. Pairs of subsets with similar probability density functions are generated, without coding noise, according to an optimal strategy that extracts the most useful information for pattern classification.
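The falsification test mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' TWIST or T&T implementation, and it does not reproduce their WEKA setup. It only shows the baseline idea that when the dependent variable is random, a random two-way split and 10-fold CV should both score close to chance. The nearest-centroid classifier is a stand-in for the 13 learning machines. The dataset sizes, seeds, and all function names are assumptions made for illustration.

```python
import random
import statistics


def make_random_label_dataset(n_rows, n_features, seed):
    """Gaussian features with labels drawn independently of them (hypothetical helper)."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_rows)]
    y = [rng.randint(0, 1) for _ in range(n_rows)]
    return X, y


def fit_centroids(X, y):
    """Stand-in learner: one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        centroids[label] = [statistics.fmean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )


def accuracy(train_idx, test_idx, X, y):
    """Train on train_idx, then report the fraction of correct labels on test_idx."""
    centroids = fit_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
    hits = sum(predict(centroids, X[i]) == y[i] for i in test_idx)
    return hits / len(test_idx)


def random_split_accuracy(X, y, seed):
    """Random split into two halves: train on one, test on the other."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    half = len(idx) // 2
    return accuracy(idx[:half], idx[half:], X, y)


def k_fold_accuracy(X, y, k, seed):
    """Random k-fold cross-validation: mean test accuracy over the k folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        scores.append(accuracy(train_idx, test_idx, X, y))
    return statistics.fmean(scores)


# Arbitrary illustrative sizes; the labels carry no information about the features.
X, y = make_random_label_dataset(n_rows=270, n_features=13, seed=0)
split_acc = random_split_accuracy(X, y, seed=1)
cv_acc = k_fold_accuracy(X, y, k=10, seed=1)
print(f"random split accuracy: {split_acc:.3f}")
print(f"10-fold CV accuracy:   {cv_acc:.3f}")
```

With random labels both estimates should land near 0.5 for a two-class problem. A splitting strategy that scored well above chance here would be exploiting something other than information in the data, which is the point of a falsification test.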

