An Automated Unsupervised Discretization Method: A Novel Approach
H. Drias, Hadjer Moulai, Y. Drias
Vietnam. J. Comput. Sci., published 2020-04-14
DOI: 10.1142/s2196888820500177 (https://doi.org/10.1142/s2196888820500177)
Abstract
This paper proposes a novel discretization scheme that aims to enable scalability while also addressing at least three other significant challenges. It is based on a Left-to-Right (LR) scanning process that partitions the input stream into intervals. This task can be implemented either directly as an algorithm or through a generator that automatically builds the discretization program. We focus on unsupervised discretization and design a method called Unsupervised Left to Right Discretization (ULR-Discr). Extensive experiments were conducted using various cut-point functions on small, large, and medical public datasets. First, ULR-Discr variants based on different statistics are compared with one another to observe the impact of the cut-point functions on accuracy and runtime. The proposed method is then compared with traditional and recent techniques for classification. The results show that classification accuracy improves markedly when our method is used for discretization.
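The abstract does not spell out the scanning rule, so the following is only a minimal sketch of what a left-to-right unsupervised discretizer with a pluggable cut-point function might look like. The function names (`lr_discretize`, `assign_bins`), the `max_gap` threshold, and the rule "close the interval when the next value exceeds the interval statistic by more than `max_gap`" are all assumptions made for illustration; they are not taken from the paper and should not be read as ULR-Discr itself.

```python
from bisect import bisect_right
from statistics import mean, median
from typing import Callable, List, Sequence


def lr_discretize(
    values: Sequence[float],
    cut_stat: Callable[[List[float]], float] = mean,
    max_gap: float = 1.0,
) -> List[float]:
    """Scan sorted values left to right and return a list of cut points.

    Assumed rule (hypothetical, not from the paper): the current interval
    is closed when the next value lies more than `max_gap` above
    cut_stat(current interval).
    """
    ordered = sorted(values)
    if not ordered:
        return []
    cuts: List[float] = []
    current = [ordered[0]]
    for v in ordered[1:]:
        if v - cut_stat(current) > max_gap:
            # Put the cut midway between the last value of the closed
            # interval and the first value of the new one.
            cuts.append((current[-1] + v) / 2)
            current = [v]
        else:
            current.append(v)
    return cuts


def assign_bins(values: Sequence[float], cuts: Sequence[float]) -> List[int]:
    """Map each value to the index of the interval it falls into."""
    return [bisect_right(cuts, v) for v in values]


data = [1, 2, 3, 10, 11, 20]
cuts = lr_discretize(data, cut_stat=mean, max_gap=3)
print(cuts)                     # [6.5, 15.5]
print(assign_bins(data, cuts))  # [0, 0, 0, 1, 1, 2]
```

Swapping `cut_stat=mean` for `cut_stat=median` (or any other statistic over the current interval) changes only the cut-point function, which mirrors the kind of variant comparison the abstract describes. After the initial sort, the scan makes a single pass over the data.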