A binarization strategy for modelling mixed data in multigroup classification
Authors: Youssef Masmoudi, M. Turkay, H. Chabchoub
Published in: 2013 International Conference on Advanced Logistics and Transport, 2013-05-29
DOI: 10.1109/ICADLT.2013.6568483 (https://doi.org/10.1109/ICADLT.2013.6568483)
This paper presents a binarization pre-processing strategy for mixed datasets. We propose that representing nominal and integer data with binary attributes is beneficial for classification accuracy, and we describe a procedure for converting integer and nominal data into binary attributes. An Expectation-Maximization (EM) clustering algorithm was applied to group the values of attributes with a wide range, so that each such attribute can be represented with a small number of binary attributes. Once the dataset is pre-processed, we use a Support Vector Machine (LibSVM) for classification. The proposed method was tested on datasets from the literature. We demonstrate the improved accuracy and efficiency of the presented binarization strategy for modelling mixed and complex data in comparison with classification of the original dataset, the nominal dataset and the binary dataset.
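The pre-processing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the abstract does not give the exact encoding, the number of clusters, or the EM initialisation, so the sketch assumes one indicator bit per nominal category, one indicator bit per EM cluster, and a simple one-dimensional Gaussian mixture with evenly spaced starting means. The function names (`one_hot`, `em_cluster_1d`, `binarize`) and the toy data are hypothetical.

```python
import math

def one_hot(values):
    """One indicator bit per distinct value (categories in sorted order)."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def em_cluster_1d(xs, k, iters=50):
    """Fit a k-component 1-D Gaussian mixture by EM; return a hard label per value."""
    lo, hi = min(xs), max(xs)
    # Assumption: means start evenly spaced over the observed range.
    means = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    variances = [max(((hi - lo) / k) ** 2, 1e-6)] * k
    weights = [1.0 / k] * k

    def responsibilities():
        resp = []
        for x in xs:
            p = [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p)
            # Guard against underflow: fall back to uniform responsibilities.
            resp.append([pj / total for pj in p] if total > 0 else [1.0 / k] * k)
        return resp

    for _ in range(iters):
        resp = responsibilities()              # E-step
        for j in range(k):                     # M-step
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue                       # empty component: leave unchanged
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)      # floor avoids division by zero
    resp = responsibilities()
    return [max(range(k), key=lambda j: r[j]) for r in resp]

def binarize(nominal, wide_range, k):
    """Concatenate category bits with cluster-indicator bits, row by row."""
    cluster_bits = one_hot(em_cluster_1d(wide_range, k))
    return [a + b for a, b in zip(one_hot(nominal), cluster_bits)]

# Toy data (hypothetical): one nominal attribute and one wide-range integer attribute.
colors = ["red", "blue", "red", "green"]
counts = [3, 1200, 5, 1180]
for row in binarize(colors, counts, k=2):
    print(row)
```

Each resulting row is a purely binary feature vector, which could then be passed to an SVM implementation such as LibSVM. Replacing the wide-range integer by a cluster indicator keeps the number of binary attributes small, at the cost of discarding within-cluster differences.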