Ahmad Taufiq Akbar, Rochmat Husaini, Bagus Muhammad Akbar, S. Saifullah
Jurnal Teknologi dan Sistem Komputer, vol. 8, pp. 276-283. Published 2020-10-31. DOI: https://doi.org/10.14710/JTSISKOM.2020.13625
A proposed method for handling an imbalance data in classification of blood type based on Myers-Briggs type indicator
Blood type is still widely assumed to be related to certain aspects of personality. This study examines preprocessing methods for improving the accuracy of classifying blood type from Myers-Briggs Type Indicator (MBTI) data. The training and testing data comprise 250 records of MBTI questionnaire answers collected from 250 respondents. Classification uses the k-Nearest Neighbor (k-NN) algorithm. Without preprocessing, k-NN reaches only about 32% accuracy, so the data imbalance must be handled before classification. The proposed preprocessing has two stages: an unsupervised resample followed by a supervised resample. Validation uses 10-fold cross-validation. With the proposed preprocessing applied, k-NN classification shows significantly higher accuracy, F-score, and recall.
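The pipeline described in the abstract (rebalance the classes, classify with k-NN, validate over ten folds) can be illustrated with a minimal sketch. Everything specific here is an assumption, not the authors' implementation: the abstract does not say how either resample stage works, so plain random oversampling stands in for the class-balancing step, and the data is synthetic with a made-up class distribution and four hypothetical numeric features. The function names (`knn_predict`, `balance_by_resampling`, `cross_validate`, `make_toy_data`) are illustrative only.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    # Rank training rows by squared Euclidean distance to the query point.
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def balance_by_resampling(X, y, rng):
    """Random oversampling (assumed stand-in for the paper's resample step).

    Keeps every original row and adds random duplicates of minority-class
    rows until each class has as many rows as the largest class.
    """
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(features)
    target = max(len(rows) for rows in by_class.values())
    out_X, out_y = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        out_X.extend(rows + extra)
        out_y.extend([label] * target)
    return out_X, out_y


def cross_validate(X, y, folds=10, k=3, seed=0):
    """Accuracy over `folds` folds, rebalancing only the training portion."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)
    correct = 0
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        bal_X, bal_y = balance_by_resampling(
            [X[i] for i in train], [y[i] for i in train], rng
        )
        correct += sum(knn_predict(bal_X, bal_y, X[i], k) == y[i] for i in test)
    return correct / len(X)


def make_toy_data(rng, counts):
    """Hypothetical data: each class is a Gaussian cluster of 4 numeric scores."""
    X, y = [], []
    for offset, (label, n) in enumerate(counts.items()):
        for _ in range(n):
            X.append([rng.gauss(offset * 2.0, 1.0) for _ in range(4)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    # 250 synthetic records; the class sizes are invented for illustration.
    X, y = make_toy_data(rng, {"O": 120, "A": 70, "B": 45, "AB": 15})
    print("class counts:", dict(Counter(y)))
    print("10-fold accuracy:", round(cross_validate(X, y), 3))
```

This sketch rebalances each training fold only, so the held-out fold keeps its original class distribution and contains no duplicated rows from training. The abstract does not state where in the pipeline the authors apply their resampling, so this placement is a design choice of the sketch.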