{"title":"使用狄利克雷过程数据生成来维护不平衡的高度依赖的医疗数据","authors":"Tieta Antaresti, M. I. Fanany, A. M. Arymurthy","doi":"10.1109/ICDIM.2011.6093359","DOIUrl":null,"url":null,"abstract":"The existence of imbalanced data between one class and another class is an important issue to be considered in a classification problem. One of the well-known data balancing technique is the artificial oversampling, which increase the size of datasets. In this research, multinomial classification was applied to classify some recorded features obtained from a single ECG (electrocardiograph) sensor. Therefore, a Dirichlet process, a dirichlet distribution of cumulative distribution function of each data partition, was needed to model the distribution of the new generated data by also considering the statistical properties of the previous data. Data balancing process had given the result of 77.21% classification accuracy (CA), and 90.9% area under ROC curve (AUC).","PeriodicalId":355775,"journal":{"name":"2011 Sixth International Conference on Digital Information Management","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Maintaining imbalance highly dependent medical data using dirichlet process data generation\",\"authors\":\"Tieta Antaresti, M. I. Fanany, A. M. Arymurthy\",\"doi\":\"10.1109/ICDIM.2011.6093359\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The existence of imbalanced data between one class and another class is an important issue to be considered in a classification problem. One of the well-known data balancing technique is the artificial oversampling, which increase the size of datasets. In this research, multinomial classification was applied to classify some recorded features obtained from a single ECG (electrocardiograph) sensor. Therefore, a Dirichlet process, a dirichlet distribution of cumulative distribution function of each data partition, was needed to model the distribution of the new generated data by also considering the statistical properties of the previous data. Data balancing process had given the result of 77.21% classification accuracy (CA), and 90.9% area under ROC curve (AUC).\",\"PeriodicalId\":355775,\"journal\":{\"name\":\"2011 Sixth International Conference on Digital Information Management\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 Sixth International Conference on Digital Information Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDIM.2011.6093359\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 Sixth International Conference on Digital Information Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDIM.2011.6093359","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Maintaining imbalance highly dependent medical data using dirichlet process data generation
The existence of imbalanced data between one class and another class is an important issue to be considered in a classification problem. One of the well-known data balancing technique is the artificial oversampling, which increase the size of datasets. In this research, multinomial classification was applied to classify some recorded features obtained from a single ECG (electrocardiograph) sensor. Therefore, a Dirichlet process, a dirichlet distribution of cumulative distribution function of each data partition, was needed to model the distribution of the new generated data by also considering the statistical properties of the previous data. Data balancing process had given the result of 77.21% classification accuracy (CA), and 90.9% area under ROC curve (AUC).