Maintaining imbalance highly dependent medical data using Dirichlet process data generation

2011 Sixth International Conference on Digital Information Management. Pub Date: 2011-12-01. DOI: 10.1109/ICDIM.2011.6093359

Tieta Antaresti, M. I. Fanany, A. M. Arymurthy


Citation count: 0

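Abstract

Class imbalance, where one class has far more samples than another, is an important issue in classification problems. One well-known data balancing technique is artificial oversampling, which increases the size of the dataset. In this research, multinomial classification was applied to classify recorded features obtained from a single ECG (electrocardiograph) sensor. A Dirichlet process, a Dirichlet distribution over the cumulative distribution function of each data partition, was therefore needed to model the distribution of the newly generated data while also taking into account the statistical properties of the existing data. After data balancing, classification reached 77.21% classification accuracy (CA) and 90.9% area under the ROC curve (AUC).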
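The abstract does not give the generation procedure, so the following is only a minimal sketch of one common way to draw new samples from a Dirichlet process: the Pólya urn (Blackwell-MacQueen) predictive rule. The function name `dp_oversample`, the concentration parameter `alpha`, and the choice of a Gaussian fitted to the observed samples as the base measure are all assumptions made for illustration, not the authors' method.

```python
import numpy as np

def dp_oversample(X, n_new, alpha=1.0, rng=None):
    """Draw n_new synthetic rows via a Polya-urn Dirichlet process scheme.

    Hypothetical helper: the base measure G0 is ASSUMED to be a Gaussian
    fitted to X; the paper may use a different base measure.
    X must contain at least two rows.
    """
    rng = np.random.default_rng(rng)
    X = np.asarray(X, dtype=float)
    d = X.shape[1]

    # Assumed base measure G0: Gaussian with the sample mean and covariance.
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False)) + 1e-6 * np.eye(d)

    pool = [row for row in X]  # observed samples plus draws so far
    out = []
    for _ in range(n_new):
        n = len(pool)
        if rng.random() < alpha / (alpha + n):
            # With probability alpha / (alpha + n): fresh draw from G0.
            draw = rng.multivariate_normal(mean, cov)
        else:
            # Otherwise: repeat a previous sample chosen uniformly at random.
            draw = pool[rng.integers(n)].copy()
        pool.append(draw)
        out.append(draw)
    return np.array(out).reshape(n_new, d)

# Usage: top up a minority class of 20 rows with 50 synthetic rows.
minority = np.random.default_rng(0).normal(size=(20, 3))
synthetic = dp_oversample(minority, n_new=50, alpha=2.0, rng=1)
```

A larger `alpha` makes fresh draws from the base measure more frequent, while a smaller `alpha` keeps the generated rows closer to plain resampling of the existing data.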



