Qing He, Qun Wang, Changying Du, Xu-Dong Ma, Zhong-zhi Shi
2010 Third International Symposium on Knowledge Acquisition and Modeling, 2010-11-29. DOI: 10.1109/KAM.2010.5646172 (https://doi.org/10.1109/KAM.2010.5646172)
A parallel Hyper-Surface Classifier for high dimensional data
The growing volume of data produced in the real world makes classifying very large-scale data a challenging task, so parallel processing of very large, high-dimensional data is important. Hyper-Surface Classification (HSC) has been shown to be an effective and efficient classification algorithm for two- and three-dimensional data. Although HSC can be extended to high-dimensional data through dimension reduction or ensemble techniques, handling such data directly is not trivial. Inspired by the decision tree idea, this work proposes an improved HSC that handles high-dimensional data directly. We further parallelize the improved HSC algorithm (PHSC) on the MapReduce framework, a powerful parallel programming technique used in many fields, so that it can handle large-scale high-dimensional data. Experimental results show that the parallel improved HSC algorithm can both handle high-dimensional data directly and process large-scale data sets. The evaluation criteria of scaleup, speedup, and sizeup further validate its efficiency.
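The abstract names scaleup, speedup, and sizeup without defining them. The sketch below assumes the definitions commonly used when evaluating parallel data mining algorithms; the paper itself may define them differently. The function names and the timings in the usage lines are hypothetical and are not results from the paper.

```python
def speedup(t_one_node: float, t_m_nodes: float) -> float:
    """Same data set, m times as many nodes: T(1) / T(m).

    Assumed definition; an ideal parallel algorithm gives a value of m.
    """
    return t_one_node / t_m_nodes


def scaleup(t_base: float, t_scaled: float) -> float:
    """Data and nodes both grown m-fold: T(1 node, D) / T(m nodes, m*D).

    Assumed definition; an ideal parallel algorithm gives a value of 1.
    """
    return t_base / t_scaled


def sizeup(t_base_data: float, t_m_times_data: float) -> float:
    """Fixed number of nodes, m times as much data: T(m*D) / T(D).

    Assumed definition; a run time that grows linearly with data gives m.
    """
    return t_m_times_data / t_base_data


# Hypothetical wall-clock timings in seconds, for illustration only.
example_speedup = speedup(t_one_node=120.0, t_m_nodes=40.0)
example_scaleup = scaleup(t_base=100.0, t_scaled=125.0)
example_sizeup = sizeup(t_base_data=50.0, t_m_times_data=200.0)
```

Under these assumed definitions, a speedup close to the number of nodes, a scaleup close to 1, and a sizeup that grows no faster than the data size would each indicate that the parallel algorithm scales well.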