Intelligent MapReduce Based Framework for Labeling Instances in Evolving Data Stream

2013 IEEE 5th International Conference on Cloud Computing Technology and Science. Published: 2013-12-02. DOI: 10.1109/CloudCom.2013.152

Ahsanul Haque, Brandon Parker, L. Khan, B. Thuraisingham


Citations: 3

Abstract



In our current work, we have proposed a robust, multi-tiered ensemble-based method that addresses all of the challenges of labeling instances in an evolving data stream. The bottleneck of this work is that it must build an ADABOOST ensemble for each numeric feature. This raises a scalability issue, because the number of features in a data stream can at times be very large. In this paper, we propose an intelligent approach to building this large number of ADABOOST ensembles using MapReduce-based parallelism. We show that this approach helps our base method achieve significant scalability without compromising classification accuracy. We analyze different aspects of our design to highlight the advantages and disadvantages of the approach. We also compare and analyze the performance of the proposed approach in terms of execution time, speedup, and scale-up.
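The abstract does not say how the work is divided among mappers and reducers, so the following is only a minimal, single-machine sketch under stated assumptions. Each mapper emits one `(feature_index, (value, label))` pair per numeric feature, the shuffle groups those pairs by feature, and each reducer trains an independent discrete AdaBoost ensemble of decision stumps on its one feature. The function names (`map_phase`, `shuffle`, `train_adaboost_1d`, `build_feature_ensembles`, `predict`), the {-1, +1} label encoding, the decision-stump weak learner, and the thread pool standing in for cluster reduce tasks are all hypothetical choices for illustration, not the authors' implementation. What the sketch shows is why per-feature ensembles parallelize well: they share no state, so they can be trained concurrently.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(instances):
    """Hypothetical mapper: emit one (feature_index, (value, label)) pair per numeric feature."""
    for features, label in instances:
        for j, value in enumerate(features):
            yield j, (value, label)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce shuffle step would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def stump_predict(threshold, polarity, x):
    """Decision stump: predict `polarity` above the threshold, the opposite label otherwise."""
    return polarity if x > threshold else -polarity


def train_adaboost_1d(points, rounds=10):
    """Hypothetical reducer: discrete AdaBoost with decision stumps on a single feature.

    points: list of (value, label) with label in {-1, +1}.
    Returns a list of (alpha, threshold, polarity) weak learners.
    """
    n = len(points)
    weights = [1.0 / n] * n
    values = sorted({v for v, _ in points})
    # Candidate thresholds: one below the minimum plus midpoints between distinct values.
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    ensemble = []
    for _ in range(rounds):
        best = None
        for th in thresholds:
            for pol in (1, -1):
                err = sum(w for w, (x, y) in zip(weights, points)
                          if stump_predict(th, pol, x) != y)
                if best is None or err < best[0]:
                    best = (err, th, pol)
        raw_err, th, pol = best
        if raw_err >= 0.5:
            break  # no stump beats chance on the current weights
        err = max(raw_err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, th, pol))
        # Re-weight: misclassified points gain weight, correctly classified ones lose it.
        weights = [w * math.exp(-alpha * y * stump_predict(th, pol, x))
                   for w, (x, y) in zip(weights, points)]
        total = sum(weights)
        weights = [w / total for w in weights]
        if raw_err == 0:
            break  # training data for this feature is already separated
    return ensemble


def predict(ensemble, x):
    """Weighted vote of the stumps in one feature's ensemble."""
    score = sum(alpha * stump_predict(th, pol, x) for alpha, th, pol in ensemble)
    return 1 if score >= 0 else -1


def build_feature_ensembles(instances, rounds=10, workers=4):
    """Map, shuffle, then train one ensemble per feature in parallel.

    Threads stand in for reduce tasks here; CPython threads do not speed up
    CPU-bound Python code, so this only illustrates the independence of the tasks.
    """
    groups = shuffle(map_phase(instances))
    keys = sorted(groups)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        ensembles = list(pool.map(lambda k: train_adaboost_1d(groups[k], rounds), keys))
    return dict(zip(keys, ensembles))


if __name__ == "__main__":
    # Toy data chunk: feature 0 separates the classes, feature 1 does not.
    chunk = [([0.1, 5.0], -1), ([0.3, 1.0], -1), ([0.7, 4.0], 1), ([0.9, 2.0], 1)]
    models = build_feature_ensembles(chunk, rounds=5)
    print([predict(models[0], x) for x in (0.2, 0.8)])  # [-1, 1]
```

Because each feature's ensemble is trained independently, adding reducers should in principle cut wall-clock time until the number of reducers exceeds the number of features; a real deployment would run the reducers as separate processes or nodes (for example, Hadoop reduce tasks) rather than threads. Speedup and scale-up are commonly defined as T(1)/T(p) for a fixed workload on p nodes, and T(1, D)/T(p, pD) when the data grows with the cluster; the abstract does not state which exact definitions the paper uses.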
