Design of self-adjusting algorithm for data-intensive MapReduce applications
Amin Nazir Nagiwale, Manish R. Umale, Aditya Sinha
2015 International Conference on Energy Systems and Applications, published 2015-10-01
DOI: 10.1109/ICESA.2015.7503401 (https://doi.org/10.1109/ICESA.2015.7503401)
Citations: 4
Abstract
The MapReduce framework is well suited to large-scale, data-intensive processing, but applications in this class, such as machine learning, graph, and sentiment analysis algorithms, must cope with skewed and diverse data and adapt to changes in real time. For example, it is difficult to adapt to real-time changes in the training data or corpus in big data applications such as sentiment analysis, email spam detection, and log file analysis. To address this, we propose an algorithm based on concepts from functional programming and self-adjusting computation. It supports accepting changes to the system efficiently, ranging from making the training set or language corpus domain-specific, through amortized analysis of the algorithm, to changes in storage, network, and architecture design for distributed systems. For experimental purposes, we have implemented Selfie, a self-adjusting algorithm that uses a splay tree, for Twitter sentiment analysis, which makes the system responsive to skewness in access patterns and diversity in trends. The proposed algorithm can be helpful for other iterative and interactive applications that face machine learning challenges, such as feature generation and selection, over-fitting, and explaining and improving models, in order to deal effectively with large dynamic data sets.
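The abstract does not give the details of Selfie, so the following is only a minimal sketch of the underlying idea: a sentiment lexicon stored in a splay tree, where every lookup moves the accessed word to the root so that frequently used (trending) words stay cheap to reach under a skewed access pattern. The names `SplayLexicon`, `upsert`, `lookup`, and `score_text` are hypothetical and chosen for illustration, the per-word scores are assumed inputs, and the MapReduce integration is omitted entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str                      # word in the lexicon
    score: float                  # assumed per-word sentiment score
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def _rotate_right(x: Node) -> Node:
    y = x.left
    x.left, y.right = y.right, x
    return y


def _rotate_left(x: Node) -> Node:
    y = x.right
    x.right, y.left = y.left, x
    return y


def _splay(root: Optional[Node], key: str) -> Optional[Node]:
    """Move `key` (or the last node on its search path) to the root.

    Recursive for brevity; a production version would be iterative
    to avoid deep recursion on degenerate trees.
    """
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root
        if key < root.left.key:                       # zig-zig
            root.left.left = _splay(root.left.left, key)
            root = _rotate_right(root)
        elif key > root.left.key:                     # zig-zag
            root.left.right = _splay(root.left.right, key)
            if root.left.right is not None:
                root.left = _rotate_left(root.left)
        return root if root.left is None else _rotate_right(root)
    if root.right is None:
        return root
    if key > root.right.key:                          # zig-zig
        root.right.right = _splay(root.right.right, key)
        root = _rotate_left(root)
    elif key < root.right.key:                        # zig-zag
        root.right.left = _splay(root.right.left, key)
        if root.right.left is not None:
            root.right = _rotate_right(root.right)
    return root if root.right is None else _rotate_left(root)


class SplayLexicon:
    """Hypothetical self-adjusting word -> score map (not the authors' code)."""

    def __init__(self) -> None:
        self.root: Optional[Node] = None

    def upsert(self, word: str, score: float) -> None:
        """Insert a new word or update its score, e.g. when the corpus changes."""
        if self.root is None:
            self.root = Node(word, score)
            return
        self.root = _splay(self.root, word)
        if self.root.key == word:
            self.root.score = score
            return
        node = Node(word, score)
        if word < self.root.key:
            # Old root is the successor of `word`; split around the new node.
            node.right, node.left = self.root, self.root.left
            self.root.left = None
        else:
            node.left, node.right = self.root, self.root.right
            self.root.right = None
        self.root = node

    def lookup(self, word: str, default: float = 0.0) -> float:
        """Return the score for `word`, splaying it to the root if present."""
        self.root = _splay(self.root, word)
        if self.root is not None and self.root.key == word:
            return self.root.score
        return default

    def score_text(self, text: str) -> float:
        """Sum word scores over whitespace tokens; unknown words score 0."""
        return sum(self.lookup(tok) for tok in text.lower().split())
```

In this sketch, calling `upsert` with a new score for an existing word changes how later texts are scored without rebuilding the lexicon, which is one simple way to picture adapting to a changing corpus. Whether Selfie handles updates this way is not stated in the abstract.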