A Characterization and Comparison of Spatial-Temporal Applications and Internet Big Data Benchmarks
Wen Xiong, Kun Yang, Yanhui Zhu
2018 26th International Conference on Geoinformatics, June 2018
DOI: 10.1109/GEOINFORMATICS.2018.8557164
Abstract
An urban traffic data analysis platform is an important piece of infrastructure for a modern city. As the spatial-temporal data produced by traffic transportation systems grows explosively, operators in the traffic field are trying to adopt the emerging big data solutions born in the internet domain. However, it is hard to find a solution with high cost/performance for building such a platform because of the diverse possible combinations of hardware and software configurations. Currently, operators select solutions based on simple evaluation results from internet benchmarks such as TeraSort. Two questions have never been fully explored: (1) is it appropriate to evaluate a solution for spatial-temporal applications with internet benchmarks, and (2) what are the characteristics of spatial-temporal applications, and what are the potential optimization measures? We address these questions with a novel workload characterization tool for big data applications, called Extensible Metric Importance Analysis (EMIA). The key idea is a performance model based on ensemble learning, which takes program metrics as input, outputs a performance metric such as execution time, and ranks the input metrics by their importance. Based on EMIA, we apply principal component analysis (PCA) to the program behaviors of five representative spatial-temporal applications and nine popular internet big data benchmarks. Experimental results show that spatial-temporal applications present unique characteristics and that it is unreasonable to evaluate solutions for spatial-temporal applications with internet benchmarks. Moreover, we optimize spatial-temporal applications by applying measures to the key factors identified by EMIA, achieving noticeable performance improvements.
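The abstract does not say which ensemble learner EMIA uses or how it scores importance. As a rough illustration of the general idea (a performance model over program metrics whose inputs are then ranked by importance), the following minimal Python sketch assumes least-squares gradient boosting over one-split regression stumps as the ensemble and permutation importance as the ranking criterion. All function names, the metric names (`cache_miss_rate`, `io_wait`, `irrelevant`) and the synthetic execution times are hypothetical and are not taken from the paper.

```python
"""Hypothetical sketch of ensemble-based metric importance ranking.

Assumptions (not from the paper): the ensemble is least-squares gradient
boosting over one-split regression stumps, and importance is measured by
permutation (increase in MSE when one metric's column is shuffled).
"""
import random
import statistics


def fit_stump(X, r):
    """Return (feature, threshold, left_value, right_value) minimising the SSE of r."""
    n, d = len(X), len(X[0])
    total, total_sq = sum(r), sum(v * v for v in r)
    # Fallback if no split exists: send every row left and predict 0.
    best = (float("inf"), 0, float("inf"), 0.0, 0.0)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = sq = 0.0  # running sum and sum of squares of the left partition
        for k in range(n - 1):
            i = order[k]
            s += r[i]
            sq += r[i] * r[i]
            a, b = X[i][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal values
            nl, nr = k + 1, n - k - 1
            sse = (sq - s * s / nl) + ((total_sq - sq) - (total - s) ** 2 / nr)
            if sse < best[0]:
                best = (sse, j, (a + b) / 2, s / nl, (total - s) / nr)
    return best[1:]


def _apply(stump, row):
    j, t, left, right = stump
    return left if row[j] <= t else right


def fit_boosted_stumps(X, y, rounds=60, lr=0.3):
    """Fit base + sum of shrunken stumps, each trained on the current residuals."""
    base = statistics.fmean(y)
    residual = [v - base for v in y]
    stumps = []
    for _ in range(rounds):
        j, t, left, right = fit_stump(X, residual)
        stump = (j, t, lr * left, lr * right)
        stumps.append(stump)
        residual = [res - _apply(stump, row) for row, res in zip(X, residual)]
    return base, stumps


def predict(model, row):
    base, stumps = model
    return base + sum(_apply(s, row) for s in stumps)


def mse(model, X, y):
    return statistics.fmean((predict(model, row) - v) ** 2 for row, v in zip(X, y))


def permutation_importance(model, X, y, names, seed=0):
    """Rank metrics by how much shuffling each one increases the model's MSE."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        rng.shuffle(column)
        X_perm = [row[:j] + [c] + row[j + 1:] for row, c in zip(X, column)]
        scores[name] = mse(model, X_perm, y) - baseline
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical program metrics; the "execution time" below is synthetic.
    names = ["cache_miss_rate", "io_wait", "irrelevant"]
    X = [[rng.random() for _ in names] for _ in range(200)]
    y = [5 * row[0] + 1 * row[1] for row in X]
    model = fit_boosted_stumps(X, y)
    for name, score in permutation_importance(model, X, y, names):
        print(f"{name:>16}  {score:.3f}")
```

Running the script prints the metrics from most to least important. Because the synthetic execution time depends most strongly on `cache_miss_rate` and not at all on `irrelevant`, those two should land at the top and bottom of the ranking respectively. Permutation importance was chosen here only because it works with any fitted model; it is one of several possible importance measures.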