Y. D. Cai, B. Riedl, R. Ratan, Cuihua Shen, A. Picot
{"title":"FeatureSelector:一个支持xsede的大规模游戏日志分析工具","authors":"Y. D. Cai, B. Riedl, R. Ratan, Cuihua Shen, A. Picot","doi":"10.1145/2616498.2616511","DOIUrl":null,"url":null,"abstract":"Due to the huge volume and extreme complexity in online game data collections, selecting essential features for the analysis of massive game logs is not only necessary, but also challenging. This study develops and implements a new XSEDE-enabled tool, FeatureSelector, which uses the parallel processing techniques on high performance computers to perform feature selection. By calculating probability distance measures, based on K-L divergence, this tool quantifies the distance between variables in data sets, and provides guidance for feature selection in massive game log analysis. This tool has helped researchers choose the high-quality and discriminative features from over 300 variables, and select the top pairs of countries with the greatest differences from 231 country-pairs in a 500 GB game log data set. Our study shows that (1) K-L divergence is a good measure for correctly and efficiently selecting important features, and (2) the high performance computing platform supported by XSEDE has substantially accelerated the feature selection processes by over 30 times. Besides demonstrating the effectiveness of FeatureSelector in a cross-country analysis using high performance computing, this study also highlights some lessons learned for feature selection in social science research and some experience on applying parallel processing techniques in intensive data analysis.","PeriodicalId":93364,"journal":{"name":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","volume":"78 1","pages":"17:1-17:7"},"PeriodicalIF":0.0000,"publicationDate":"2014-07-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"FeatureSelector: an XSEDE-Enabled Tool for Massive Game Log Analysis\",\"authors\":\"Y. D. Cai, B. Riedl, R. Ratan, Cuihua Shen, A. Picot\",\"doi\":\"10.1145/2616498.2616511\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the huge volume and extreme complexity in online game data collections, selecting essential features for the analysis of massive game logs is not only necessary, but also challenging. This study develops and implements a new XSEDE-enabled tool, FeatureSelector, which uses the parallel processing techniques on high performance computers to perform feature selection. By calculating probability distance measures, based on K-L divergence, this tool quantifies the distance between variables in data sets, and provides guidance for feature selection in massive game log analysis. This tool has helped researchers choose the high-quality and discriminative features from over 300 variables, and select the top pairs of countries with the greatest differences from 231 country-pairs in a 500 GB game log data set. Our study shows that (1) K-L divergence is a good measure for correctly and efficiently selecting important features, and (2) the high performance computing platform supported by XSEDE has substantially accelerated the feature selection processes by over 30 times. Besides demonstrating the effectiveness of FeatureSelector in a cross-country analysis using high performance computing, this study also highlights some lessons learned for feature selection in social science research and some experience on applying parallel processing techniques in intensive data analysis.\",\"PeriodicalId\":93364,\"journal\":{\"name\":\"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)\",\"volume\":\"78 1\",\"pages\":\"17:1-17:7\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-07-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2616498.2616511\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of XSEDE16 : Diversity, Big Data, and Science at Scale : July 17-21, 2016, Intercontinental Miami Hotel, Miami, Florida, USA. Conference on Extreme Science and Engineering Discovery Environment (5th : 2016 : Miami, Fla.)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2616498.2616511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
FeatureSelector: an XSEDE-Enabled Tool for Massive Game Log Analysis
Due to the huge volume and extreme complexity in online game data collections, selecting essential features for the analysis of massive game logs is not only necessary, but also challenging. This study develops and implements a new XSEDE-enabled tool, FeatureSelector, which uses the parallel processing techniques on high performance computers to perform feature selection. By calculating probability distance measures, based on K-L divergence, this tool quantifies the distance between variables in data sets, and provides guidance for feature selection in massive game log analysis. This tool has helped researchers choose the high-quality and discriminative features from over 300 variables, and select the top pairs of countries with the greatest differences from 231 country-pairs in a 500 GB game log data set. Our study shows that (1) K-L divergence is a good measure for correctly and efficiently selecting important features, and (2) the high performance computing platform supported by XSEDE has substantially accelerated the feature selection processes by over 30 times. Besides demonstrating the effectiveness of FeatureSelector in a cross-country analysis using high performance computing, this study also highlights some lessons learned for feature selection in social science research and some experience on applying parallel processing techniques in intensive data analysis.