Connecting Devices to Cookies via Filtering, Feature Engineering, and Boosting
M. Kim, Jiwei Liu, Xiaozhou Wang, Wei Yang
2015 IEEE International Conference on Data Mining Workshop (ICDMW), published 2015-11-14
DOI: 10.1109/ICDMW.2015.236 (https://doi.org/10.1109/ICDMW.2015.236)
Citations: 13
Abstract
We present a supervised machine learning system that matches internet devices to web cookies through filtering, feature engineering, binary classification, and post-processing. Filtering and feature engineering are used to build training and testing data sets of manageable size. We build 415 features in total. Some of these features were engineered to be O(n)-time standalone classifiers for this problem. Other features use various natural language processing (NLP) techniques. Meta-features are created by ridge regression and AdaBoost. Binary classification is then performed by two different gradient boosting models (XGBoost with logarithmic loss). A post-processing pipeline connects devices and cookies in a way that maximizes the F_0.5 score. Our system obtained a private F_0.5 score of 0.849562, for a final rank of 12th out of 340 on the ICDM 2015: Drawbridge Cross-Device Connections challenge.
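For readers unfamiliar with the metric, the F_0.5 score named in the abstract can be illustrated with a minimal sketch. It uses the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), which with beta = 0.5 weights precision more heavily than recall. The function names and the per-device set comparison below are assumptions made for illustration; the abstract does not describe the challenge's exact scoring procedure or the authors' implementation.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_beta_for_sets(predicted: set, actual: set, beta: float = 0.5) -> float:
    """Score one device's predicted cookie set against its true cookie set.

    Hypothetical helper: assumes the comparison is made per device.
    """
    if not predicted or not actual:
        return 0.0
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    # Perfect precision with half recall scores higher than the reverse.
    print(f_beta(1.0, 0.5))
    print(f_beta(0.5, 1.0))
    # One correct cookie plus one spurious cookie for the same device.
    print(f_beta_for_sets({"cookie_a", "cookie_b"}, {"cookie_a"}))
```

Because a spurious match costs more than a missed one under this metric, a post-processing step that targets F_0.5 would be expected to add extra cookies to a device only when the classifier is fairly confident in them.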