Thanh-Binh Nguyen, Vu Nguyen, Nguyen Cong Thuong, S. Venkatesh, Mohan J. Kumar, Dinh Q. Phung
Learning Multifaceted Latent Activities from Heterogeneous Mobile Data
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA)
DOI: 10.1109/DSAA.2016.48 (https://doi.org/10.1109/DSAA.2016.48)
Citations: 1
Abstract
Inferring abstract contexts and activities from heterogeneous data is vital to context-aware ubiquitous applications, yet it remains one of the most challenging problems. Recent advances in Bayesian nonparametric machine learning, in particular the theory of topic models based on the Hierarchical Dirichlet Process (HDP), have provided an elegant solution to these challenges. However, few existing methods address the problem of inferring latent multifaceted activities and contexts from heterogeneous data sources such as those collected from mobile devices. In this paper, we extend the original HDP to model heterogeneous data by giving the base measure a richer structure, namely a product space. The proposed model, called product-space HDP (PS-HDP), naturally handles heterogeneous data from multiple sources and identifies the unknown number of latent structures in a principled way. Although this framework is generic, our current work primarily focuses on simultaneously inferring (latent) threefold who-when-where activities, which corresponds to inducing activities from data collected for identity, location and time. We demonstrate our model on synthetic data as well as on a real-world dataset, the StudentLife dataset. We report results and provide analysis of the discovered activities and patterns to demonstrate the merit of the model. We also quantitatively evaluate the performance of the PS-HDP model using standard metrics, including F1-score, NMI, RI, and purity, and compare it with well-known baseline methods.
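As an illustration of the evaluation metrics named above, the following is a minimal sketch of three of them, taking RI to mean the Rand index and NMI to mean normalized mutual information. It is not the authors' evaluation code: the function names are hypothetical, the labels are toy data, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not say which variant was used.

```python
from collections import Counter
from math import comb, log


def purity(true_labels, pred_labels):
    """Fraction of points that belong to the majority true class of their cluster."""
    best = {}
    for (cluster, _), count in Counter(zip(pred_labels, true_labels)).items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(true_labels)


def rand_index(true_labels, pred_labels):
    """Fraction of point pairs on which the two labelings agree."""
    total_pairs = comb(len(true_labels), 2)
    same_both = sum(comb(c, 2) for c in Counter(zip(true_labels, pred_labels)).values())
    same_true = sum(comb(c, 2) for c in Counter(true_labels).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred_labels).values())
    # Agreements = pairs grouped together in both + pairs separated in both.
    agreements = total_pairs + 2 * same_both - same_true - same_pred
    return agreements / total_pairs


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(true_labels, pred_labels):
    """Mutual information divided by the mean of the two entropies (assumed variant)."""
    n = len(true_labels)
    count_true = Counter(true_labels)
    count_pred = Counter(pred_labels)
    mi = sum(
        (c / n) * log(c * n / (count_true[t] * count_pred[p]))
        for (t, p), c in Counter(zip(true_labels, pred_labels)).items()
    )
    denom = (entropy(true_labels) + entropy(pred_labels)) / 2
    # Both labelings trivial (a single group each): treat as perfect agreement.
    return mi / denom if denom > 0 else 1.0


# Toy example: six points, two true activities, three inferred clusters.
truth = [0, 0, 0, 1, 1, 1]
inferred = [0, 0, 1, 1, 2, 2]
print(purity(truth, inferred), rand_index(truth, inferred), nmi(truth, inferred))
```

All three scores are invariant to how cluster identifiers are numbered, which matters for a nonparametric model such as PS-HDP where the number of inferred clusters is not fixed in advance. The F1-score is omitted here because it depends on how clusters are matched to classes, which the abstract does not specify.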