A Novel Approach for Generating Personalized Mention List on Micro-Blogging System
Ge Zhou, Lu Yu, Chuxu Zhang, Chuang Liu, Zi-Ke Zhang, Jianlin Zhang
Online social networks provide a convenient way to access information, which in turn brings the problem of information overload. Most previous work has focused on analyzing users' retweet behavior on micro-blogging systems, and diverse recommendation algorithms have been proposed to push personalized tweet lists to users. In this paper, we aim to solve the overload problem in the mention list. We first explore the in-depth differences between mention and retweet behaviors and identify the various actions users take on a mention. We then propose a personalized ranking model that considers multi-dimensional relations among users and mention tweets to generate a personalized mention list. Experimental results on a micro-blogging data set show that the proposed method outperforms benchmark methods.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.51
Mining Unstable Communities from Network Ensembles
Ahsanur Rahman, Steve T. K. Jan, Hyunju Kim, B. Prakash, T. Murali
Ensembles of graphs arise in several natural applications, such as mobility tracking, computational biology, social networks, and epidemiology. A common problem addressed by many existing mining techniques is to identify subgraphs of interest in these ensembles. In contrast, in this paper, we propose to quickly discover maximally variable regions of the graphs, i.e., sets of nodes that induce very different subgraphs across the ensemble. We first develop two intuitive and novel definitions of such node sets, which we then show can be efficiently enumerated using a level-wise algorithm. Finally, using extensive experiments on multiple real datasets, we show how these sets capture the main structural variations of the given set of networks and also provide us with interesting and relevant insights about these datasets.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.87
Selecting Machine Learning Algorithms Using Regression Models
Tri Doan, J. Kalita
In performing data mining, a common task is to search for the most appropriate algorithm(s) to retrieve important information from data. With an increasing number of available data mining techniques, it may be impractical to experiment with many techniques on a specific dataset of interest to find the best algorithm(s). In this paper, we demonstrate the suitability of tree-based multi-variable linear regression for predicting algorithm performance. We take into account prior machine learning experience to construct meta-knowledge for supervised learning. The idea is to use summary knowledge about datasets, along with the past performance of algorithms on these datasets, to build this meta-knowledge. We augment pure statistical summaries with descriptive features and a misclassification cost, and discover that transformed datasets obtained by reducing a high-dimensional feature space to a smaller dimension still retain the significant characteristic knowledge necessary to predict algorithm performance. Our approach works well for both numerical and nominal data obtained from real-world environments.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.43
Examining Botnet Behaviors for Propaganda Dissemination: A Case Study of ISIL's Beheading Videos-Based Propaganda
Samer Al-khateeb, Nitin Agarwal
Since the dissemination of the first beheading video by the Islamic State in Iraq and the Levant (ISIL) of its hostage James Foley (an American journalist), this practice has become increasingly common. Videos of ISIL beheading their hostages in orange jumpsuits swarmed over social media as the group swept across Iraq. By showing such shocking videos and images, ISIL is able to spread its views and create emotional attitudes among its followers. Through a sophisticated social media strategy and the strategic use of botnets, ISIL is succeeding in its propaganda dissemination. ISIL uses social media as a tool to conduct recruitment and radicalization campaigns and to raise funds. In this study, we examine the reasons for creating such videos, grounded in the literature from cultural anthropology, transnationalism and religious identity, and media and communication. Toward this direction, we collect data from Twitter for the beheadings carried out by ISIL, in particular those of the Egyptian Copts, the Arab-Israeli "Spy", and the Ethiopian Christians. The study provides insights into the way ISIL uses social media (especially Twitter) to disseminate propaganda and develops a framework to identify sociotechnical behavioral patterns from a social and computational science perspective.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.41
An Enumerative Biclustering Algorithm for DNA Microarray Data
Haifa Ben Saber, M. Elloumi
In a number of domains, such as DNA microarray data analysis, we need to cluster the rows (genes) and columns (conditions) of a data matrix simultaneously in order to identify groups of rows that are constant across a group of columns. This kind of clustering is called biclustering. Biclustering algorithms are extensively used in DNA microarray data analysis, and more effective biclustering algorithms are highly desirable. We introduce a new algorithm, called Enumerative Lattice (EnumLat), for biclustering binary microarray data. EnumLat adopts the approach of enumerating biclusters and extracts all biclusters of consistently good quality. The main idea of EnumLat is the construction of a new tree structure that adequately represents the different biclusters discovered during the enumeration process. The algorithm adopts the strategy of discovering all biclusters at a time. The performance of the proposed algorithm is assessed on both synthetic and real DNA microarray data, where it outperforms other biclustering algorithms for binary microarray data. Moreover, we test the biological significance using a gene annotation web tool and show that our proposed method is able to produce biologically relevant biclusters.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.168
Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization
Yue Su, Sibai Sun, Yuan Xuan, Lei Shi
This paper presents VEGAS, an online system that illustrates the influence of a scientific paper on citation networks via influence graph summarization and visualization. The system is built on an algorithm pipeline that maximizes the rate of influence flow in the final summarization. Both the visualization and interaction designs are described with respect to a real usage scenario of the VEGAS system.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.105
Pruned Simple Model Sets for Fast Exact Recovery of Image
Basarab Matei, Younès Bennani
Image reconstruction can be defined as the general problem of estimating a two-dimensional object from a partial version of it (a limited set of "projections"). In this paper, we propose a new approach to image reconstruction based on simple quasicrystals and L1 minimisation. We discuss the exact reconstruction of an image assumed to have a small spectrum. We show that simple model sets may be used as sampling sets for exact recovery. Moreover, by eliminating a finite number of points from the simple model sets, we still obtain exact recovery. This last aspect is very important for practical applications, e.g. lossy compression. We run our approach on benchmark image data sets and show that quasicrystal sampling outperforms uniform random sampling in terms of execution time as the dimension of the input image increases.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.54
Unsupervised Learning Techniques for Detection of Regions of Interest in Solar Images
J. Banda, R. Angryk
Identifying regions of interest (ROIs) in images is a very active research problem, as it depends heavily on the types and characteristics of the images. In this paper we present a comparative evaluation of unsupervised learning methods, in particular clustering, to identify ROIs in solar images from the Solar Dynamics Observatory (SDO) mission. With the purpose of finding regions within the solar images that contain potential solar phenomena, this work focuses on describing an automated, unsupervised methodology that allows us to reduce the image search space when trying to find similar solar phenomena across multiple sets of images. By experimenting with multiple methods, we identify a successful approach to automatically detecting ROIs for a more refined and robust search in the SDO Content-Based Image-Retrieval (CBIR) system. We then present an extensive experimental evaluation to identify the best-performing parameters for our methodology in terms of overlap with expert-curated ROIs. Finally, we present an exhaustive evaluation of the proposed approach in several image retrieval scenarios to demonstrate that the performance of the identified ROIs is very similar to that of ROIs identified by dedicated science modules of the SDO mission.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.61
Event Detection from Millions of Tweets Related to the Great East Japan Earthquake Using Feature Selection Technique
T. Hashimoto, D. Shepard, T. Kuboyama, Kilho Shin
Social media offers a wealth of insight into how significant events -- such as the Great East Japan Earthquake, the Arab Spring, and the Boston Bombing -- affect individuals. The scale of available data, however, can be intimidating: during the Great East Japan Earthquake, over 8 million tweets were sent each day from Japan alone. Conventional word vector-based event-detection techniques for social media that use Latent Semantic Analysis, Latent Dirichlet Allocation, or graph community detection often cannot scale to such a large volume of data due to their space and time complexity. To alleviate this problem, we propose an efficient method for event detection by leveraging a fast feature selection algorithm called CWC. While we begin with word count vectors of authors and words for each time slot (in our case, every hour), we extract discriminative words from each slot using CWC, which vastly reduces the number of features to track. We then convert these word vectors into a time series of vector distances from the initial point. The distance between each time slot and the initial point remains high while an event is happening, yet declines sharply when the event ends, offering an accurate portrait of the span of an event. This method makes it possible to detect events from vast datasets. To demonstrate our method's effectiveness, we extract events from a dataset of over two hundred million tweets sent in the 21 days following the Great East Japan Earthquake. With CWC, we can identify events from this dataset with great speed and accuracy.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.248
Lifting the Predictability of Human Mobility on Activity Trajectories
Xianming Li, Defu Lian, Xing Xie, Guangzhong Sun
Mobility prediction has recently attracted plenty of attention, since it plays an important part in many applications ranging from urban planning and traffic forecasting to location-based services, including mobile recommendation and mobile advertisement. However, there has been little study of exploiting the activity information often associated with the trajectories on which prediction is based to assist location prediction. To this end, in this paper, we propose a Time-stamped Activity INference Enhanced Predictor (TAINEP) for forecasting the next location on activity trajectories. In TAINEP, we leverage topic models for dimension reduction so as to capture co-occurrences of different time-stamped activities. The model is then extended to incorporate temporal dependence between the topics of consecutive time-stamped activities, in order to infer the activity that may be conducted at the next location and the time when it will happen. Based on the inferred time-stamped activities, a probabilistic mixture model is further put forward to integrate them with commonly used Markov predictors for forecasting the next locations. We finally evaluate the proposed model on two real-world datasets. The results show that the proposed method outperforms competing predictors that do not infer time-stamped activities. In other words, it lifts the predictability of human mobility.
2015 IEEE International Conference on Data Mining Workshop (ICDMW). DOI: 10.1109/ICDMW.2015.164