This paper describes our solution to ICDM 2015's contest. The challenge is to recover cross-device connections, i.e. to identify device-cookie pairs that are used by the same natural person. To tackle this task, we first model the privateness of each IP, then employ pairwise ranking techniques to predict the likelihood of each connection, and finally use ensemble learning to integrate multiple models from various settings. Our approach achieves 5th place in the contest (average F-score of 0.8608) using ONLY IP footprint information.
{"title":"Recovering Cross-Device Connections via Mining IP Footprints with Ensemble Learning","authors":"Xuezhi Cao, Weiyue Huang, Yong Yu","doi":"10.1109/ICDMW.2015.129","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.129","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115425277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
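The average F-score quoted in the abstract above can be illustrated with a small sketch. The abstract does not say which F-score variant the contest used or how the average is taken, so the `beta` parameter, the per-device averaging, and the names `f_beta` and `average_f` are assumptions made purely for illustration.

```python
def f_beta(predicted, actual, beta=1.0):
    """F-beta score between a predicted and a true set of cookies for one device."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_f(predictions, truth, beta=1.0):
    """Mean F-beta over all devices in `truth` (assumed averaging scheme)."""
    scores = [f_beta(predictions.get(dev, ()), cookies, beta)
              for dev, cookies in truth.items()]
    return sum(scores) / len(scores)


# Hypothetical toy data: device id -> set of cookie ids.
truth = {"dev1": {"c1"}, "dev2": {"c3"}}
predictions = {"dev1": {"c1", "c2"}, "dev2": {"c3"}}
print(average_f(predictions, truth))
```

Setting `beta` below 1 weights precision more heavily than recall; which weighting the contest applied is not stated in the abstract.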
Identifying regions of interest (ROIs) in images is a very active research problem because it depends heavily on the types and characteristics of the images. In this paper, we present a comparative evaluation of unsupervised learning methods, in particular clustering, to identify ROIs in solar images from the Solar Dynamics Observatory (SDO) mission. With the purpose of finding regions within the solar images that contain potential solar phenomena, this work focuses on describing an automated, unsupervised methodology that will allow us to reduce the image search space when trying to find similar solar phenomena across multiple sets of images. By experimenting with multiple methods, we identify a successful approach to automatically detecting ROIs for a more refined and robust search in the SDO Content-Based Image-Retrieval (CBIR) system. We then present an extensive experimental evaluation to identify the best-performing parameters for our methodology in terms of overlap with expert-curated ROIs. Finally, we present an exhaustive evaluation of the proposed approach in several image retrieval scenarios to demonstrate that the performance of the identified ROIs is very similar to that of ROIs identified by dedicated science modules of the SDO mission.
{"title":"Unsupervised Learning Techniques for Detection of Regions of Interest in Solar Images","authors":"J. Banda, R. Angryk","doi":"10.1109/ICDMW.2015.61","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.61","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"85 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122941758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Image reconstruction can be defined as the general problem of estimating a two-dimensional object from a partial version of this object (a limited set of "projections"). In this paper, we propose a new approach for image reconstruction based on simple quasicrystals and L1 minimisation. We discuss the exact reconstruction of an image assumed to have small spectra. We show that simple model sets may be used as sampling sets for exact recovery. Moreover, exact recovery still holds after eliminating a finite number of points from the simple model sets. This last aspect is very important for practical applications, e.g. lossy compression. We run our approach on benchmark image data sets and show that quasicrystal sampling outperforms uniform random sampling in execution time as the dimension of the input image increases.
{"title":"Pruned Simple Model Sets for Fast Exact Recovery of Image","authors":"Basarab Matei, Younès Bennani","doi":"10.1109/ICDMW.2015.54","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.54","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123002498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
The availability of data trading mechanisms and platforms is a paramount prerequisite to the development of effective smart city services. In order for data to become a commodity ready for consumption, transformation and exploitation by smart services, it must be made available and tradable on data market places. For such data market places to be viable, there is a compelling need for a sound data pricing model that is conducive to a healthy market. In this paper, we discuss the definition of a pricing model in which views are priced and queries are valuated using views. We define the price of a query as the price of the cheapest combination of views that can answer the query. We discuss the design of effective and efficient algorithms for computing the price of a query. We show that the problem of computing the price is similar but not identical to the problem of answering queries using views. We therefore adapt the MiniCon algorithm, which was designed to answer queries using views, to the task at hand. Finally, we discuss further challenges created by the definition of a framework for valuating queries using views.
{"title":"Valuating Queries for Data Trading in Modern Cities","authors":"Ruiming Tang, Huayu Wu, Xiuqiang He, S. Bressan","doi":"10.1109/ICDMW.2015.11","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.11","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123408986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
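The pricing definition in the abstract above (the price of a query is the cheapest combination of priced views that can answer it) can be sketched with a brute-force search. This is not the adapted MiniCon algorithm the abstract mentions: here "can answer" is reduced to simple attribute coverage, and the function name `query_price`, the view names, and the prices are all hypothetical.

```python
from itertools import combinations


def query_price(query_attrs, views):
    """Return (price, view names) for the cheapest set of views covering the query.

    `views` maps a view name to (set of attributes it exposes, price).
    Answerability is ASSUMED to mean attribute coverage, which is a strong
    simplification of answering queries using views.
    Returns None when no combination of views covers the query.
    """
    needed = set(query_attrs)
    names = sorted(views)
    best = None
    # Exhaustive search over all non-empty subsets of views (exponential cost).
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(views[name][0] for name in combo))
            if needed <= covered:
                cost = sum(views[name][1] for name in combo)
                if best is None or cost < best[0]:
                    best = (cost, combo)
    return best


# Hypothetical smart-city views and prices.
views = {
    "v_traffic": ({"road", "speed"}, 4.0),
    "v_weather": ({"road", "rain"}, 3.0),
    "v_bundle": ({"road", "speed", "rain"}, 6.0),
}
print(query_price({"speed", "rain"}, views))
```

In this toy setting the bundled view is cheaper than buying the two single-purpose views together, so it determines the price of the query. The exhaustive search is only practical for a handful of views.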
How can we retrieve meaningful information from a large and sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense regions in a network graph; however, they tend to omit interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models the relations of a node using paradigmatic analysis. We exploit a node's relations to extract its existing sets of signifiers. The newly found clusters represent a different view of a graph, which provides interesting insights into the structure of a sparse network graph. Our proposed algorithm for clustering graphs, PaC (Paradigmatic Clustering), uses paradigmatic analysis supported by an asymmetric similarity. In contrast to traditional graph clustering methods, our algorithm yields worthwhile results in word-sense disambiguation tasks. In addition, we propose a novel paradigmatic similarity measure. Extensive experiments and empirical analysis are used to evaluate our algorithm on synthetic and real data.
{"title":"Paradigmatic Clustering for NLP","authors":"Julio Santisteban, Javier Tejada-Cárcamo","doi":"10.1109/ICDMW.2015.233","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.233","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"60 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123666342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Link prediction is an important and well-studied problem in network analysis, with a broad range of applications including recommender systems, anomaly detection, and denoising. The general principle in link prediction is to use the topological characteristics of the nodes in the network to predict edges that might be added to or removed from the network. While early research utilized local network neighborhoods to characterize the topological relationship between pairs of nodes, recent studies increasingly show that use of global network information improves prediction performance. Meanwhile, in the context of disease gene prioritization and functional annotation in computational biology, "global topological similarity" based methods are shown to be effective and robust to noise and ascertainment bias. These methods compute topological profiles that represent the global view of the network from the perspective of each node and compare these topological profiles to assess the topological similarity between nodes. Here, we show that, in the context of link prediction in large networks, the performance of these global-view based methods can be adversely affected by high dimensionality. Motivated by this observation, we propose two dimensionality reduction techniques that exploit the sparsity and modularity of networks that are encountered in practical applications. Our experimental results on predicting future collaborations based on a comprehensive co-authorship network show that dimensionality reduction renders global-view based link prediction highly effective, and the resulting algorithms significantly outperform state-of-the-art link prediction methods.
{"title":"Link Prediction in Large Networks by Comparing the Global View of Nodes in the Network","authors":"Mustafa Coşkun, Mehmet Koyutürk","doi":"10.1109/ICDMW.2015.195","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.195","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123970720","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
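The idea of comparing the global view of two nodes, as described in the abstract above, can be sketched as follows. The abstract does not say how the topological profiles are computed or compared, so the use of random walk with restart, the cosine comparison, the parameter values, and the names `rwr_profile` and `profile_similarity` are assumptions. The sketch does not include the dimensionality reduction techniques that are the paper's contribution.

```python
import math


def rwr_profile(adj, seed, restart=0.3, iters=100):
    """Proximity of every node to `seed` under a random walk with restart.

    `adj` maps each node to a list of neighbours; every node is assumed
    to have at least one neighbour. Computed by plain power iteration.
    """
    nodes = sorted(adj)
    prob = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * prob[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        prob = nxt
    return [prob[v] for v in nodes]


def cosine(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def profile_similarity(adj, u, v):
    """Score a candidate link (u, v) by comparing the two nodes' global profiles."""
    return cosine(rwr_profile(adj, u), rwr_profile(adj, v))


# Hypothetical toy network: two triangles joined by the edge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
# Nodes in the same triangle score higher than nodes in different triangles.
print(profile_similarity(adj, "a", "b") > profile_similarity(adj, "a", "e"))
```

Each profile has one entry per node, so its length grows with the size of the network; this is the high dimensionality that the abstract identifies as a problem.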
This paper proposes a new incremental learning method for heterogeneous domain adaptation, in which the training data from both the source and target domains are acquired sequentially and represented by heterogeneous features. Two different projection matrices are learned to map the data from the two domains into a discriminative common subspace, where intra-class samples are closely related to each other, inter-class samples are well separated from each other, and the data distribution mismatch between the source and target domains is reduced. Unlike previous work, our method is capable of incrementally optimizing the projection matrices when the training data becomes available as a data stream instead of being given completely in advance. As training data gradually arrives, the new projection matrices are computed by updating the existing ones using an eigenspace merging algorithm, rather than repeating the learning from scratch while keeping the whole training data set. Therefore, our incremental learning solution for the projection matrices can significantly reduce computational complexity and memory usage, which makes it applicable to a wider set of heterogeneous domain adaptation scenarios with a large training dataset. Furthermore, our method is restricted neither to corresponding training instances in the source and target domains nor to the same type of feature, which meaningfully relaxes the requirements on training data. Comprehensive experiments on three benchmark datasets clearly demonstrate the effectiveness and efficiency of our method.
{"title":"Incremental Discriminant Learning for Heterogeneous Domain Adaptation","authors":"Peng Han, Xinxiao Wu","doi":"10.1109/ICDMW.2015.186","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.186","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129324802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ge Zhou, Lu Yu, Chuxu Zhang, Chuang Liu, Zi-Ke Zhang, Jianlin Zhang
Online social networks provide a convenient way to access information, which in turn brings the information overload problem. Most previous work focused on analyzing users' retweet behavior on micro-blogging systems, and diverse recommendation algorithms were proposed to push personalized tweet lists to users. In this paper, we aim to solve the overload problem in the mention list. We first explore the in-depth differences between mention and retweet behaviors, and identify the various actions users take in response to a mention. We then propose a personalized ranking model that considers multi-dimensional relations among users and mention tweets to generate the personalized mention list. Experimental results on a micro-blogging system data set show that the proposed method performs better than benchmark methods.
{"title":"A Novel Approach for Generating Personalized Mention List on Micro-Blogging System","authors":"Ge Zhou, Lu Yu, Chuxu Zhang, Chuang Liu, Zi-Ke Zhang, Jianlin Zhang","doi":"10.1109/ICDMW.2015.51","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.51","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129885291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents VEGAS, an online system that illustrates the influence of a scientific paper on citation networks via influence graph summarization and visualization. The system is built on an algorithm pipeline that maximizes the rate of influence flows in the final summarization. Both the visualization and interaction designs are described with respect to a real usage scenario of the VEGAS system.
{"title":"Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization","authors":"Yue Su, Sibai Sun, Yuan Xuan, Lei Shi","doi":"10.1109/ICDMW.2015.105","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.105","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126973618","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Online social media applications have become an integral part of our everyday life. Not only are they being utilised by individuals and legitimate businesses, but recently several organised groups, such as activists, hacktivists, and cyber-criminals, have also adopted them to communicate and spread their ideas. This represents a new source of intelligence for law enforcement, for instance, as it allows an inside look at the behaviour of these previously closed, secretive groups. One possible opportunity with this online data source is to utilise the public exchange of social-media messages to identify key users in such groups. This is particularly important for law enforcement agencies that want to monitor or interrogate influential people in suspicious groups. In this paper, we utilise Social Network Analysis (SNA) techniques to understand the dynamics of the interaction between users in a Facebook-based activist group. Additionally, we aim to identify the most influential users in the group and infer their relationship strength. We incorporate sentiment analysis to identify users with clear positive and negative influences on the group; this could aid in facilitating a better understanding of the group. We also perform a temporal analysis to correlate online activities with relevant real-life events. Our results show that applying such data analysis techniques to users' online behaviour is a powerful tool to predict levels of influence and relationship strength between group members. Finally, we validated our results against the ground truth and found that our approach is very promising at achieving its aims.
{"title":"Identifying Key-Players in Online Activist Groups on the Facebook Social Network","authors":"Mariam Nouh, Jason R. C. Nurse","doi":"10.1109/ICDMW.2015.88","DOIUrl":"https://doi.org/10.1109/ICDMW.2015.88","url":null,"PeriodicalId":192888,"journal":{"name":"2015 IEEE International Conference on Data Mining Workshop (ICDMW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121952936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
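One simple way to rank users by influence from their public interactions, in the spirit of the abstract above, is to count the interactions each user receives. The abstract does not name the SNA measures the authors used, so this in-degree ranking, the function name `rank_by_interactions`, and the toy data are assumptions for illustration only.

```python
from collections import Counter


def rank_by_interactions(interactions):
    """Rank users by how many interactions (e.g. comments, likes) they receive.

    `interactions` is an iterable of (source_user, target_user) pairs.
    Returns (user, count) pairs sorted by descending count.
    """
    received = Counter(target for _source, target in interactions)
    return received.most_common()


# Hypothetical interaction log: (who acted, whose post was acted upon).
interactions = [
    ("ann", "bob"), ("cat", "bob"), ("bob", "ann"),
    ("dan", "bob"), ("dan", "ann"),
]
print(rank_by_interactions(interactions))
```

A count of this kind ignores who the interacting users are; measures that take the wider network structure into account are common alternatives in SNA.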