2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Articles

Recovering Cross-Device Connections via Mining IP Footprints with Ensemble Learning

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.129

Xuezhi Cao, Weiyue Huang, Yong Yu

This paper describes our solution to the ICDM 2015 contest. The challenge is to recover cross-device connections, i.e., to identify device-cookie pairs that are used by the same natural person. To tackle this task, we first model the privateness of each IP, then employ pairwise ranking techniques to predict the likelihood of each connection, and finally use ensemble learning to integrate multiple models from various settings. Our approach achieved 5th place in the contest (average F-score of 0.8608) using only IP footprint information.

Citations: 12

Unsupervised Learning Techniques for Detection of Regions of Interest in Solar Images

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.61

J. Banda, R. Angryk

Identifying regions of interest (ROIs) in images is a very active research problem, as it depends heavily on the types and characteristics of the images. In this paper, we present a comparative evaluation of unsupervised learning methods, in particular clustering, for identifying ROIs in solar images from the Solar Dynamics Observatory (SDO) mission. With the purpose of finding regions within the solar images that contain potential solar phenomena, this work focuses on describing an automated, unsupervised methodology that will allow us to reduce the image search space when trying to find similar solar phenomena across multiple sets of images. By experimenting with multiple methods, we identify a successful approach to automatically detecting ROIs for a more refined and robust search in the SDO Content-Based Image-Retrieval (CBIR) system. We then present an extensive experimental evaluation to identify the best-performing parameters for our methodology in terms of overlap with expert-curated ROIs. Finally, we present an exhaustive evaluation of the proposed approach in several image retrieval scenarios to demonstrate that the performance of the identified ROIs is very similar to that of ROIs identified by dedicated science modules of the SDO mission.

Citations: 2

Pruned Simple Model Sets for Fast Exact Recovery of Image

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.54

Basarab Matei, Younès Bennani

Image reconstruction can be defined as the general problem of estimating a two-dimensional object from a partial version of this object (a limited set of "projections"). In this paper, we propose a new approach for image reconstruction based on simple quasicrystals and L1 minimisation. We discuss the exact reconstruction of an image assumed to have small spectra. We show that simple model sets may be used as sampling sets for exact recovery. Moreover, after eliminating a finite number of points from the simple model sets, we still have exact recovery. This last aspect is very important for practical applications, e.g. lossy compression. We run our approach on benchmark image data sets and show that quasicrystal sampling outperforms uniform random sampling in terms of execution time as the dimension of the input image increases.

Citations: 0

Valuating Queries for Data Trading in Modern Cities

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.11

Ruiming Tang, Huayu Wu, Xiuqiang He, S. Bressan

The availability of data trading mechanisms and platforms is a paramount prerequisite for the development of effective smart city services. For data to become a commodity ready for consumption, transformation and exploitation by smart services, it must be made available and tradable on data marketplaces. For such data marketplaces to be viable, there is a compelling need for a sound data pricing model that is conducive to the health of the market. In this paper, we discuss the definition of a pricing model in which views are priced and queries are valuated using views. We define the price of a query as the cheapest combination of the prices of a set of views that can answer the query. We discuss the design of effective and efficient algorithms for computing the price of a query. We show that the problem of computing the price is similar but not identical to the problem of answering queries using views. We therefore adapt the MiniCon algorithm, which was designed to answer queries using views, to the task at hand. Finally, we discuss further challenges created by the definition of a framework for valuating queries using views.
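The pricing definition in this abstract (the price of a query is the cheapest combination of priced views that can answer it) can be illustrated with a toy sketch. It rests on a strong simplifying assumption that is not taken from the paper: each view is reduced to the set of attributes it exposes, and a query counts as answerable when the union of the chosen views covers its attributes. The function name `query_price`, the view names and the prices are hypothetical, and the exhaustive search stands in for, and is not, the adapted MiniCon algorithm.

```python
from itertools import combinations


def query_price(required, views):
    """Return (price, view_names) for the cheapest set of views covering `required`.

    `views` maps a view name to (price, set_of_attributes_it_exposes).
    Assumption: a query is "answered" when its attributes are covered.
    Returns (None, ()) when no combination of views covers the query.
    """
    if not required:
        return 0, ()
    best_price, best_combo = None, ()
    names = sorted(views)
    # Exhaustive search: exponential in the number of views, toy sizes only.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            covered = set().union(*(views[name][1] for name in combo))
            if not required <= covered:
                continue
            price = sum(views[name][0] for name in combo)
            if best_price is None or price < best_price:
                best_price, best_combo = price, combo
    return best_price, best_combo


# Hypothetical priced views over attributes a, b, c.
views = {
    "v1": (5, {"a", "b"}),
    "v2": (3, {"a"}),
    "v3": (4, {"b", "c"}),
    "v4": (10, {"a", "b", "c"}),
}
print(query_price({"a", "c"}, views))  # -> (7, ('v2', 'v3'))
```

In this example the single view `v4` can answer the query on its own for 10, but combining `v2` and `v3` is cheaper, so the query is priced at 7.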

Citations: 5

Paradigmatic Clustering for NLP

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.233

Julio Santisteban, Javier Tejada-Cárcamo

How can we retrieve meaningful information from a large and sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense cumulus in a network graph; however, they tend to omit interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models the relations of a node using paradigmatic analysis. We exploit a node's relations to extract its existing sets of signifiers. The newly found clusters represent a different view of a graph, which provides interesting insights into the structure of a sparse network graph. Our proposed algorithm for clustering graphs, PaC (Paradigmatic Clustering), uses paradigmatic analysis supported by an asymmetric similarity. In contrast to traditional graph clustering methods, our algorithm yields worthy results in word-sense disambiguation tasks. In addition, we propose a novel paradigmatic similarity measure. Extensive experiments and empirical analysis are used to evaluate our algorithm on synthetic and real data.

Citations: 3

Link Prediction in Large Networks by Comparing the Global View of Nodes in the Network

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.195

Mustafa Coşkun, Mehmet Koyutürk

Link prediction is an important and well-studied problem in network analysis, with a broad range of applications including recommender systems, anomaly detection, and denoising. The general principle in link prediction is to use the topological characteristics of the nodes in the network to predict edges that might be added to or removed from the network. While early research utilized local network neighborhood to characterize the topological relationship between pairs of nodes, recent studies increasingly show that use of global network information improves prediction performance. Meanwhile, in the context of disease gene prioritization and functional annotation in computational biology, "global topological similarity" based methods are shown to be effective and robust to noise and ascertainment bias. These methods compute topological profiles that represent the global view of the network from the perspective of each node and compare these topological profiles to assess the topological similarity between nodes. Here, we show that, in the context of link prediction in large networks, the performance of these global-view based methods can be adversely affected by high dimensionality. Motivated by this observation, we propose two dimensionality reduction techniques that exploit the sparsity and modularity of networks that are encountered in practical applications. Our experimental results on predicting future collaborations based on a comprehensive co-authorship network show that dimensionality reduction renders global-view based link prediction highly effective, and the resulting algorithms significantly outperform state-of-the-art link prediction methods.
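The idea of computing a global topological profile for each node and comparing profiles can be sketched roughly as follows. The abstract does not say how profiles are computed or compared, so two choices here are assumptions made only for illustration: profiles come from a random walk with restart, and they are compared with Pearson correlation. The function names, the restart probability and the toy graph are hypothetical, every node is assumed to have at least one neighbour, and the dimensionality reduction proposed in the paper is not implemented.

```python
import math


def rwr_profile(adj, seed, restart=0.3, iters=100):
    """Visiting probabilities of a random walk with restart started at `seed`."""
    nodes = sorted(adj)
    p = {v: 1.0 if v == seed else 0.0 for v in nodes}
    for _ in range(iters):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            # Otherwise it moves to a uniformly chosen neighbour of u.
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        p = nxt
    return [p[v] for v in nodes]


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def score_pairs(adj):
    """Rank unconnected node pairs by the similarity of their profiles."""
    profiles = {v: rwr_profile(adj, v) for v in adj}
    nodes = sorted(adj)
    scored = []
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in adj[u]:
                scored.append((pearson(profiles[u], profiles[v]), u, v))
    return sorted(scored, reverse=True)


# Toy undirected graph: two triangles joined by the edge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
for score, u, v in score_pairs(adj):
    print(f"{u}-{v}: {score:.3f}")
```

Each profile has one entry per node, so its length grows with the size of the network; this is the dimensionality that the abstract says can hurt such methods on large networks.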

Citations: 18

Incremental Discriminant Learning for Heterogeneous Domain Adaptation

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.186

Peng Han, Xinxiao Wu

This paper proposes a new incremental learning method for heterogeneous domain adaptation, in which the training data from both the source and target domains are acquired sequentially and represented by heterogeneous features. Two different projection matrices are learned to map the data from the two domains into a discriminative common subspace, where the intra-class samples are closely related to each other, the inter-class samples are well separated from each other, and the data distribution mismatch between the source and target domains is reduced. Unlike previous work, our method is capable of incrementally optimizing the projection matrices when the training data becomes available as a data stream instead of being given completely in advance. As training data gradually arrives, the new projection matrices are computed by updating the existing ones using an eigenspace merging algorithm, rather than repeating the learning from the beginning while keeping the whole training data set. Therefore, our incremental learning solution for the projection matrices can significantly reduce the computational complexity and memory space, which makes it applicable to a wider set of heterogeneous domain adaptation scenarios with a large training dataset. Furthermore, our method is neither restricted to corresponding training instances in the source and target domains nor restricted to the same type of feature, which meaningfully relaxes the requirements on the training data. Comprehensive experiments on three benchmark datasets clearly demonstrate the effectiveness and efficiency of our method.

Citations: 1

A Novel Approach for Generating Personalized Mention List on Micro-Blogging System

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.51

Ge Zhou, Lu Yu, Chuxu Zhang, Chuang Liu, Zi-Ke Zhang, Jianlin Zhang

Online social networks provide us with a convenient way to access information, which in turn brings the information overload problem. Most previous work has focused on analyzing users' retweet behavior on micro-blogging systems, and diverse recommendation algorithms have been proposed to push personalized tweet lists to users. In this paper, we aim to solve the overload problem in the mention list. We first explore the in-depth differences between mention and retweet behaviors, and identify the various actions users take on a mention. We then propose a personalized ranking model that considers multi-dimensional relations among users and mention tweets to generate the personalized mention list. Experimental results on a micro-blogging system data set show that the proposed method performs better than benchmark methods.

Citations: 13

Influence Visualization of Scientific Paper through Flow-Based Citation Network Summarization

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.105

Yue Su, Sibai Sun, Yuan Xuan, Lei Shi

This paper presents VEGAS, an online system that can illustrate the influence of one scientific paper on citation networks via influence graph summarization and visualization. The system is built over an algorithm pipeline that maximizes the rate of influence flows in the final summarization. Both the visualization and interaction designs are described with respect to a real usage scenario of the VEGAS system.

Citations: 1

Identifying Key-Players in Online Activist Groups on the Facebook Social Network

2015 IEEE International Conference on Data Mining Workshop (ICDMW)

Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.88

Mariam Nouh, Jason R. C. Nurse

Online social media applications have become an integral part of our everyday life. Not only are they being utilised by individuals and legitimate businesses, but recently several organised groups, such as activists, hacktivists, and cyber-criminals, have also adopted them to communicate and spread their ideas. This represents a new source of intelligence gathering for law enforcement, for instance, as it allows them an inside look at the behaviour of these previously closed, secretive groups. One possible opportunity with this online data source is to utilise the public exchange of social-media messages to identify key users in such groups. This is particularly important for law enforcement that wants to monitor or interrogate influential people in suspicious groups. In this paper, we utilise Social Network Analysis (SNA) techniques to understand the dynamics of the interaction between users in a Facebook-based activist group. Additionally, we aim to identify the most influential users in the group and infer their relationship strength. We incorporate sentiment analysis to identify users with clear positive and negative influences on the group, which could aid in facilitating a better understanding of the group. We also perform a temporal analysis to correlate online activities with relevant real-life events. Our results show that applying such data analysis techniques to users' online behaviour is a powerful tool to predict levels of influence and relationship strength between group members. Finally, we validated our results against the ground truth and found that our approach is very promising at achieving its aims.

Citations: 23
