2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Papers

Defending Suspected Users by Exploiting Specific Distance Metric in Collaborative Filtering Recommender Systems
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.89
Zhihai Yang, Zhongmin Cai
Collaborative filtering recommender systems (CFRSs) are critical components of popular e-commerce websites, where they make personalized recommendations. In practice, CFRSs are highly vulnerable to "shilling" attacks or "profile injection" attacks because of their openness. A number of detection methods have been proposed to make CFRSs resistant to such attacks. However, some of them distinguish attackers by using typical similarity metrics; although these methods can capture the attackers they target to some extent, they struggle to defend against all attackers and have high computation time. In this paper, we propose an unsupervised method to detect such attacks. First, to reduce time consumption, we filter out as many genuine users as possible by using suspected target items. Then, on the users remaining after the first stage, we employ a new similarity metric to filter out the remaining genuine users; this metric combines a traditional similarity metric with the linkage information between users to improve the accuracy of user similarity. Experimental results show that our proposed detection method outperforms the benchmark method.
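A minimal Python sketch of the two-stage detection idea summarized above. It is not the authors' metric: the combined similarity below mixes cosine similarity of rating vectors with a Jaccard overlap of rated-item sets as an assumed stand-in for "linkage information", and the names `stage1_filter`, `detect`, `alpha`, `threshold` and `min_group` are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Set

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def stage1_filter(ratings: Ratings, suspected_items: Set[str]) -> List[str]:
    """Stage 1: keep only users who rated at least one suspected target item."""
    return [u for u, r in ratings.items() if not suspected_items.isdisjoint(r)]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse rating vectors."""
    dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Overlap of rated-item sets: a hypothetical stand-in for 'linkage information'."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def combined_similarity(a: Dict[str, float], b: Dict[str, float], alpha: float = 0.5) -> float:
    """Hypothetical mix of a traditional metric and a linkage-style term."""
    return alpha * cosine(a, b) + (1 - alpha) * jaccard(a, b)


def detect(ratings: Ratings, suspected_items: Set[str],
           threshold: float = 0.8, min_group: int = 2, alpha: float = 0.5) -> List[str]:
    """Stage 2: flag candidates that have at least `min_group` highly similar peers."""
    candidates = stage1_filter(ratings, suspected_items)
    flagged = []
    for u in candidates:
        peers = sum(
            1 for v in candidates
            if v != u and combined_similarity(ratings[u], ratings[v], alpha) >= threshold
        )
        if peers >= min_group:
            flagged.append(u)
    return flagged


# Toy data: "t" is the suspected target item; a1..a3 are identical injected profiles.
ratings: Ratings = {
    "g1": {"i1": 4, "i2": 3, "i5": 5},
    "g2": {"i2": 2, "i3": 4},
    "g3": {"i1": 5, "t": 2, "i4": 3},
    "a1": {"t": 5, "f1": 3, "f2": 3},
    "a2": {"t": 5, "f1": 3, "f2": 3},
    "a3": {"t": 5, "f1": 3, "f2": 3},
}
print(stage1_filter(ratings, {"t"}))  # ['g3', 'a1', 'a2', 'a3']
print(detect(ratings, {"t"}))         # ['a1', 'a2', 'a3']
```

The peer-count rule in stage 2 reflects the common intuition that injected profiles are generated in groups and resemble one another; a threshold on average similarity would instead be diluted by genuine users who also rated the target item.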
Citations: 1
Finding Event Videos via Image Search Engine
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.78
Han Wang, Xinxiao Wu
Searching for desired events in uncontrolled videos is a challenging task. Current research mainly focuses on learning concepts from large numbers of labeled videos, but collecting the many labeled videos needed to model events under various circumstances is time-consuming and labor-intensive. To reduce the labeling effort, we propose to learn video models by leveraging abundant Web images, which are a rich source of information: they capture many events under various conditions and are roughly annotated. However, knowledge from the Web is noisy and diverse, so brute-force knowledge transfer may hurt retrieval performance. To address this negative-transfer problem, we propose a novel Joint Group Weighting Learning (JGWL) framework that transfers different but related groups of knowledge (source domain), queried from a Web image search engine, to real-world videos (target domain). Under this framework, the weights of the different groups are learned by joint optimization, and each weight represents how much the corresponding image group contributes to the knowledge transferred to the videos. Moreover, to deal with the mismatch in feature distribution between the video feature space and the image feature space, we build a common feature subspace that bridges these two heterogeneous feature spaces in an unsupervised manner. Experimental results on two challenging video datasets demonstrate that using grouped knowledge gained from Web images is effective for video retrieval.
Citations: 1
Recovering Cross-Device Connections via Mining IP Footprints with Ensemble Learning
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.129
Xuezhi Cao, Weiyue Huang, Yong Yu
This paper describes our solution to the ICDM 2015 contest. The challenge is to recover cross-device connections, i.e., to identify device-cookie pairs that are used by the same natural person. To tackle this task, we first model the privateness of each IP address, then employ pairwise ranking techniques to predict the likelihood of each connection, and finally use ensemble learning to integrate multiple models built with various settings. Using only IP footprint information, our approach placed 5th in the contest (average F-score of 0.8608).
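The pipeline above starts from a per-IP privateness estimate. The sketch below assumes a simple, hypothetical definition (the inverse of the number of distinct devices and cookies seen on an IP) and scores a device-cookie pair by summing the privateness of their shared IPs. The learned pairwise ranker and the ensemble step described in the abstract are not reproduced here, and all function names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set

Footprints = Dict[str, Set[str]]  # device or cookie id -> set of IPs it was seen on


def ip_privateness(footprints: Footprints) -> Dict[str, float]:
    """Hypothetical privateness score: 1 / (number of distinct ids seen on the IP).
    A home IP shared by two ids scores 0.5; a busy cafe hotspot scores lower."""
    ids_per_ip: Dict[str, Set[str]] = defaultdict(set)
    for entity, ips in footprints.items():
        for ip in ips:
            ids_per_ip[ip].add(entity)
    return {ip: 1.0 / len(ids) for ip, ids in ids_per_ip.items()}


def pair_score(device: str, cookie: str, footprints: Footprints,
               priv: Dict[str, float]) -> float:
    """Sum the privateness of every IP shared by the device and the cookie."""
    return sum(priv[ip] for ip in footprints[device] & footprints[cookie])


def rank_cookies(device: str, cookies: List[str], footprints: Footprints,
                 priv: Dict[str, float]) -> List[str]:
    """Rank candidate cookies for one device by score (ties broken by id);
    cookies that share no IP with the device are dropped."""
    scored = [(pair_score(device, c, footprints, priv), c) for c in cookies]
    scored.sort(key=lambda sc: (-sc[0], sc[1]))
    return [c for s, c in scored if s > 0]


footprints: Footprints = {
    "dev1": {"ip_home", "ip_cafe"},
    "dev2": {"ip_cafe", "ip_office"},
    "ck_a": {"ip_home", "ip_cafe"},
    "ck_b": {"ip_cafe"},
    "ck_c": {"ip_office"},
}
priv = ip_privateness(footprints)
cookies = ["ck_a", "ck_b", "ck_c"]
print(rank_cookies("dev1", cookies, footprints, priv))  # ['ck_a', 'ck_b']
print(rank_cookies("dev2", cookies, footprints, priv))  # ['ck_c', 'ck_a', 'ck_b']
```

In a setup like the one described, scores such as `pair_score` would more likely serve as features for a learned ranker than as the final ranking.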
Citations: 12
Proposal of LDA-Based Sentiment Visualization of Hotel Reviews
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.72
Yu-Sheng Chen, Lieu-Hen Chen, Y. Takama
With the growth of user-generated content (UGC), it is important to learn quickly what consumers think about the features or deficiencies of products. Such information is important not only for companies but also for consumers. Keyword-based visualization and clustering are effective methods for getting an overview of opinions. To reduce the effort users spend examining vast amounts of UGC, we proposed an interactive visualization system that presents sentiment words together with their aspects, based on natural language processing and a sentiment lexicon. This paper also proposes applying latent Dirichlet allocation (LDA) to cluster reviews into several topics in order to make the visualization easier to understand. The developed system is explained through case studies.
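The aspect-sentiment pairing described above can be illustrated with a toy lexicon-based sketch. The lexicons, the nearest-aspect rule and the `window` parameter are assumptions for illustration, and the LDA topic-clustering step is omitted (it would normally come from a topic-modeling library such as gensim or scikit-learn).

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Toy lexicons (hypothetical); a real system would use a curated sentiment lexicon.
ASPECTS = {"room", "staff", "breakfast", "location"}
SENTIMENT: Dict[str, int] = {"clean": 1, "friendly": 1, "great": 1,
                             "dirty": -1, "rude": -1, "noisy": -1}


def aspect_sentiment_pairs(reviews: Iterable[str], window: int = 3) -> Counter:
    """Attach each sentiment word to the nearest aspect word within `window` tokens."""
    pairs: Counter = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z]+", review.lower())
        aspect_pos = [i for i, t in enumerate(tokens) if t in ASPECTS]
        if not aspect_pos:
            continue
        for j, tok in enumerate(tokens):
            if tok not in SENTIMENT:
                continue
            nearest = min(aspect_pos, key=lambda i: abs(i - j))
            if abs(nearest - j) <= window:
                pairs[(tokens[nearest], tok)] += 1
    return pairs


def aspect_polarity(pairs: Counter) -> Dict[str, int]:
    """Net polarity per aspect: +1 per positive word, -1 per negative word."""
    scores: Dict[str, int] = {}
    for (aspect, word), count in pairs.items():
        scores[aspect] = scores.get(aspect, 0) + SENTIMENT[word] * count
    return scores


reviews = [
    "The room was clean and the staff were friendly.",
    "Dirty room and rude staff.",
    "Great location but noisy room.",
]
pairs = aspect_sentiment_pairs(reviews)
print(aspect_polarity(pairs))  # {'room': -1, 'staff': 0, 'location': 1}
```

A visualization layer could then size or color each (aspect, sentiment word) pair by its count.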
Citations: 23
Valuating Queries for Data Trading in Modern Cities
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.11
Ruiming Tang, Huayu Wu, Xiuqiang He, S. Bressan
The availability of data trading mechanisms and platforms is a paramount prerequisite for the development of effective smart city services. For data to become a commodity ready for consumption, transformation and exploitation by smart services, it must be made available and tradable on data marketplaces. For such marketplaces to be viable, there is a compelling need for a sound data pricing model that is conducive to a healthy market. In this paper, we discuss the definition of a pricing model in which views are priced and queries are valuated using views. We define the price of a query as the lowest total price of a set of views that can answer the query. We discuss the design of effective and efficient algorithms for computing the price of a query. We show that the problem of computing the price is similar, but not identical, to the problem of answering queries using views. We therefore adapt the MiniCon algorithm, which was designed to answer queries using views, to the task at hand. Finally, we discuss further challenges raised by the definition of a framework for valuating queries using views.
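Under the definition above, pricing a query is a minimum-cost covering problem. The sketch assumes a MiniCon-style analysis has already produced, for each view, the set of query subgoals it can cover, and that a valid answer combines views whose coverages are pairwise disjoint (as MiniCon combines its MCDs). It then finds the cheapest such combination by dynamic programming over subgoal bitmasks, which is practical only for queries with few subgoals. View names and prices are hypothetical.

```python
from math import inf
from typing import Dict, FrozenSet, List, Optional, Tuple

# name -> (indices of query subgoals the view can cover, price of the view)
Candidates = Dict[str, Tuple[FrozenSet[int], float]]


def query_price(num_subgoals: int, candidates: Candidates) -> Tuple[float, List[str]]:
    """Cheapest set of candidates whose coverages are pairwise disjoint and together
    cover every subgoal. Returns (inf, []) if the query cannot be answered."""
    full = (1 << num_subgoals) - 1
    masks: Dict[str, int] = {}
    for name, (cover, _) in candidates.items():
        if any(g < 0 or g >= num_subgoals for g in cover):
            raise ValueError(f"{name} covers an unknown subgoal")
        masks[name] = sum(1 << g for g in cover)

    best = [inf] * (full + 1)                 # best[m]: cheapest exact cover of mask m
    back: List[Optional[Tuple[int, str]]] = [None] * (full + 1)
    best[0] = 0.0
    for mask in range(full + 1):              # extended masks are always larger integers
        if best[mask] == inf:
            continue
        for name, (_, price) in candidates.items():
            m = masks[name]
            if m and not (m & mask) and best[mask] + price < best[mask | m]:
                best[mask | m] = best[mask] + price
                back[mask | m] = (mask, name)

    if best[full] == inf:
        return inf, []
    chosen, mask = [], full
    while mask:                               # follow back-pointers to recover the views
        prev, name = back[mask]
        chosen.append(name)
        mask = prev
    return best[full], sorted(chosen)


views: Candidates = {
    "V1": (frozenset({0, 1}), 5.0),
    "V2": (frozenset({2}), 2.0),
    "V3": (frozenset({0}), 1.0),
    "V4": (frozenset({1, 2}), 3.0),
    "V5": (frozenset({0, 1, 2}), 9.0),
}
print(query_price(3, views))  # (4.0, ['V3', 'V4'])
```

Here V3 + V4 (price 4) beats both V1 + V2 (price 7) and the single view V5 (price 9).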
Citations: 5
Paradigmatic Clustering for NLP
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.233
Julio Santisteban, Javier Tejada-Cárcamo
How can we retrieve meaningful information from a large, sparse graph? Traditional approaches focus on generic clustering techniques and on discovering dense clusters in a network graph; however, they tend to miss interesting patterns such as paradigmatic relations. In this paper, we propose a novel graph clustering technique that models the relations of a node using paradigmatic analysis. We exploit a node's relations to extract its existing sets of signifiers. The newly found clusters represent a different view of the graph, which provides interesting insights into the structure of a sparse network graph. Our proposed graph clustering algorithm, PaC (Paradigmatic Clustering), uses paradigmatic analysis supported by an asymmetric similarity. In contrast to traditional graph clustering methods, our algorithm yields worthwhile results on word-sense disambiguation tasks. In addition, we propose a novel paradigmatic similarity measure. Extensive experiments and empirical analysis on synthetic and real data are used to evaluate our algorithm.
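The asymmetric similarity idea can be sketched on a word co-occurrence graph, where two words are paradigmatically related if they occur in the same contexts. The definition below (the fraction of word a's left/right contexts that word b shares) is an illustrative assumption, not the PaC similarity from the paper, and the threshold-based grouping is only a stand-in for its clustering step.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Contexts = Dict[str, Set[Tuple[str, str]]]  # word -> {("L", left word), ("R", right word)}


def build_contexts(sentences: Iterable[str]) -> Contexts:
    """Record each word's immediate left/right neighbours as its contexts."""
    ctx: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for left, right in zip(tokens, tokens[1:]):
            ctx[right].add(("L", left))
            ctx[left].add(("R", right))
    return dict(ctx)


def asym_sim(ctx: Contexts, a: str, b: str) -> float:
    """Share of a's contexts in which b also occurs; in general sim(a, b) != sim(b, a)."""
    if not ctx[a]:
        return 0.0
    return len(ctx[a] & ctx[b]) / len(ctx[a])


def paradigmatic_sets(ctx: Contexts, threshold: float = 0.75) -> Dict[str, Set[str]]:
    """For each word, the words that can substitute for it in most of its contexts."""
    return {a: {b for b in ctx if b != a and asym_sim(ctx, a, b) >= threshold}
            for a in ctx}


ctx = build_contexts(["i drink coffee", "i drink tea", "you drink tea", "i brew coffee"])
print(asym_sim(ctx, "tea", "coffee"), asym_sim(ctx, "coffee", "tea"))  # 1.0 0.5
print(paradigmatic_sets(ctx)["brew"])  # {'drink'}
```

The asymmetry matters here: every context of "tea" is also a context of "coffee", but not the other way round, so "coffee" lands in the paradigmatic set of "tea" without the reverse holding.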
Citations: 3
Identifying Key-Players in Online Activist Groups on the Facebook Social Network
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.88
Mariam Nouh, Jason R. C. Nurse
Online social media applications have become an integral part of our everyday life. Not only are they used by individuals and legitimate businesses, but recently several organised groups, such as activists, hacktivists and cyber-criminals, have also adopted them to communicate and spread their ideas. This represents a new source of intelligence for law enforcement, for instance, as it gives them an inside look at the behaviour of these previously closed, secretive groups. One opportunity offered by this online data source is to use the public exchange of social-media messages to identify key users in such groups. This is particularly important for law enforcement agencies that want to monitor or interrogate influential people in suspicious groups. In this paper, we use Social Network Analysis (SNA) techniques to understand the dynamics of the interaction between users in a Facebook-based activist group. Additionally, we aim to identify the most influential users in the group and infer their relationship strength. We incorporate sentiment analysis to identify users with clear positive and negative influences on the group; this could facilitate a better understanding of the group. We also perform a temporal analysis to correlate online activities with relevant real-life events. Our results show that applying such data analysis techniques to users' online behaviour is a powerful tool for predicting levels of influence and relationship strength between group members. Finally, we validated our results against the ground truth and found that our approach is very promising at achieving its aims.
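A minimal sketch of the kind of SNA measures mentioned above, assuming interactions are available as (author, addressee, sentiment) triples. It uses in-degree centrality as the influence measure, the number of interactions between two users as tie strength, and the sum of a user's message sentiment as their net polarity; these are common textbook choices, not necessarily the paper's, and all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Interaction = Tuple[str, str, float]  # (author, addressee, sentiment of the message in [-1, 1])


def analyse(interactions: Iterable[Interaction]):
    """Return in-degree centrality, tie strength per user pair and net posted sentiment."""
    in_neighbours: Dict[str, Set[str]] = defaultdict(set)
    ties: Counter = Counter()
    sentiment: Dict[str, float] = defaultdict(float)
    users: Set[str] = set()
    for src, dst, score in interactions:
        users.update((src, dst))
        sentiment[src] += score
        if src != dst:
            in_neighbours[dst].add(src)
            ties[frozenset((src, dst))] += 1
    denom = max(len(users) - 1, 1)
    centrality = {u: len(in_neighbours[u]) / denom for u in users}
    return centrality, ties, {u: sentiment[u] for u in users}


def key_players(centrality: Dict[str, float], k: int) -> List[str]:
    """Top-k users by centrality, ties broken alphabetically."""
    return sorted(centrality, key=lambda u: (-centrality[u], u))[:k]


interactions: List[Interaction] = [
    ("bob", "alice", 0.75), ("carol", "alice", 0.5), ("dave", "alice", -0.25),
    ("alice", "bob", 0.5), ("bob", "alice", 0.5), ("dave", "carol", -0.5),
]
centrality, ties, sentiment = analyse(interactions)
print(key_players(centrality, 2))         # ['alice', 'bob']
print(ties[frozenset({"alice", "bob"})])  # 3
print(sentiment["dave"])                  # -0.75
```

Other centralities (betweenness, eigenvector) can be swapped in for the in-degree measure without changing the rest of the sketch.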
Citations: 23
Shikake Data Market for Collaborative Shikake Creation
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.130
N. Matsumura, Hideaki Takeda
A shikake is a trigger for behavioral change aimed at solving a problem. We propose a Shikake Data Market (SDM) platform that gives everyone an opportunity to implement a shikake with restricted resources, such as ideas, expert knowledge and skills, practitioners, negotiators, and budget. As a preliminary case, we analyzed collaborative creation at a shikake hackathon and found that collaboration among people with diverse expert backgrounds can improve the quality of the output. Based on this result, we discuss collaborative shikake creation.
Citations: 0
Link Prediction in Large Networks by Comparing the Global View of Nodes in the Network
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.195
Mustafa Coşkun, Mehmet Koyutürk
Link prediction is an important and well-studied problem in network analysis, with a broad range of applications including recommender systems, anomaly detection, and denoising. The general principle in link prediction is to use the topological characteristics of the nodes in the network to predict edges that might be added to or removed from the network. While early research used the local network neighborhood to characterize the topological relationship between pairs of nodes, recent studies increasingly show that using global network information improves prediction performance. Meanwhile, in the context of disease gene prioritization and functional annotation in computational biology, methods based on "global topological similarity" have been shown to be effective and robust to noise and ascertainment bias. These methods compute topological profiles that represent the global view of the network from the perspective of each node, and compare these profiles to assess the topological similarity between nodes. Here, we show that, in the context of link prediction in large networks, the performance of these global-view based methods can be adversely affected by high dimensionality. Motivated by this observation, we propose two dimensionality reduction techniques that exploit the sparsity and modularity of networks encountered in practical applications. Our experimental results on predicting future collaborations in a comprehensive co-authorship network show that dimensionality reduction makes global-view based link prediction highly effective, and the resulting algorithms significantly outperform state-of-the-art link prediction methods.
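A node's "global view" profile is often computed with a random walk with restart (RWR). The sketch below assumes RWR profiles, uses cosine similarity between profiles as the link score, and adds a top-k truncation of each profile as one simple way to exploit sparsity. The truncation is our own illustration, not necessarily either of the paper's two dimensionality reduction techniques; `restart`, `top_k` and the toy graph are hypothetical.

```python
from math import sqrt
from typing import Dict, List

Graph = Dict[str, List[str]]  # undirected graph as adjacency lists


def rwr_profile(g: Graph, seed: str, restart: float = 0.3,
                tol: float = 1e-10, max_iter: int = 1000) -> Dict[str, float]:
    """Random walk with restart from `seed`: the network as seen from one node."""
    p = {n: 0.0 for n in g}
    p[seed] = 1.0
    for _ in range(max_iter):
        nxt = {n: 0.0 for n in g}
        for n, mass in p.items():
            if g[n]:
                share = (1 - restart) * mass / len(g[n])
                for m in g[n]:
                    nxt[m] += share
            else:  # dangling node: return its walk mass to the seed
                nxt[seed] += (1 - restart) * mass
        nxt[seed] += restart
        diff = sum(abs(nxt[n] - p[n]) for n in g)
        p = nxt
        if diff < tol:
            break
    return p


def top_k(profile: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep only the k largest entries: a simple sparsification (our illustration)."""
    return dict(sorted(profile.items(), key=lambda kv: -kv[1])[:k])


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse profiles stored as dicts."""
    dot = sum(v * b.get(n, 0.0) for n, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


# Two triangles {a, b, c} and {d, e, f} joined by the bridge c-d.
G: Graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
profiles = {n: rwr_profile(G, n) for n in G}
print(round(cosine(profiles["a"], profiles["b"]), 3), round(cosine(profiles["a"], profiles["e"]), 3))
print(cosine(top_k(profiles["a"], 3), top_k(profiles["e"], 3)))  # 0.0
```

Candidate links would then be ranked by profile similarity over non-adjacent pairs; truncation keeps k entries per profile instead of one entry per node.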
Citations: 18
Incremental Discriminant Learning for Heterogeneous Domain Adaptation
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.186
Peng Han, Xinxiao Wu
This paper proposes a new incremental learning method for heterogeneous domain adaptation, in which training data from both the source and target domains are acquired sequentially and represented by heterogeneous features. Two different projection matrices are learned to map the data from the two domains into a discriminative common subspace. In this subspace, intra-class samples are close to each other, inter-class samples are well separated, and the distribution mismatch between the source and target domains is reduced. Unlike previous work, our method can incrementally optimize the projection matrices when the training data arrive as a stream instead of being given in full in advance. As new training data arrive, the projection matrices are updated from the existing ones with an eigenspace merging algorithm, rather than being relearned from scratch over the entire stored training set. Our incremental solution therefore significantly reduces computational complexity and memory usage, making it applicable to a wider range of heterogeneous domain adaptation scenarios with large training sets. Furthermore, our method requires neither corresponding training instances across the source and target domains nor the same type of feature in both, which considerably relaxes the requirements on training data. Comprehensive experiments on three benchmark datasets demonstrate the effectiveness and efficiency of our method.
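The abstract names an eigenspace merging algorithm but does not describe it. Below is a minimal sketch of one standard way to merge two eigenspace models of a covariance matrix, so that new chunks of data can be folded in without revisiting old samples.

It assumes each model stores four things: the sample count, the mean, the top-k eigenvectors and their eigenvalues. The function names and the use of a plain covariance matrix are illustrative assumptions. The paper's discriminant objective over two heterogeneous domains is not reproduced.

```python
import numpy as np

def eigenspace_model(X: np.ndarray, k: int):
    """Truncated eigenspace model (count, mean, top-k eigenvectors, eigenvalues) of the rows of X."""
    mu = X.mean(axis=0)
    lam, U = np.linalg.eigh(np.cov(X, rowvar=False, bias=True))
    top = np.argsort(lam)[::-1][:k]
    return X.shape[0], mu, U[:, top], lam[top]

def merge_eigenspaces(m1, m2, k: int):
    """Merge two eigenspace models using only their stored statistics."""
    n1, mu1, U1, lam1 = m1
    n2, mu2, U2, lam2 = m2
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    d = (mu1 - mu2)[:, None]
    # Orthonormal basis spanning both eigenspaces plus the mean-shift direction.
    Q, _ = np.linalg.qr(np.hstack([U1, U2, d]))
    A1, A2, a = Q.T @ U1, Q.T @ U2, Q.T @ d
    # Pooled covariance C = (n1 C1 + n2 C2) / n + (n1 n2 / n^2) d d^T, expressed in basis Q.
    S = ((n1 / n) * (A1 * lam1) @ A1.T
         + (n2 / n) * (A2 * lam2) @ A2.T
         + (n1 * n2 / n**2) * (a @ a.T))
    lam, R = np.linalg.eigh(S)
    top = np.argsort(lam)[::-1][:k]
    # Rotate the small eigenvectors back into the original feature space.
    return n, mu, Q @ R[:, top], lam[top]
```

A stream would be folded in chunk by chunk, for example `model = merge_eigenspaces(model, eigenspace_model(chunk, k), k)`. When k equals the feature dimension the merge is exact. With a smaller k it is an approximation, because directions discarded from each chunk cannot be recovered later.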
Citations: 1