
2015 IEEE International Conference on Data Mining Workshop (ICDMW): Latest Papers

Discovering Anomalies and Root Causes in Applications via Relevant Fields Analysis
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.68
Yuchen Zhao, Arjun Iyer, Ariel Smoliar
In this paper, we present a powerful end-to-end data mining system that collects application-related data and provides insightful relevant fields analysis in addition to search and filtering. We present details on field extraction, indexing, relevant field processing, and dynamic baseline derivation. We also propose to demonstrate the effectiveness of various scoring algorithms. Two real-world use cases show that relevant fields analysis is effective for detecting application anomalies and discovering the root causes of application incidents.
Citations: 1
Multiresolution Mutual Information Method for Social Network Entity Resolution
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.94
Cong Shi, Rong Duan
Online Social Networks (OSNs) are widely adopted in our daily lives, and it is common for one individual to register with multiple sites for different services. Linking the rich contents of different social network sites is valuable to researchers for understanding human behaviors from different perspectives. For instance, each OSN has its own group of users and thus has its own biases. Linked accounts can serve as a good calibration dataset to improve data quality. This Entity Resolution (ER) problem is a challenge in the social network domain that many researchers attempt to tackle. In this paper, we take advantage of spatial information posted on different social network sites and propose an efficient multiresolution mutual information approach to link the entities from those sites. The proposed method significantly reduces computing time by using an iterative coarse-to-fine multiresolution approach, yet is robust in dealing with the sparsity of location data. Human location-wise behavior is also discussed in deciding the resolution level. Publicly available Twitter and Instagram data collected from their APIs are used to illustrate the method, and its performance is evaluated by comparing it with a greedy mutual information approach.
Citations: 1
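The entry above describes mutual information computed over spatial data at several resolutions. The listing gives no formulas, so the following is only a minimal sketch of the two generic building blocks: mapping coordinates to grid cells of a chosen size, and computing the empirical mutual information of paired observations. The function names, the `cell_deg` parameter, and the idea of pairing the cells visited by two accounts are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg):
    """Map a coordinate to an integer grid cell with side length cell_deg degrees.

    A larger cell_deg gives a coarser resolution (hypothetical parameter).
    """
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def mutual_information(pairs):
    """Empirical mutual information (in nats) of a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi
```

A coarse-to-fine procedure could, under these assumptions, score candidate account pairs with a large `cell_deg` first and recompute only the surviving candidates with smaller cells.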
Dynamic Community Detection Algorithm Based on Incremental Identification
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.158
Xiaoming Li, Bin Wu, Qian Guo, Xuelin Zeng, C. Shi
Dynamic community detection algorithms aim to identify the communities of a dynamic network, which consists of a series of network snapshots. To address this problem, we propose a new dynamic community detection algorithm based on incremental identification according to a vertex-based metric called permanence. We incrementally analyze the community ownership of a subset of the vertices, so as to avoid reassigning all the vertices in the network to their respective communities. In addition, we propose a new metric called evolution strength to measure the error that may be caused by incrementally assigning community ownership or by abrupt changes in network structure. The experimental results show that our proposed algorithm is able to identify the community structure of a network with higher efficiency. Meanwhile, because of the lack of dynamic network data with ground-truth structure and the limitations of existing synthetic methods, we propose a novel method for generating synthetic dynamic network data with ground-truth structure, which defines evolution events and the evolution rate of events so as to obtain more realistic synthetic data.
Citations: 19
A Stochastic Game Theoretic Model for Expanding ATM Services
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.125
Raja Rathnam Naidu Kanapaka, Raghu Neelisetti
ATMs aim to extend essential banking services such as cash withdrawal and deposit beyond the working hours of a bank's branch. However, ATMs incur a significant cost overhead in the form of capital and operational costs. The problem of ATM location is further complicated because customers of one bank can use their debit cards at any other bank's ATMs. While this might attract charges, some banks often refund these charges to attract customers. Banks need a mechanism to quantitatively measure the benefits of managing their own ATMs versus paying for services rendered to their customers by other banks through their ATMs. Game theory is the study of strategic decision making and is an effective technique for identifying the best business strategy when multiple options are available. In this paper, we propose a game theoretic model based on stochastic games to identify the best strategy for banks to adopt for their ATM expansion. We further propose an algorithm to identify the idle locations where a bank should place an ATM, should the result of the ATM game recommend that the bank establish its own ATM.
Citations: 1
Multi-Classes Feature Engineering with Sliding Window for Purchase Prediction in Mobile Commerce
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.172
Qiang Li, Maojie Gu, Keren Zhou, Xiaoming Sun
Mobile devices have become increasingly prevalent in recent years, especially among young people. The rapid progress of mobile devices promotes the development of M-Commerce. Purchases on mobile terminals account for a considerable percentage of the total trading volume of E-Commerce and have begun to draw the attention of E-Commerce corporations. Alibaba held a Mobile Recommendation Algorithm Competition aiming to recommend appropriate items to mobile users at the right time and place. The dataset provided by Alibaba consists of about 6 billion operation logs made by 5 million Taobao users on over 150 million items over a period of one month. Compared with traditional purchase prediction scenarios, the competition raised three challenges: (1) the dataset is too large to be processed on personal computers; (2) some days with large discounts offered by Taobao Marketplace fall within the dataset period; (3) positive samples are too few relative to the dimension of the features. In this paper, we study the problem of predicting the purchase behaviour of M-Commerce users by exploring a solution for Alibaba's Mobile Recommendation Algorithm Competition. We first study customer habits in depth and filter out many outliers. We then adopt a "sliding window" method to supply positive samples for the training dataset and smooth the burst of sales near Dec 12th. We design a feature engineering framework to extract 6 categories of features that aim to capture the buying potential of user-item pairs. Our features exploit the interaction of user-item pairs, the user's shopping habits, and the item's attraction for users. We then apply Gradient Boost Decision Trees (GBDT) as the training model. Finally, we combine the outputs of the individual GBDTs by Logistic Regression to get the final predictions. Our solution achieves an F1 score of 8.66% and ranks third in the final round.
Citations: 14
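The "sliding window" idea mentioned in the entry above can be illustrated with a small sketch. The listing does not give the window length or step, so `feature_len`, `step`, and the choice of labelling with the single day that follows each window are assumptions made here for illustration only.

```python
def sliding_windows(num_days, feature_len, step=1):
    """Return (feature_days, label_day) pairs over days 0 .. num_days - 1.

    Each sample uses feature_len consecutive days to build features and the
    day immediately after the window as the label day (an assumed setup).
    """
    samples = []
    start = 0
    while start + feature_len < num_days:
        feature_days = list(range(start, start + feature_len))
        label_day = start + feature_len
        samples.append((feature_days, label_day))
        start += step
    return samples
```

Sliding the window across the month yields one set of training samples per label day rather than a single set, which is one way such a scheme could increase the number of positive samples.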
Profit Maximization Analysis Based on Data Mining and the Exponential Retention Model Assumption with Respect to Customer Churn Problems
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.84
Zhaojing Zhang, R. Wang, Weihong Zheng, Shizhan Lan, D. Liang, Hao Jin
Confronted with fierce competition, an increasing number of telecommunication companies in China realize that they can increase profits by reducing the rate of customer churn rather than attracting the same number of new customers. Recently, the availability of big data has increased, which has stimulated the development of data mining techniques. Identifying methods to maximize profits based on big data is vital for operators. As a novel contribution, this paper studies three key factors of the customer churn problem, namely churn rate, prediction performance, and retention capability. We propose a profit function that maximizes profits under different conditions and obtain favorable results in applying it to sample data from China Mobile Communications Corporation. Theoretically, about 7.72 million Chinese Yuan per month can be obtained by applying the proposed model to China Mobile Group Guangxi Company Limited, which gives our research great economic value.
Citations: 13
Signed Directed Social Network Analysis Applied to Group Conflict
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.107
Q. Zheng, D. Skillicorn, O. Walther
Real-world social networks contain relationships of multiple different types, but this richness is often ignored in graph-theoretic modelling. We show how two recently developed spectral embedding techniques, for directed graphs (relationships are asymmetric) and for signed graphs (relationships are both positive and negative), can be combined. This combination is particularly appropriate for intelligence, terrorism, and law-enforcement applications. We illustrate by applying the novel embedding technique to datasets describing conflict in North-West Africa, and show how unusual interactions can be identified.
Citations: 4
Proposal of LDA-Based Sentiment Visualization of Hotel Reviews
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.72
Yu-Sheng Chen, Lieu-Hen Chen, Y. Takama
With the growth of user-generated content (UGC), it is important to quickly learn consumers' opinions about the features or deficiencies of products. Such information is important not only for companies, but also for consumers. Keyword-based visualization and clustering are effective methods for observing a summary of opinions. In order to decrease users' effort in examining vast amounts of UGC, we propose an interactive visualization system that presents sentiment words with aspects based on natural language processing and a sentiment lexicon. This paper also proposes applying latent Dirichlet allocation (LDA) to cluster reviews into several topics in order to improve the understandability of the visualization. This paper explains the developed system with case studies.
Citations: 23
Finding Event Videos via Image Search Engine
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.78
Han Wang, Xinxiao Wu
Searching for desirable events in uncontrolled videos is a challenging task. Current research mainly focuses on obtaining concepts from numerous labeled videos. However, it is time-consuming and labor-intensive to collect the large number of labeled videos required to model events under various circumstances. To alleviate the labeling process, we propose to learn models for videos by leveraging abundant Web images, which contain a rich source of information with many events taken under various conditions and roughly annotated. However, knowledge from the Web is noisy and diverse, and brute-force knowledge transfer may hurt retrieval performance. To address this negative transfer problem, we propose a novel Joint Group Weighting Learning (JGWL) framework to leverage different but related groups of knowledge (source domain) queried from a Web image search engine for real-world videos (target domain). Under this framework, the weights of different groups are learned in a joint optimization framework, and each weight represents how much the corresponding image group contributes to the knowledge transferred to the videos. Moreover, to deal with the feature distribution mismatch between the video feature space and the image feature space, we build a common feature subspace to bridge these two heterogeneous feature spaces in an unsupervised manner. Experimental results on two challenging video datasets demonstrate that it is effective to use grouped knowledge gained from Web images for video retrieval.
Citations: 1
Reporting L Most Favorite Objects in Uncertain Databases with Probabilistic Reverse Top-k Queries
Pub Date : 2015-11-14 DOI: 10.1109/ICDMW.2015.47
Guoqing Xiao, Kenli Li, Keqin Li
Top-k queries are widely studied for identifying a ranked set of the k most interesting objects based on an individual user's preferences. Reverse top-k queries take the perspective of the product manufacturer and are essential for manufacturers to assess the potential market and impact of their products. However, existing approaches to reverse top-k queries all assume that the underlying data are exact. Because of the intrinsic differences between uncertain and certain data, these methods are designed only for certain databases and cannot be applied directly to the uncertain case. Motivated by this, we first model probabilistic reverse top-k queries in the context of uncertain data. We then formulate the challenging problem of processing queries that report the l most favorite objects to users, where the impact factor of an object is defined as the cardinality of its probabilistic reverse top-k query result set. To speed up the query, we exploit several properties of probabilistic threshold top-k queries and probabilistic skyline queries to reduce the solution space. In addition, an upper bound on the potential users is estimated to reduce the cost of computing the probabilistic reverse top-k queries for the candidate objects. Effective pruning heuristics are presented to further reduce the search space of query processing. Finally, we present efficient query algorithms that integrate the proposed pruning strategies. Extensive experiments demonstrate the efficiency and effectiveness of the proposed algorithms under various experimental settings.
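To make the notion of impact factor concrete, here is a minimal brute-force sketch in Python. It is not the authors' algorithm and uses none of their pruning strategies. The abstract does not state the uncertainty model or the scoring function, so the sketch assumes that each object has fixed attributes and an independent existence probability, that users score objects with a linear weight vector, and that higher scores are better. All names (`topk_probability`, `impact_factor`, `l_most_favorite`) are hypothetical.

```python
from typing import List, Sequence, Tuple

# Assumed representation: (attribute vector, independent existence probability).
UncertainObject = Tuple[Sequence[float], float]


def score(weights: Sequence[float], attrs: Sequence[float]) -> float:
    """Linear preference score; higher is assumed to be better."""
    return sum(w * a for w, a in zip(weights, attrs))


def topk_probability(q: int, objects: List[UncertainObject],
                     weights: Sequence[float], k: int) -> float:
    """Probability that objects[q] exists and fewer than k better objects exist."""
    q_attrs, q_prob = objects[q]
    q_score = score(weights, q_attrs)
    # dist[j] = probability that exactly j better-scoring objects exist.
    # Mass for counts of k or more is dropped, since q is then outside the top-k.
    dist = [1.0] + [0.0] * (k - 1)
    for i, (attrs, prob) in enumerate(objects):
        if i == q or score(weights, attrs) <= q_score:
            continue  # ties are resolved in favour of q (an assumption)
        new = [0.0] * k
        for j in range(k):
            new[j] += dist[j] * (1.0 - prob)
            if j + 1 < k:
                new[j + 1] += dist[j] * prob
        dist = new
    return q_prob * sum(dist)


def impact_factor(q: int, objects: List[UncertainObject],
                  users: List[Sequence[float]], k: int, threshold: float) -> int:
    """Size of the probabilistic reverse top-k result set of objects[q]."""
    return sum(1 for w in users
               if topk_probability(q, objects, w, k) >= threshold)


def l_most_favorite(objects: List[UncertainObject], users: List[Sequence[float]],
                    k: int, threshold: float, l: int) -> List[int]:
    """Indices of the l objects with the largest impact factor."""
    ranked = sorted(range(len(objects)),
                    key=lambda q: impact_factor(q, objects, users, k, threshold),
                    reverse=True)
    return ranked[:l]


# Toy data: three objects and three users, all values invented for illustration.
objects = [([9.0, 1.0], 0.9), ([1.0, 9.0], 0.8), ([5.0, 5.0], 0.5)]
users = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
print(l_most_favorite(objects, users, k=1, threshold=0.5, l=2))  # → [0, 1]
```

The sketch evaluates every object against every user, which costs on the order of |objects|² · |users| · k operations. Avoiding that exhaustive evaluation is what the upper bound and pruning heuristics mentioned in the abstract are for.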
Citations: 7