In multi-label learning, each instance is associated with multiple labels simultaneously. Most existing approaches treat each label in a crisp manner, i.e., a class label is either relevant or irrelevant to the instance. However, the latent relative importance of each relevant label is regrettably ignored. In this paper, we propose a novel multi-label learning approach that simultaneously estimates the latent relative labeling importances and trains the inductive model. Specifically, we present a biconvex formulation with both instance and label graph regularization, and solve the problem in an alternating fashion. On the one hand, the inductive model is trained by minimizing the least squares loss of fitting the latent relative labeling importances. On the other hand, the latent relative labeling importances are estimated from the model outputs via a specially constrained label propagation procedure. Through the mutual adaptation of inductive model training and constrained label propagation, an effective multi-label learning model is built that optimally estimates the latent relative labeling importances. Extensive experimental results clearly show the effectiveness of the proposed approach.
Shuo He, Lei Feng, Li Li. "Estimating Latent Relative Labeling Importances for Multi-label Learning." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00127
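The constrained label propagation step described above can be illustrated with a generic sketch: importances are diffused over an instance-similarity graph, then constrained to live only on each instance's relevant labels and to sum to one. This is an assumption-laden stand-in, not the authors' exact biconvex formulation.

```python
import numpy as np

def propagate_labeling_importances(W, Y, alpha=0.5, n_iter=50):
    """Estimate relative labeling importances by label propagation.

    W: (n, n) nonnegative, symmetric instance-similarity matrix.
    Y: (n, q) binary label matrix (1 = relevant label).
    Returns an (n, q) matrix where each row is a distribution over
    that instance's relevant labels. A generic propagation sketch,
    not the paper's exact constrained procedure.
    """
    # Symmetrically normalize the similarity graph: S = D^-1/2 W D^-1/2.
    d = W.sum(axis=1)
    d[d == 0] = 1.0
    S = W / np.sqrt(np.outer(d, d))
    F = Y.astype(float)
    for _ in range(n_iter):
        # Diffuse importances over the graph, anchored to the labels.
        F = alpha * (S @ F) + (1 - alpha) * Y
        # Constraint: mass only on relevant labels, rows sum to one.
        F = F * Y
        F /= np.maximum(F.sum(axis=1, keepdims=True), 1e-12)
    return F
```

Each output row can then serve as a soft target for the least-squares inductive model in the alternating scheme.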
Although many researchers of recommender systems have noted that encoding user-item interactions with DNNs improves collaborative filtering, they overlook that embedding latent features collected from external sources, e.g., knowledge graphs (KGs), can produce more precise recommendation results. Furthermore, CF-based models remain vulnerable when observed user-item interactions are sparse. In this paper, targeting movie recommendation, we propose a novel knowledge-enhanced deep recommendation framework that incorporates GAN-based models to achieve robust performance. Specifically, our framework first imports various feature embeddings distilled not only from user-movie interactions but also from KGs and tags to constitute initial user/movie representations. Then, user/movie representations are fed into a generator and a discriminator simultaneously to learn final optimal representations through adversarial training, which are conducive to generating better recommendation results. Extensive experiments on a real Douban dataset demonstrate our framework's superiority over state-of-the-art recommendation models, especially when observed user-movie interactions are sparse.
Deqing Yang, Zikai Guo, Ziyi Wang, Juyang Jiang, Yanghua Xiao, Wei Wang. "A Knowledge-Enhanced Deep Recommendation Framework Incorporating GAN-Based Models." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00187
We introduce a novel ensemble learning framework for supervised classification. Our proposed framework, mixed bagging, is a form of bootstrap aggregating (bagging) in which the sampling process takes into account the classification hardness of the training instances. The classification hardness, or simply hardness, of an instance is defined as the probability that the instance will be misclassified by a classification model built from the remaining instances in the training set. We incorporate instance hardness into the bagging process by varying the sampling probability of each instance based on its estimated hardness. Bootstraps of differing hardness can be created in this way by over-representing, under-representing and equally representing harder instances. This results in a diverse committee of classifiers induced from the bootstraps, whose individual outputs can be aggregated to achieve a final class prediction. We propose two versions of mixed bagging: one where the bootstraps are grouped as easy, regular or hard, with all bootstraps in one group having the same hardness; and another where the hardness of the bootstraps changes gradually from one iteration to the next. We have tested our system on 47 publicly available binary classification problems using C4.5 decision trees of varying depth as base learners. We find that the proposed mixed bagging methods perform better than traditional bagging and weighted bagging (wagging) regardless of the base learner. The proposed method also outperforms AdaBoost when the base learner consists of deeper decision trees. We examine the results of mixed bagging in terms of bias-variance decomposition and find that mixed bagging is better than AdaBoost at reducing variance and better than traditional bagging at reducing inductive bias.
A. Kabir, Carolina Ruiz, S. A. Alvarez. "Mixed Bagging: A Novel Ensemble Learning Framework for Supervised Classification Based on Instance Hardness." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00137
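The core mechanism above, tilting bootstrap sampling probabilities by estimated instance hardness, can be sketched as follows. The hardness estimator here is a simple leave-one-out nearest-neighbor proxy, and the exponential tilt is an illustrative choice; the paper's hardness definition and bootstrap schedules differ.

```python
import numpy as np

def loo_hardness(X, y):
    """Hardness proxy: 1 if an instance's leave-one-out nearest
    neighbor carries a different class label, else 0. A cheap
    stand-in for the cross-model misclassification probability
    defined in the abstract."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)       # exclude each point from its own query
    nn = D.argmin(axis=1)
    return (y[nn] != y).astype(float)

def hardness_biased_bootstrap(X, y, hardness, bias=1.0, seed=0):
    """Draw one bootstrap whose sampling probabilities are tilted
    toward hard instances (bias > 0), away from them (bias < 0),
    or uniform (bias = 0)."""
    rng = np.random.default_rng(seed)
    w = np.exp(bias * hardness)       # soft over-/under-representation
    p = w / w.sum()
    idx = rng.choice(len(y), size=len(y), replace=True, p=p)
    return X[idx], y[idx]
```

Varying `bias` across iterations (negative to positive) mimics the gradual easy-to-hard schedule; grouping fixed `bias` values mimics the easy/regular/hard grouping.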
It has recently been shown that deep neural networks (DNNs) are susceptible to a particular type of attack that exploits a fundamental flaw in their design. This attack consists of generating synthetic examples referred to as adversarial samples. These samples are constructed by slightly manipulating real data points in ways that "fool" the original DNN model, forcing it to misclassify previously correctly classified samples with high confidence. Many believe addressing this flaw is essential for DNNs to be used in critical applications such as cyber security. Previous work has shown that learning algorithms that enhance the robustness of DNN models all use the tactic of "security through obscurity". This means that security can be guaranteed only if one can obscure the learning algorithms from adversaries. Once the learning technique is disclosed, DNNs protected by these defense mechanisms are still susceptible to adversarial samples. In this work, we examine how previous research has dealt with this issue and propose a generic approach to enhance a DNN's resistance to adversarial samples. More specifically, our approach integrates a data transformation module with a DNN, making it robust even if the underlying learning algorithm is revealed. To demonstrate the generality of our proposed approach and its potential for handling cyber security applications, we evaluate our method and several other existing solutions on publicly available datasets, including a large-scale malware dataset as well as the MNIST and IMDB datasets. Our results indicate that our approach typically provides superior classification performance and robustness to attacks compared with state-of-the-art solutions.
Wenbo Guo, Qinglong Wang, Kaixuan Zhang, Alexander Ororbia, Sui Huang, Xue Liu, C. Lee Giles, Lin Lin, Xinyu Xing. "Defending Against Adversarial Samples Without Security through Obscurity." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00029
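The defense above hinges on placing a data transformation module in front of the DNN. As a heavily simplified, hypothetical sketch of where such a component sits in the pipeline (the paper's module is learned; the fixed random projection here is only an illustrative stand-in):

```python
import numpy as np

def make_transform(d_in, d_out, seed=0):
    """Return a fixed input-transformation function (here a random
    projection). Hypothetical stand-in for the paper's learned
    transformation module; the essential property is that the very
    same transform is applied at training and at inference time."""
    R = np.random.default_rng(seed).normal(size=(d_in, d_out)) / np.sqrt(d_out)
    return lambda X: X @ R

transform = make_transform(784, 64)            # e.g. flattened 28x28 inputs
X_train = np.random.default_rng(1).normal(size=(32, 784))
Z_train = transform(X_train)                   # the classifier trains on Z, not X
Z_query = transform(X_train[:4])               # identical transform at inference
```

The downstream classifier never sees raw inputs, so an adversary crafting perturbations in the raw input space must also contend with the transformation.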
Transfer covariance functions, which can model domain similarities and adaptively control the knowledge transfer across domains, are widely used in Gaussian process (GP) based transfer learning. We focus on regression problems in a black-box learning scenario, and study a family of rather general transfer covariance functions, T_*, that can model the similarity heterogeneity of domains through multiple kernel learning. A necessary and sufficient condition that (i) validates GPs using T_* for any data and (ii) provides semantic interpretations is given. Moreover, building on this condition, we propose a computationally inexpensive model learning rule that can explicitly capture different sub-similarities of domains. Extensive experiments on one synthetic dataset and four real-world datasets demonstrate the effectiveness of the learned GP on the sub-similarity capture and the transfer performance.
Pengfei Wei, Ramón Sagarna, Yiping Ke, Y. Ong. "Uncluttered Domain Sub-Similarity Modeling for Transfer Regression." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00178
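A transfer covariance function of the general kind discussed above can be sketched with a single base kernel: the joint covariance over source and target inputs keeps the within-domain blocks intact and scales only the cross-domain block by a similarity coefficient. This single-kernel construction is a simplification of the multiple-kernel family T_* studied in the paper.

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    """Squared-exponential base kernel between row sets A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def transfer_kernel(Xs, Xt, lam, gamma=1.0):
    """Joint covariance over source inputs Xs and target inputs Xt.
    lam in [0, 1] controls knowledge transfer: lam = 0 decouples the
    domains, lam = 1 treats them as one domain. For lam in [0, 1]
    the result is a convex combination of two PSD matrices, hence PSD.
    A single-kernel sketch of the paper's multiple-kernel family."""
    Kss = rbf(Xs, Xs, gamma)
    Ktt = rbf(Xt, Xt, gamma)
    Kst = lam * rbf(Xs, Xt, gamma)
    return np.block([[Kss, Kst], [Kst.T, Ktt]])
```

A GP regressor can use this joint matrix directly, with `lam` (or per-kernel coefficients, in the multiple-kernel case) learned from data.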
Predicting revisit intention is an important problem in the retail industry: converting first-time visitors into repeat customers is essential for high profitability. However, revisit analyses for offline retail businesses have previously been conducted only on a small scale, mainly because their methodologies have relied on manually collected data. With the help of noninvasive monitoring, analyzing a customer's behavior inside stores has become possible, and revisit statistics are available for the large share of customers who turn on their Wi-Fi or Bluetooth devices. Using Wi-Fi fingerprinting data from ZOYI, we propose a systematic framework to predict the revisit intention of customers using only signals received from their mobile devices. With data collected from seven flagship stores in downtown Seoul, we achieved 67-80% prediction accuracy for all customers and 64-72% prediction accuracy for first-time visitors. The performance improvement from considering customer mobility was 4.7-24.3%. Our framework demonstrates the feasibility of predicting revisits from customer mobility captured in Wi-Fi signals, which has not been considered in previous marketing studies. In addition, we examine the effect of the data collection period on prediction performance and show the robustness of our model to missing customers. Finally, we discuss the difficulties of securing prediction accuracy with features that look promising but turn out to be unsatisfactory.
Sundong Kim, Jae-Gil Lee. "Utilizing In-store Sensors for Revisit Prediction." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00037
Active learning exploits inherent structures in the unlabeled data to minimize the number of labels required to train an accurate model. It enables effective machine learning in applications with high labeling cost, such as document classification and drug response prediction. We investigate active learning on heterogeneous information networks, with the objective of obtaining accurate node classifications while minimizing the number of labeled nodes. Our proposed algorithm harnesses a multi-armed bandit (MAB) algorithm to determine network structures that identify the most important nodes to the classification task, accounting for node types and without assuming label assortativity. Evaluations on real-world network classification tasks demonstrate that our algorithm outperforms existing methods independent of the underlying classification model.
Doris Xin, Ahmed El-Kishky, De Liao, Brandon Norick, Jiawei Han. "Active Learning on Heterogeneous Information Networks: A Multi-armed Bandit Approach." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00184
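The bandit machinery above can be illustrated with a minimal epsilon-greedy sketch: each arm is a candidate query strategy (in the paper, a network structure), and the reward for pulling an arm is the improvement the resulting labeled node brings to the classifier. The epsilon-greedy rule is an illustrative choice, not necessarily the MAB algorithm used in the paper.

```python
import numpy as np

class EpsGreedyBandit:
    """Epsilon-greedy multi-armed bandit over candidate query
    strategies. Tracks an incremental mean reward per arm; with
    probability eps it explores a random arm, otherwise it exploits
    the best-looking one."""

    def __init__(self, n_arms, eps=0.1, seed=0):
        self.counts = np.zeros(n_arms)
        self.values = np.zeros(n_arms)
        self.eps = eps
        self.rng = np.random.default_rng(seed)

    def select(self):
        if self.rng.random() < self.eps:
            return int(self.rng.integers(len(self.counts)))
        return int(self.values.argmax())

    def update(self, arm, reward):
        # Incremental running mean of the rewards observed for this arm.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In an active-learning loop, `select()` picks the strategy that chooses the next node to label, and `update()` feeds back the observed accuracy gain.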
Tensor decomposition techniques such as CANDECOMP/PARAFAC (CP) decomposition have achieved great success across a range of scientific fields. They have traditionally been applied to dense, static data. However, today's datasets are often highly sparse and change dynamically over time. Traditional decomposition methods such as Alternating Least Squares (ALS) cannot easily be applied to sparse tensors due to poor efficiency. Furthermore, existing online tensor decomposition methods mostly target dense tensors and thus also encounter significant scalability issues for sparse data. To address this gap, we propose a new incremental algorithm for tracking the CP decompositions of online sparse tensors on the fly. Experiments on nine real-world datasets show that our algorithm produces decompositions whose quality is comparable to that of the most accurate algorithm, ALS, while running up to 250 times faster and using up to 100 times less memory.
Shuo Zhou, S. Erfani, J. Bailey. "Online CP Decomposition for Sparse Tensors." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00202
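For reference, the ALS baseline that the paper compares against can be sketched for a dense 3-way tensor: each factor matrix is updated in turn by solving a linear least-squares problem against the other two. This is the batch, dense baseline only; the paper's contribution is an incremental algorithm for sparse, streaming tensors.

```python
import numpy as np

def cp_als(X, rank, n_iter=100, seed=0):
    """Batch CP decomposition of a dense 3-way tensor X (I x J x K)
    via Alternating Least Squares. Returns factor matrices A, B, C
    such that X is approximately sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, rank))
    B = rng.normal(size=(J, rank))
    C = rng.normal(size=(K, rank))
    for _ in range(n_iter):
        # Each update: (unfolding x Khatri-Rao product) times the
        # pseudoinverse of the Gram-matrix Hadamard product.
        A = np.einsum('ijk,jr,kr->ir', X, B, C) @ np.linalg.pinv((B.T @ B) * (C.T @ C))
        B = np.einsum('ijk,ir,kr->jr', X, A, C) @ np.linalg.pinv((A.T @ A) * (C.T @ C))
        C = np.einsum('ijk,ir,jr->kr', X, A, B) @ np.linalg.pinv((A.T @ A) * (B.T @ B))
    return A, B, C
```

The `einsum` calls compute the matricized-tensor-times-Khatri-Rao products without forming unfoldings explicitly; for sparse tensors, exactly these products become the efficiency bottleneck the paper targets.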
Nowadays, all aspects of a production process are continuously monitored and visualized in a dashboard. Equipment is monitored using a variety of sensors, natural resource usage is tracked, and interventions are recorded. In this context, a common task is to identify anomalous behavior in the time series data generated by sensors. As manually analyzing such data is laborious and expensive, automated approaches have the potential to be much more efficient as well as cost effective. While anomaly detection could be posed as a supervised learning problem, this is typically not possible, as few or no labeled examples of anomalous behavior are available and it is often infeasible or undesirable to collect them. Therefore, unsupervised approaches are commonly employed, which typically identify anomalies as deviations from normal (i.e., common or frequent) behavior. However, in many real-world settings, several types of normal behavior exist that occur less frequently than some anomalous behaviors. In this paper, we propose a novel constrained-clustering-based approach for anomaly detection that works in both an unsupervised and a semi-supervised setting. Starting from an unlabeled data set, the approach gradually incorporates expert-provided feedback to improve its performance. We evaluated our approach on real-world water monitoring time series data from supermarkets in collaboration with Colruyt Group, one of Belgium's largest retail companies. Empirically, we found that our approach outperforms the current detection system as well as several other baselines. Our system is currently deployed and used by the company to analyze water usage for 20 stores on a daily basis.
Vincent Vercruyssen, Wannes Meert, Gust Verbruggen, Koen Maes, Ruben Baumer, Jesse Davis. "Semi-Supervised Anomaly Detection with an Application to Water Analytics." In: 2018 IEEE International Conference on Data Mining (ICDM), Nov. 2018. doi:10.1109/ICDM.2018.00068
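The clustering-based detection idea above can be sketched in its plain unsupervised form: cluster the data, then score each point by its distance to the nearest cluster centroid. The k-means variant and farthest-point initialization here are illustrative choices; the paper's method additionally incorporates expert feedback as clustering constraints, which this sketch omits.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with deterministic farthest-point initialization."""
    X = np.asarray(X, dtype=float)
    # Farthest-point init: start from X[0], repeatedly add the point
    # farthest from the current centroid set.
    C = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in C], axis=0)
        C.append(X[d.argmax()])
    C = np.array(C)
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - C[None], axis=-1).argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                C[j] = X[labels == j].mean(axis=0)
    return C

def anomaly_scores(X, centroids):
    """Score each point by its distance to the nearest centroid;
    points far from every cluster of normal behavior score highest."""
    X = np.asarray(X, dtype=float)
    return np.linalg.norm(X[:, None] - centroids[None], axis=-1).min(axis=1)
```

Expert feedback of the kind the paper uses would enter as must-link / cannot-link constraints on the cluster assignments, reshaping which behaviors count as normal.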