2020 International Conference on Data Mining Workshops (ICDMW): Latest Articles


Towards an Internal Evaluation Measure for Arbitrarily Oriented Subspace Clustering

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00049

Daniyal Kazempour, Peer Kröger, T. Seidl

In unsupervised machine learning, and especially in clustering, evaluating novel algorithms or assessing a clustering of novel data is challenging. While the literature mostly evaluates new methods on labelled data, there are cases where no labels are at our disposal. In other cases we may not want to trust the “ground truth” labels. The literature offers a spectrum of so-called internal evaluation measures, each mostly specialized towards a specific clustering model. The model of arbitrarily oriented subspace clusters is a more recent one. To the best of our knowledge, there are currently no internal evaluation measures tailored to assessing this particular type of clustering. In this work we present the first internal quality measures for arbitrarily oriented subspace clusterings, namely the normalized projected energy (NPE) and the subspace compactness score (SCS). The experimental results show that NPE in particular is capable of assessing clusterings by considering archetypical properties of arbitrarily oriented subspace clustering.



Citations: 0

$\mu-\text{cf}2\text{vec}$: Representation Learning for Personalized Algorithm Selection in Recommender Systems

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00034

Tomas Sousa Pereira, T. Cunha, C. Soares

Collaborative Filtering (CF) has become the standard approach to recommendation problems. CF algorithms try to predict the interests of a user by collecting the personal interests of many users. There are multiple CF algorithms, each with its own biases, and it is the Machine Learning practitioner who has to choose the best algorithm for each task beforehand. In Recommender Systems (RS), different algorithms perform differently for different users within the same dataset. Meta Learning has been used to choose the best algorithm for a given problem, but it is usually applied to select algorithms for a whole dataset. Adapting it to select the algorithm for a single user in an RS involves several challenges. The most important is the design of the metafeatures, which characterize datasets in typical meta learning but here must characterize a single user. This work presents a new meta-learning based framework named $\mu-\mathbf{cf}2\mathbf{vec}$ to select the best algorithm for each user. We propose using Representation Learning techniques to extract the metafeatures. Representation Learning tries to extract representations that can be reused in other learning tasks. In this work we also implement the framework using different Representation Learning (RL) techniques to evaluate which one is more useful for this task. At the meta level, the meta learning model uses the metafeatures to extract knowledge that is then used to predict the best algorithm for each user. We evaluated an implementation of this framework using the MovieLens 20M dataset. Our implementation achieved consistent gains at the meta level; at the base level, however, we achieved only marginal gains.



Citations: 1

Unsupervised Extraction of Workplace Rights and Duties from Collective Bargaining Agreements

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00112

Elliott Ash, J. Jacobs, Bentley Macleod, S. Naidu, Dominik Stammbach

This paper describes an unsupervised legal document parser which performs a decomposition of labor union contracts into discrete assignments of rights and duties among agents of interest. We use insights from deontic logic applied to modal categories and other linguistic patterns to generate topic-specific measures of relative legal authority. We illustrate the consistency and efficiency of the pipeline by applying it to a large corpus of 35K contracts and validating the resulting outputs.


Citations: 8

Efficient Distributed MST Based Clustering for Recommender Systems

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00037

Ahmad Shahzad, Frans Coenen

This paper presents the Distributed Kruskal Algorithm for Minimum Spanning Tree (MST) based clustering, for use in recommendation engines. The algorithm can operate over large graph data sets distributed across a number of machines. It is evaluated by comparing both the quality of the cluster configurations produced and the accuracy of the predictions against non-MST based clustering approaches. The results indicate that the proposed approach produces comparable recommendations at much lower storage, and hence runtime, costs.
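
As a rough illustration of the MST-based clustering idea named in this abstract, the sketch below runs Kruskal's algorithm on a single machine and stops once k connected components remain. It is not the authors' distributed algorithm: the function name `mst_clusters`, the Euclidean distance, and the sample points are all assumptions made for this example.

```python
from itertools import combinations
from math import dist


def mst_clusters(points, k):
    """Group points into k clusters by running Kruskal's algorithm and
    stopping once only k connected components remain."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the component root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, lightest first (the order Kruskal's algorithm needs).
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    components = len(points)
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two components, so it belongs to the MST
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(points))]


# Hypothetical sample data: three nearby points and two distant ones.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
print(mst_clusters(points, 2))
```

Stopping Kruskal's algorithm early is equivalent to building the full MST and cutting its heaviest edges; this sketch builds every pairwise edge in memory, which is exactly the storage cost a distributed variant would aim to avoid.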


Citations: 1

Improving object detection in paintings based on time contexts

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00133

M. Marinescu, Artem Reshetnikov, J. M. López

This paper proposes a novel approach to object detection for the Cultural Heritage domain, which combines Deep Learning with semantic metadata about candidate objects extracted from existing sources such as Wikidata, dictionaries, or Google NGram. Working with cultural heritage presents challenges not found in everyday images. In computer vision, object detection models are usually trained on datasets whose classes are not imaginary concepts and have neither symbolic nor time-specific dimensions. Apart from this conceptual problem, paintings are limited in number and represent the same concept in potentially very different styles. Finally, the metadata associated with the images is often poor or nonexistent, which makes it hard to train a model properly. Our approach can improve the precision of object detection by placing the classes detected by a neural network model in time, based on the dates of their first known use. By taking into account the time of inception of objects such as the TV, cell phone, or scissors, and the appearance of some objects in the geographical space that corresponds to a painting (e.g. bananas or broccoli in 15th century Europe), we can correct and refine the detected objects based on their chronological probability.
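
The time-context idea in this abstract can be sketched, under strong simplifying assumptions, as a filter that drops detected classes whose first known use postdates the painting. The `Detection` type, the `filter_by_time` function, and the years in `FIRST_KNOWN_USE` are hypothetical placeholders for illustration, not the paper's data or method; the paper speaks of a chronological probability, whereas this sketch uses a hard cutoff.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str
    score: float


# Illustrative first-known-use years (assumed placeholder values, not the paper's data).
FIRST_KNOWN_USE = {"television": 1927, "cell phone": 1973}


def filter_by_time(detections, painting_year, first_use=FIRST_KNOWN_USE):
    """Keep detections whose class could already exist when the painting was made."""
    kept = []
    for det in detections:
        year = first_use.get(det.label)
        # Classes with no recorded date are kept, since nothing contradicts them.
        if year is None or year <= painting_year:
            kept.append(det)
    return kept


# Hypothetical detector output for a painting dated 1650.
detections = [Detection("television", 0.91), Detection("book", 0.80)]
print(filter_by_time(detections, painting_year=1650))
```

A softer variant could down-weight the score of an anachronistic class instead of discarding it, which would be closer in spirit to a probability-based correction.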



Citations: 5

User Authentication Method using FIDO based Password Management for Smart Energy Environment

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00100

Hyunjin Kim, Dongseop Lee, Jaecheol Ryou

In a smart energy environment, user authentication is an essential process. After the authentication process verifies that the user is registered, the user can access smart services such as power consumption prediction and intelligent energy management. User authentication technology has evolved into various methods using ID/PW, security tokens, and biometric information because of the diversification of the Internet of Things and of social structures. Although various authentication technologies exist, ID/PW authentication is still widely used because of its low cost and convenience. However, users of ID/PW methods should use a different password for each service, and passwords that include special symbols. Moreover, it is difficult for users to remember complicated passwords, and it is not easy to change passwords periodically. Therefore, in this paper, we propose a user authentication method using FIDO based password management. Through the password management, the user logs in to the information system with a password as well as biometric information, using a hardware security device. In addition, the method is compatible with legacy PW authentication. The proposed mechanism will enhance the security of the ID/PW authentication method currently in use on most services.



Citations: 3

A Deep Neural Network Approach to Tracing Paths in Cybersecurity Investigations

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00070

Clinton Daniel, T. Gill, A. Hevner, Matthew T. Mullarkey

Security Analysts (SAs) operating within Security Operation Centers (SOCs) conduct cybersecurity investigations on cyber events using methods which pave a measurable path. These paths serve as a source of evidence for studying the transitions between the cognitive tasks performed by the SA throughout the investigation. Insight into these paths can support the observation and understanding of how to evaluate and measure the critical decisions made during an investigation, such as when an SA transitions from analyzing event logs to observing threat intelligence. We propose a framework, which we call the Cyber Analysis Transition Framework, that applies a quantitative approach to evaluating and measuring the transitions of an SA conducting cyber analysis methods. The novel approach of this framework includes the application of process mining and deep neural network output as a means of evaluating and measuring an SA's performance while conducting cybersecurity investigations.


Citations: 0

How Shoppers Walk and Shop in a Supermarket

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00025

K. Yada, Ken Ishibashi, Taku Ohashi, Danhua Wang, S. Tsumoto

The purpose of this study was to classify shopping trip types based on customer path data and to identify differences in the effectiveness of sales promotions. Existing studies on shopping trip types have not incorporated customer in-store behavior data as an index for classification. In this paper, we categorize customer shopping trips into two types, “major trip” and “fill-in trip”, and use customer path data to investigate how the impact of sales promotions on sales effectiveness differs between them. Following existing research, the impact on sales is measured by the probability of occurrence in the three processes that make up the purchase process.


Citations: 1

Hybrid Learning with Teacher-student Knowledge Distillation for Recommenders

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00040

Hangbin Zhang, R. Wong, Victor W. Chu

Latent variable models have been widely adopted by recommender systems due to advances in their learning scalability and performance. Recent research has focused on hybrid models. However, due to the sparsity of user and/or item data, most of these proposals have convoluted model architectures and objective functions. In particular, the latter are mostly tailored for sparse data from either the user or the item space. Although it is possible to derive an analogous model for both spaces, this makes a system overly complicated. To address this problem, we propose a deep learning based latent model called Distilled Hybrid Network (DHN) with a teacher-student learning architecture. Unlike related work that tried to better incorporate content components to improve accuracy, we instead focus on model learning optimization. To the best of our knowledge, we are the first to employ a teacher-student learning architecture for recommender systems. Experimental results show that our proposed model notably outperforms state-of-the-art approaches. We also show that our proposed architecture can be applied to existing recommender models to improve their accuracy.
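
For readers unfamiliar with teacher-student learning, the sketch below shows a generic knowledge distillation loss: the Kullback-Leibler divergence between temperature-softened teacher and student output distributions. It is a textbook formulation offered as an assumption, not the objective used by DHN; the function names, the temperature value, and the sample logits are hypothetical.

```python
from math import exp, log


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperatures give softer distributions."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the teacher's softened distribution to the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical scores for three candidate items.
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]
print(distillation_loss(student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge; in practice it is usually combined with the ordinary supervised loss on the observed data.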



Citations: 0

Deep Fuzzy Clustering with Weighted Intra-class Variance and Extended Mutual Information Regularization

2020 International Conference on Data Mining Workshops (ICDMW)

Pub Date : 2020-11-01 DOI: 10.1109/ICDMW51313.2020.00137

Yunsheng Pang, Feiyu Chen, Sheng Huang, Yongxin Ge, Wei Wang, Taiping Zhang

Recently, joint deep clustering methods, which simultaneously learn a latent embedding and predict clustering assignments through a deep neural network, have received a lot of attention. Among these methods, the KL divergence based clustering framework is one of the most popular branches. However, the clustering performance of these methods depends on an additional auxiliary target distribution. In this paper, we build a novel deep fuzzy clustering (DFC) network to learn discriminative and balanced assignments without the need for any auxiliary distribution. Specifically, we design an elaborate fuzzy clustering layer (FCL) to estimate more discriminative assignments, and utilize weighted intra-class variance (WIV) as the clustering objective function to enhance the compactness of the learned embedding. Moreover, we propose extended mutual information (EMI) between the input data and the corresponding clustering assignments as a regularization to achieve “fair” but “firm” assignments. Extensive experiments conducted on several datasets illustrate the superiority of the proposed approach compared with state-of-the-art methods.
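
The abstract does not define WIV, so the sketch below falls back on the classical fuzzy c-means objective, a membership-weighted within-cluster variance, purely as an assumed stand-in. The function name, the fuzzifier `m`, and the sample data are hypothetical and should not be read as the paper's formulation.

```python
def weighted_intra_class_variance(points, centers, memberships, m=2.0):
    """Sum over points i and clusters k of u_ik**m * ||x_i - c_k||**2."""
    total = 0.0
    for x, u_row in zip(points, memberships):
        for c, u in zip(centers, u_row):
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, c))
            total += (u ** m) * sq_dist
    return total


# Hypothetical embedding: two points that sit exactly on two cluster centers.
points = [(0.0, 0.0), (2.0, 0.0)]
centers = [(0.0, 0.0), (2.0, 0.0)]
firm = [[1.0, 0.0], [0.0, 1.0]]    # each point fully assigned to its own center
fuzzy = [[0.5, 0.5], [0.5, 0.5]]   # each point split evenly between both centers

print(weighted_intra_class_variance(points, centers, firm))
print(weighted_intra_class_variance(points, centers, fuzzy))
```

Under this objective, firm assignments to nearby centers give a lower value than evenly split memberships, which matches the abstract's stated goal of a compact embedding.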



Citations: 1
