2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM): Latest Articles

Discovering process models through relational disjunctive patterns mining

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949299

Corrado Loglisci, Michelangelo Ceci, A. Appice, D. Malerba

The automatic discovery of process models can help to gain insight into various perspectives (e.g., the control-flow or data perspective) of the process executions traced in an event log. Frequent pattern mining offers a means to build human-understandable representations of these process models. This paper describes the application of a multi-relational method of frequent pattern discovery to process mining. Multi-relational data mining is called for because the variety of activities and actors involved in the process executions traced in an event log leads to a relational (or structural) representation of those executions. The distinctive feature of this work is the integration of disjunctive forms into the relational patterns discovered from event logs. The introduction of disjunctive forms enables relational patterns to express frequent variants of process models. The effectiveness of using relational patterns with disjunctions to describe process models with variants is assessed on real logs of process executions.

Citations: 1

Active classifier training with the 3DS strategy

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949421

Tobias Reitmaier, B. Sick

In this article, we introduce and investigate 3DS, a novel selection strategy for pool-based active training of a generative classifier, namely CMM (classifier based on a probabilistic mixture model). Such a generative classifier aims at modeling the processes underlying the “generation” of the data. The strategy 3DS considers the distance of samples to the decision boundary, the density in regions where samples are selected, and the diversity of samples in the query set that are chosen for labeling, e.g., by a human domain expert. The combination of the three measures in 3DS is adaptive in the sense that the weights of the distance and the density measure depend on the uniqueness of the classification. With nine benchmark data sets it is shown that 3DS outperforms a random selection strategy (baseline method), a pure closest sampling approach, ITDS (information theoretic diversity sampling), DWUS (density-weighted uncertainty sampling), DUAL (dual strategy for active learning), and PBAC (prototype based active learning) regarding evaluation criteria such as ranked performance based on classification accuracy, number of labeled samples (data utilization), and learning speed assessed by the area under the learning curve.
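
The three criteria named in the abstract (distance to the decision boundary, density, and diversity within the query set) can be illustrated with a small greedy selection sketch. This is not the authors' 3DS formula: the margin-based closeness proxy, the Gaussian-kernel density, the fixed weights `w_dist`, `w_dens`, `w_div`, and the function name `select_query_set` are all assumptions made for illustration. In particular, 3DS adapts the distance and density weights to the uniqueness of the classification, which this sketch does not attempt.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_query_set(pool, posteriors, k, w_dist=1.0, w_dens=1.0, w_div=1.0):
    """Greedily pick up to k pool indices to send for labeling.

    pool       -- list of feature vectors (unlabeled samples)
    posteriors -- list of class-posterior tuples, one per sample (>= 2 classes)
    Weights are fixed here (assumption); 3DS itself adapts them.
    """
    n = len(pool)

    # Closeness to the decision boundary, approximated by a small margin
    # between the two largest class posteriors (hypothetical proxy).
    closeness = []
    for p in posteriors:
        top = sorted(p, reverse=True)
        closeness.append(1.0 - (top[0] - top[1]))

    # Density: mean Gaussian-kernel similarity to all other pool samples.
    density = []
    for i in range(n):
        sims = [math.exp(-euclidean(pool[i], pool[j]) ** 2)
                for j in range(n) if j != i]
        density.append(sum(sims) / len(sims) if sims else 0.0)

    selected = []
    while len(selected) < min(k, n):
        best, best_score = None, -math.inf
        for i in range(n):
            if i in selected:
                continue
            # Diversity: distance to the nearest sample already selected
            # (0.0 while the query set is still empty).
            diversity = min((euclidean(pool[i], pool[j]) for j in selected),
                            default=0.0)
            score = (w_dist * closeness[i] + w_dens * density[i]
                     + w_div * diversity)
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return selected
```

With equal weights, the first pick is driven by closeness and density alone; later picks are pulled away from samples already in the query set by the diversity term.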

Citations: 11

Active learning for aspect model in recommender systems

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949431

R. Karimi, C. Freudenthaler, A. Nanopoulos, L. Schmidt-Thieme

Recommender systems help Web users to address information overload. Their performance, however, depends on the amount of information that users provide about their preferences. Users are not willing to provide information for a large number of items, so the quality of recommendations suffers, especially for new users. Active learning has been proposed in the past to acquire preference information from users. Based on an underlying prediction model, these approaches determine the most informative item to ask the new user to rate. In this paper, we propose a new active learning method developed specifically around aspect model features. Classic active learning and active learning for recommender systems differ: in the recommender system context, each item has already been rated by training users, whereas in classic active learning there are no training users. We take this difference into account and develop a new method that competes with a complicated Bayesian approach in accuracy while drastically reducing (by one order of magnitude) user waiting times, i.e., the time that users wait before being asked a new query.

Citations: 23

An intelligent load forecasting expert system by integration of ant colony optimization, genetic algorithms and fuzzy logic

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949432

A. Ghanbari, S. Abbasian-Naghneh, E. Hadavandi

Computational intelligence (CI), an offshoot of artificial intelligence (AI), is becoming more and more widespread for solving different engineering problems. Especially by embracing swarm intelligence techniques such as ant colony optimization (ACO), CI is known as a good alternative to classical AI for dealing with practical problems that are not easy to solve by traditional methods. Electricity load forecasting is one of the most important concerns of power systems; consequently, developing intelligent methods that produce accurate forecasts is vital for such systems. This study presents a hybrid CI methodology (called ACO-GA) that integrates ant colony optimization, a genetic algorithm (GA), and fuzzy logic to construct a load forecasting expert system. The superiority and applicability of ACO-GA are shown on Iran's annual electricity load forecasting problem, and the results are compared with the adaptive neuro-fuzzy inference system (ANFIS), which is a common approach in this field. The outcomes indicate that ACO-GA provides more accurate results than the ANFIS approach. Moreover, the results of this study provide decision makers with an appropriate simulation tool to make more accurate forecasts of future electricity loads.

Citations: 16

Flexible Heuristics Miner (FHM)

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949453

A. Weijters, J. Ribeiro

One of the aims of process mining is to retrieve a process model from a given event log. However, current techniques have problems when mining processes that contain non-trivial constructs, when mining low-structured processes, and/or when dealing with noise in the event logs. To overcome these problems, a new process representation language is presented together with an accompanying process mining algorithm. The most significant property of the new representation language is the way the semantics of splits and joins are represented: by using so-called split/join frequency tables. This results in easy-to-understand process models even in the case of non-trivial constructs, low-structured domains, and the presence of noise. This paper explains the new process representation language and how the mining algorithm works. The algorithm is implemented as a plug-in in the ProM framework. An illustrative example with noise and a real-life log of a complex and low-structured process are used to explicate the presented approach.
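
As background for readers new to heuristics-based process discovery, the sketch below counts directly-follows relations in an event log and computes the dependency measure commonly associated with the Heuristics Miner family. The abstract itself does not state this formula, so treat it as an assumption about the general approach rather than a description of FHM's split/join frequency tables; the names `directly_follows` and `dependency` are hypothetical.

```python
from collections import Counter


def directly_follows(log):
    """Count how often activity a is immediately followed by b.

    log -- iterable of traces, each trace a list of activity names.
    """
    counts = Counter()
    for trace in log:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


def dependency(counts, a, b):
    """Heuristic dependency of b on a, for distinct activities a != b.

    Values near 1 suggest a usually precedes b; the +1 in the denominator
    damps relations that are supported by only a few observations.
    """
    ab, ba = counts[(a, b)], counts[(b, a)]
    return (ab - ba) / (ab + ba + 1)


# Toy log with three traces; the second trace swaps b and c.
log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
counts = directly_follows(log)
```

On this toy log, `a` is followed by `b` twice and never the reverse, so the dependency of `b` on `a` is high, whereas `b` and `c` occur in both orders and score much lower. Infrequent orderings caused by noise are damped in the same way.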

Citations: 450

Feature extraction for multi-label learning in the domain of email classification

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949301

José M. Carmona-Cejudo, Manuel Baena-García, J. D. Campo-Ávila, Rafael Morales Bueno

Multi-label learning is a very interesting field in machine learning. It allows standard methods and evaluation procedures to be generalised, and it tackles challenging real problems where one example can be tagged with more than one label. In this paper we study the performance of different multi-label methods in combination with standard single-label algorithms, using several specific multi-label metrics. We want to show how a good preprocessing phase can improve the performance of such methods and algorithms. As we will explain, its main advantage is a shorter time to induce the models, while maintaining (or even improving) other classification quality measures. We use the GNUsmail framework to preprocess an existing and extensively used dataset, obtaining a reduced feature space that preserves the relevant information and allows improvements in performance. Thanks to the capabilities of GNUsmail, the preprocessing step can be easily applied to different email datasets.

Citations: 7

GSOM sequence: An unsupervised dynamic approach for knowledge discovery in temporal data

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949456

A. Fonseka, D. Alahakoon, S. Bedingfield

A significant problem that arises during the process of knowledge discovery is dealing with data that have temporal dependencies. The attributes associated with temporal data need to be processed differently from non-temporal attributes. A typical approach to address this issue is to view temporal data as an ordered sequence of events. In this work, we propose a novel dynamic unsupervised learning approach to discover patterns in temporal data. The new technique is based on the Growing Self-Organizing Map (GSOM), which is a structure-adapting version of the Self-Organizing Map (SOM). The SOM is widely used in knowledge discovery applications due to its unsupervised learning nature, ease of use, and visualization capabilities. The GSOM further enhances the SOM with faster processing, more representative cluster formation, and the ability to control map spread. This paper describes a significant extension to the GSOM that enables it to be used for analyzing data with temporal sequences. The similarity between two time-dependent sequences of unequal length is estimated using the Dynamic Time Warping (DTW) algorithm incorporated into the GSOM. Experiments were carried out to evaluate the performance and the validity of the proposed approach using an audio-visual data set. The results demonstrate that the novel “GSOM Sequence” algorithm improves the accuracy and validity of the clusters obtained.
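
The abstract relies on Dynamic Time Warping to compare sequences of unequal length. A minimal sketch of the textbook DTW recurrence for one-dimensional numeric sequences follows; the absolute-difference local cost and the function name `dtw_distance` are assumptions, and how the paper integrates DTW into the GSOM is not shown here.

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two numeric sequences.

    Uses the standard O(len(a) * len(b)) dynamic program with
    |x - y| as the local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")

    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: advance in a only, advance in b only,
            # or advance in both sequences.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]
```

Because one element may be aligned with several elements of the other sequence, a sequence and a time-stretched copy of it, such as `[1, 2, 3]` and `[1, 2, 2, 3]`, are at distance zero.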

Citations: 5

A recommendation algorithm using positive and negative latent models

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949455

A. Takasu, Saranya Maneeroj

This paper proposes an algorithm for recommender systems that uses both positive and negative latent user models. In recommending items to a user, recommender systems usually exploit item content information as well as the preferences of similar users. Various types of content information can be attached to items and these are useful for judging user preferences. For example, in movie recommendations, a movie record may include the director, the actors, and reviews. These types of information help systems calculate sophisticated user preferences. We first propose a probabilistic model that maps multi-attributed records into a low-dimensional feature space. The proposed model extends latent Dirichlet allocation to the handling of multi-attributed data. We derive an algorithm for estimating the model's parameters using the Gibbs sampling technique. Next, we propose a probabilistic model to calculate user preferences for items in the feature space. Finally, we develop a recommendation algorithm based on the probabilistic model that works efficiently for large quantities of items and user ratings. We use a publicly available movie corpus to evaluate the proposed algorithm empirically, in terms of both its recommendation accuracy and its processing efficiency.

Citations: 2

KB-CB-N classification: Towards unsupervised approach for supervised learning

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949435

Z. Abdallah, M. Gaber

Data classification has attracted considerable research attention in the field of computational statistics and data mining due to its wide range of applications. K Best Cluster Based Neighbour (KB-CB-N) is our novel classification technique based on the integration of three different similarity measures for cluster-based classification. The basic principle is to apply unsupervised learning to the instances of each class in the dataset and then use the output as input to the classification algorithm, which finds the K best neighbouring clusters from the density, gravity, and distance perspectives. Clustering is applied as an initial step within each class to find the inherent in-class grouping in the dataset. Different data clustering techniques use different similarity measures, and each measure has its own strengths and weaknesses. Combining the three measures can therefore benefit from the strengths of each and eliminate the problems encountered when using an individual measure. Extensive experimental results on eight real datasets show that our new technique typically achieves improved or equivalent performance compared with other existing state-of-the-art classification methods.

Citations: 12

Visual tracking of the Millennium Development Goals with a Self-organizing neural network

2011 IEEE Symposium on Computational Intelligence and Data Mining (CIDM)

Pub Date : 2011-04-11 DOI: 10.1109/CIDM.2011.5949433

Peter Sarlin

The Millennium Development Goals (MDGs) represent commitments to reduce poverty and hunger, and to tackle ill-health, gender inequality, lack of education, lack of access to clean water and environmental degradation by 2015. The eight goals of the Millennium Declaration are tracked using 21 benchmark targets, measured by 60 indicators. This paper explores whether the application of the Self-organizing map (SOM), a neural network-based projection and clustering technique, facilitates monitoring of the multidimensional MDGs. First, this paper presents a SOM model for visual benchmarking of countries and for visual analysis of the evolution of MDG indicators. Second, the SOM is paired with a geospatial dimension by mapping the clustering results on a geographic map. The results of this paper indicate that the SOM is a feasible tool for visual monitoring of MDG indicators.

Citations: 1
