2008 IEEE International Conference on Data Mining Workshops: Latest Articles

Mining Temporal Patterns with Quantitative Intervals

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.16

Thomas Guyet, R. Quiniou

In this paper we consider the problem of discovering frequent temporal patterns in a database of temporal sequences, where a temporal sequence is a set of items with associated dates and durations. Since quantitative temporal information is fundamental in many contexts, it is taken into account in the mining process and returned as part of the extracted knowledge. To this end, we have adapted the classical Apriori framework (Agrawal and Srikant, 1995) to propose an efficient algorithm based on a hyper-cube representation of temporal sequences. The extraction of quantitative temporal information is performed using a density estimation of the distribution of event intervals from the temporal sequences. An evaluation on synthetic data sets shows that the proposed algorithm can robustly extract frequent temporal patterns with quantitative temporal extents.
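The abstract does not say which density estimator is used, so the following is only a minimal sketch of the general idea, assuming a Gaussian kernel. The delay values, the bandwidth, and the names `gaussian_kde`, `delays` and `typical_delay` are hypothetical and do not come from the paper.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical delays (in time units) between event A and event B,
# one delay per sequence in which both events occur.
delays = [4.8, 5.1, 5.0, 5.3, 4.9, 12.0]

density = gaussian_kde(delays, bandwidth=0.5)

# Scan a grid from 0.0 to 15.0 and keep the delay where the density peaks.
grid = [i * 0.1 for i in range(0, 151)]
typical_delay = max(grid, key=density)
```

In this toy data the dense cluster of delays around 5 dominates the isolated delay at 12, so the peak of the estimated density gives a plausible quantitative extent for the "A then B" pattern.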

Citations: 45

Service Oriented KDD: A Framework for Grid Data Mining Workflows

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.28

M. Lackovic, D. Talia, Paolo Trunfio

Weka4WS is an extension of the Weka toolkit to support remote execution of data mining tasks as grid services. A first version of Weka4WS, supporting concurrent execution of multiple data mining tasks on remote grid nodes, was presented in previous work. In this paper we present a new version that also supports the composition and execution of data mining workflows on a grid. This new version of Weka4WS extends the KnowledgeFlow component of Weka by allowing the data mining tasks of the workflow to run in parallel on different machines, hence reducing the execution time. Besides the performance improvement, the capability of designing data mining applications as workflows makes it possible to define typical patterns and to reuse them in different contexts. In this paper we describe the architecture of the system, the functionalities of the Weka4WS KnowledgeFlow, and some usage examples along with their performance.

Citations: 6

Hierarchical Text Categorization in a Transductive Setting

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.126

Michelangelo Ceci

Transductive learning is a learning setting that makes it possible to learn from "particular to particular" and to consider both labelled and unlabelled examples when making classification decisions. In this paper, we investigate the use of transductive learning in the context of hierarchical text categorization. To this end, we exploit a modified version of an inductive hierarchical learning framework that can classify documents in both internal and leaf nodes of a hierarchy of categories. Experimental results on real-world datasets are reported.

Citations: 10

Title-Composing Support System for Reaching New Audiences

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.24

Yoko Nishihara, W. Sunayama

This paper proposes a support system for composing good titles for research papers in order to reach new audiences. Our system takes titles as input. The system evaluates the understandability and interest level of each title. The system ranks titles and outputs a title list. Users are able to recompose their titles by referring to the list and each evaluation value. Using the system, users can obtain new audiences who have not previously been interested in the user's research area. Experimental results showed that our system is able to rank titles in descending order of audiences' choices.

Citations: 6

Mining Allocating Patterns in One-Sum Weighted Items

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.112

Y. Wang, Xinwei Zheng, Frans Coenen, Cindy Y. Li

An association rule (AR) is a common knowledge model in data mining that describes an implicative co-occurring relationship between two disjoint sets of binary-valued transaction database attributes (items), expressed in the form of an "antecedent ⇒ consequent" rule. A variant of the AR is the weighted association rule (WAR). With regard to a marketing context, this paper introduces a new knowledge model in data mining, the allocating pattern (ALP). An ALP is a special form of WAR, where each rule item is associated with a weighting score between 0 and 1, and the sum of all rule item scores is 1. It not only indicates the implicative co-occurring relationship between two (disjoint) sets of items in a weighted setting, but also conveys the "allocating" relationship among rule items. ALPs can be demonstrated to be applicable in marketing and possibly a surprising variety of other areas. We further propose an Apriori-based algorithm to extract hidden and interesting ALPs from a "one-sum" weighted transaction database. The experimental results show the effectiveness of the proposed algorithm.
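The one-sum constraint described above (every item weight lies between 0 and 1, and the weights sum to 1) can be illustrated with a short sketch. This is not the paper's mining algorithm. It only shows one assumed way of turning raw per-item amounts into a one-sum weighted transaction, and the function name `to_one_sum` and the basket data are hypothetical.

```python
def to_one_sum(transaction):
    """Scale non-negative item amounts so that the weights sum to 1."""
    total = sum(transaction.values())
    if total <= 0:
        raise ValueError("transaction must have a positive total")
    return {item: amount / total for item, amount in transaction.items()}

# Hypothetical basket: money spent per item in one transaction.
basket = {"bread": 2.0, "milk": 1.0, "cheese": 5.0}
weights = to_one_sum(basket)
```

Here each weight can be read as the share of the transaction allocated to that item, which is the kind of "allocating" relationship the abstract refers to.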

Citations: 3

Identification of Causal Variables for Building Energy Fault Detection by Semi-supervised LDA and Decision Boundary Analysis

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.44

Keigo Yoshida, M. Inui, T. Yairi, K. Machida, Masaki Shioya, Y. Masukawa

This paper addresses the problem of identifying the causal variables of a system anomaly. In complicated real-world systems, even experts often fail to specify causal factors, so they attempt to detect the anomaly with exploratory heuristics. Our goal is to offer further information that supports anomaly cause analysis using this incomplete empirical knowledge. The proposed technique discovers the factors responsible for the fault by leveraging domain knowledge with an effective combination of semi-supervised linear discriminant analysis (LDA) and a boundary-based discriminative subspace identification method. Experimental results on synthetic and real datasets confirmed the validity of our approach. Moreover, we applied this method to building energy fault diagnosis and succeeded in extracting causal variables for energy waste in a building.

Citations: 9

Efficient Distance Computation Using SQL Queries and UDFs

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.135

Sasi K. Pitchaimalai, C. Ordonez, Carlos Garcia-Alvarado

Distance computation is one of the most computationally intensive operations employed by many data mining algorithms. Performing such matrix computations within a DBMS creates many optimization challenges. We propose techniques to efficiently compute Euclidean distance using SQL queries and user-defined functions (UDFs). We concentrate on efficient Euclidean distance computation for the well-known K-means clustering algorithm. We present SQL query optimizations and a scalar UDF to compute Euclidean distance. We experimentally evaluate performance and scalability of our proposed SQL queries and UDF with large data sets on a modern DBMS. We benchmark distance computation on two important data mining techniques: clustering and classification. In general, UDFs are faster than SQL queries because they are executed in main memory. Data set size is the main factor impacting performance, followed by data set dimensionality.
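As a rough illustration of computing Euclidean distance with a SQL query, the sketch below uses Python's built-in `sqlite3` module rather than the DBMS and UDFs evaluated in the paper. The vertical table layout and the names `X`, `C`, `i`, `j`, `l` and `val` are assumptions made for this example, and the query returns squared distances, which is enough for the K-means nearest-centroid step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Vertical layout: one row per (point, dimension) and (centroid, dimension).
    CREATE TABLE X (i INTEGER, l INTEGER, val REAL);
    CREATE TABLE C (j INTEGER, l INTEGER, val REAL);
""")

# Hypothetical two-dimensional points and centroids.
points = {1: (0.0, 0.0), 2: (1.0, 1.0), 3: (9.0, 8.0)}
centroids = {1: (0.0, 1.0), 2: (10.0, 8.0)}

conn.executemany(
    "INSERT INTO X VALUES (?, ?, ?)",
    [(i, l, v) for i, p in points.items() for l, v in enumerate(p)],
)
conn.executemany(
    "INSERT INTO C VALUES (?, ?, ?)",
    [(j, l, v) for j, c in centroids.items() for l, v in enumerate(c)],
)

# Squared Euclidean distance between every point i and every centroid j.
distances = conn.execute("""
    SELECT X.i, C.j, SUM((X.val - C.val) * (X.val - C.val)) AS d2
    FROM X JOIN C ON X.l = C.l
    GROUP BY X.i, C.j
    ORDER BY X.i, C.j
""").fetchall()

# K-means assignment step: keep the nearest centroid for each point.
nearest = {}
for i, j, d2 in distances:
    if i not in nearest or d2 < nearest[i][1]:
        nearest[i] = (j, d2)
```

The join on the dimension index `l` pairs up matching coordinates, and the `GROUP BY` sums the squared differences per point-centroid pair, so the query scales with the number of points, centroids and dimensions.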

Citations: 12

Interactive Exploration of Model-Based Automatically Extracted Data

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.34

A. Coden, I. Sominsky, M. Tanenblatt

We present an interactive system to query, explore and navigate data according to a hierarchical knowledge model that has been automatically populated from unstructured textual data. Our system differs from systems that assist in the navigation of domain ontologies and in mining between pairs of concepts in that it enables access to unstructured data by abstract concepts and the relations between them. Concepts in turn are specified by sets of models and their relations. However, some concepts may not have a direct representation in the text. In particular, the demonstration query by model/cancer (QbM/C) is based on unstructured pathology reports. The knowledge model represents both named entities, such as diagnosis and anatomical site, and higher-level concepts, such as primary and metastatic tumor. Such concepts are based on the relations between named entities. We will present the data layout and the access mechanism from the GUI to the data.

Citations: 0

Semantic Analysis Method for Unstructured Data in Telecom Services

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.79

M. Iwashita, K. Nishimatsu, S. Shimogawa

A variety of services have recently been provided that depend on highly developed networks and personal equipment. With these advances, connecting this equipment has become increasingly complicated. Problems such as a rise in connection failures have emerged, and determining their cause has become difficult in some cases because software is often updated to keep up with advancements in services or security. Telecom operators must understand the situation and act as quickly as possible when they receive customer enquiries. In this paper, we propose a method for analyzing and classifying customer enquiries that enables quick and efficient responses. Because customer enquiries are generally stored as unstructured textual data, this method is based on a dependency parsing and co-occurrence technique that enables classification of a large amount of unstructured data into patterns.

Citations: 3

ARUBAS: An Association Rule Based Similarity Framework for Associative Classifiers

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.58

B. Depaire, K. Vanhoof, G. Wets

This article introduces ARUBAS, a new framework to build associative classifiers. In contrast with many existing associative classifiers, it uses class association rules to transform the feature space and uses instance-based reasoning to classify new instances. The framework allows the researcher to use any association rule mining algorithm to produce the class association rules. Every aspect of the framework is extensively introduced and discussed, and five different fitness measures used for classification purposes are defined. The empirical results determine which fitness measure is the best and compare the framework with other classifiers. These results show that the ARUBAS framework is able to produce associative classifiers which are competitive with other classification techniques. More specifically, with ARUBAS-Scheffer-phi5 we have introduced a parameter-free algorithm which is competitive with classification techniques such as C4.5, RIPPER and CBA.

Citations: 11
