2008 IEEE International Conference on Data Mining Workshops: Latest Articles

Association Action Rules

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.66

Z. Ras, A. Dardzinska, Li-Shiang Tsay, H. Wasyluk

Action rules describe possible transitions of objects from one state to another with respect to a distinguished attribute. Previous research on action rule discovery usually required the extraction of classification rules before constructing any action rule. This paper gives a new approach for generating association-type action rules. The notion of frequent action sets and an Apriori-like strategy for generating them are proposed. We introduce the notion of representative action rules and give an algorithm to construct them directly from frequent action sets. Finally, we introduce the notion of a simple association action rule and the cost of an association action rule, and give a strategy to construct simple association action rules of lowest cost.
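The Apriori-like, level-wise idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define support for action sets, so this sketch assumes a simplified setting: each observed transition is a set of atomic actions encoded as `(attribute, old_value, new_value)` tuples, and support is a plain count of transitions containing the set. The function name `frequent_action_sets` and this encoding are hypothetical, not the authors' definitions.

```python
from itertools import combinations


def frequent_action_sets(transitions, min_support):
    """Level-wise (Apriori-like) search for frequent sets of atomic actions.

    transitions: list of frozensets of atomic actions (attribute, old, new).
    min_support: minimum number of transitions that must contain a set
                 (a simplifying assumption, not the paper's definition).
    Returns a dict mapping each frequent action set to its support count.
    """
    def support(candidate):
        return sum(1 for t in transitions if candidate <= t)

    # Level 1: frequent single atomic actions.
    atoms = {a for t in transitions for a in t}
    level = {frozenset([a]) for a in atoms
             if support(frozenset([a])) >= min_support}
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        # Join step: unite pairs of frequent (k-1)-sets into k-sets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are
        # frequent and the candidate itself reaches the support threshold.
        level = {
            c for c in candidates
            if all(frozenset(sub) in frequent
                   for sub in combinations(c, k - 1))
            and support(c) >= min_support
        }
    return frequent
```

Because every subset of a frequent set must itself be frequent, the prune step discards most candidates before their support is counted, which is the property that makes the level-wise search practical.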

Citations: 73

GeoDMA - A Novel System for Spatial Data Mining

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.22

T. Korting, Leila Maria Garcia Fonseca, M. Escada, F. C. Silva, M. Silva

Although Earth observation satellites have provided a huge amount of remote sensing data, few techniques have been developed for data manipulation and information extraction in large data sets. In this context, this paper presents a new system for spatial data mining and two test cases applied to land use change in the Brazilian Amazon region. We present the operational environment, named GeoDMA, developed to implement this approach.

Citations: 24

Discovering Triggering Events from Longitudinal Data

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.136

Corrado Loglisci, D. Malerba

Longitudinal data consist of repeated measurements of variables that describe the dynamics of a domain (process or phenomenon) over time. They can be analyzed to explain which events may cause the transition from one state to the next during the evolution of the domain. Generally, approaches to this explanation problem rely exclusively on domain knowledge, while a purely data-driven analysis is still lacking. In this paper we describe a data mining approach to discover events that may have triggered a transition during the evolution of the domain. The original data mining task is decomposed into two consecutive subtasks. First, the sequence of discrete states that represents the dynamics of the domain is determined. Second, the triggering events between two successive states are identified. Computational solutions to both problems are presented, their application to two real scenarios is described, and the results are discussed.

Citations: 3

Keyword Extraction Based on Lexical Chains and Word Co-occurrence for Chinese News Web Pages

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.122

Xinghua Li, Xindong Wu, Xuegang Hu, Fei Xie, Zhaozhong Jiang

This paper presents a new keyword extraction algorithm for Chinese news Web pages using lexical chains and word co-occurrence combined with frequency features, cohesion features, and correlation features. A lexical chain is a surface expression of the coherence among semantically related words of a text, and represents the semantic content of a portion of the text. Word co-occurrence distribution is an important statistical model, widely used in natural language processing, that reflects the correlation between words. Our proposed algorithm, KELCC, combines lexical chains and word co-occurrence to extract keywords from Chinese news Web pages. The algorithm is not domain-specific and can be applied to a single Web page without a corpus. Experiments on randomly selected Web pages have been performed to demonstrate the quality of the keywords extracted by the proposed algorithm.

Citations: 11

Extension of Partitional Clustering Methods for Handling Mixed Data

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.85

Yosr Naïja, Salem Chakhar, Kaouther Blibech Sinaoui, R. Robbana

Clustering is an active research topic in data mining, and different methods have been proposed in the literature. Most of these methods are based on a distance measure defined either on numerical attributes or on categorical attributes. However, in fields such as road traffic and medicine, datasets are composed of both numerical and categorical attributes. Recently, there have been several proposals to develop clustering methods that support mixed attributes. There are three basic categories of clustering methods: partitional methods, hierarchical methods and density-based methods. This paper proposes an extension of partitional clustering methods devoted to mixed attributes. The proposed extension creates several partitions by using clustering methods based on numerical attributes and then chooses the one that maximizes a measure, called the "homogeneity degree", of these partitions according to categorical attributes.
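The selection step described above can be sketched as follows. The abstract does not give the formula for the homogeneity degree, so this sketch substitutes a purity-style score (the fraction of objects that share the majority category of their cluster) as a stand-in. The candidate partitions are assumed to come from any numerical clustering method run with different settings; the function names `homogeneity` and `best_partition` are hypothetical.

```python
from collections import Counter, defaultdict


def homogeneity(labels, categories):
    """Purity-style stand-in for the homogeneity degree (an assumption).

    labels: cluster label of each object, from a numerical clustering run.
    categories: categorical attribute value of each object, same order.
    Returns the fraction of objects matching their cluster's majority value.
    """
    by_cluster = defaultdict(list)
    for label, category in zip(labels, categories):
        by_cluster[label].append(category)
    majority_total = sum(
        Counter(values).most_common(1)[0][1] for values in by_cluster.values()
    )
    return majority_total / len(labels)


def best_partition(partitions, categories):
    """Pick the candidate partition with the highest homogeneity score."""
    return max(partitions, key=lambda labels: homogeneity(labels, categories))
```

A plain purity score is trivially maximized by putting every object in its own cluster, so in this simplified form the candidate partitions should share the same number of clusters.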

Citations: 10

Comparing Accuracies of Rule Evaluation Models to Determine Human Criteria on Evaluated Rule Sets

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.49

H. Abe, S. Tsumoto

In data mining post-processing, rule selection using objective rule evaluation indices is one of the useful methods for finding valuable knowledge in mined patterns. However, the relationship between an index value and experts' criteria has never been clarified. In this study, we have compared the accuracies of classification learning algorithms on datasets with randomized class distributions and on datasets with real human evaluations. To determine the relationship, we used rule evaluation models, which are learned from a dataset consisting of objective rule evaluation indices and an evaluation label for each rule. The results show that the accuracies of classification learning algorithms with and without the criteria of human experts differ on a balanced randomized class distribution. In light of these results, we can consider a way to distinguish randomly evaluated rules using the accuracies of multiple learning algorithms.

Citations: 5

Detecting Suspicious Behavior in Surveillance Images

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.36

Daniel Barbará, C. Domeniconi, Zoran Duric, M. Filippone, Richard Mansfield, E. Lawson

We introduce a novel technique to detect anomalies in images. The notion of normalcy is given by a baseline of images, under the assumption that the majority of such images are normal. The key to our approach is a featureless probabilistic representation of images, based on the length of the codeword necessary to represent each image. These codeword lengths are then used for anomaly detection based on statistical testing. Our techniques were tested on synthetic and real data sets. The results show that our approach can achieve high true positive and low false positive rates.
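The general idea of scoring images by codeword length and then applying a statistical test can be sketched as below. The abstract does not specify the coding scheme or the test, so this sketch makes two loud assumptions: the length of a zlib-compressed byte string stands in for the codeword length, and a simple z-score against the baseline stands in for the statistical test. The names `code_length`, `is_anomalous` and the `z_threshold` parameter are hypothetical.

```python
import statistics
import zlib


def code_length(image_bytes: bytes) -> int:
    """Compressed size in bytes, used here as a proxy for codeword length."""
    return len(zlib.compress(image_bytes, 9))


def is_anomalous(baseline, candidate, z_threshold=3.0):
    """Flag a candidate whose code length deviates from the baseline.

    baseline: list of raw image byte strings assumed to be mostly normal
              (at least two are needed to estimate the spread).
    candidate: raw bytes of the image under test.
    z_threshold: assumed cut-off on the absolute z-score.
    """
    lengths = [code_length(b) for b in baseline]
    mean = statistics.mean(lengths)
    stdev = statistics.stdev(lengths)
    length = code_length(candidate)
    if stdev == 0:
        # Degenerate baseline: any different length counts as a deviation.
        return length != mean
    return abs(length - mean) / stdev > z_threshold
```

No hand-crafted image features are extracted here, which mirrors the "featureless" aspect of the description; only the length of the encoded image enters the test.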

Citations: 16

Comparing Reliability of Association Rules and OLAP Statistical Tests

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.76

Zhibo Chen, C. Ordonez, Kai Zhao

Association rules are a technique for detecting patterns among the items of a dataset. The constrained version applies several restrictions that reduce the number of rules and also help improve performance. On the other hand, OLAP statistical tests integrate exploratory On-Line Analytical Processing techniques with statistical tests. This technique uses a different approach that makes it more appropriate for continuous domains and able to discover more informative patterns. In this article, we thoroughly compare the reliability of the results returned by both techniques by analyzing the metrics, such as confidence and p-value, by which these techniques are implemented, in relation to the results that are generated. While these two techniques are different, we were able to bring both to level ground by extending association rules with pairing to discover more specific patterns and extending OLAP statistical tests with constraints to reduce the number of discovered patterns. We conducted our experiments on a real medical dataset and found that the extended OLAP statistical tests discovered more patterns, had comparable performance, and possessed higher reliability due to their strong statistical background.

Citations: 10

Detecting and Tracking Spatio-temporal Clusters with Adaptive History Filtering

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.93

J. Rosswog, K. Ghose

This paper addresses the problem of detecting and tracking moving clusters in spatio-temporal data sets. Spatio-temporal data sets contain data elements that move in space over time. Traditional data clustering algorithms work well on static data sets that contain well-separated clusters. When traditional techniques are applied to spatio-temporal data, they break down when the moving data elements intersect the space occupied by elements from another cluster. The goal of this work is to improve the accuracy of traditional data clustering algorithms on spatio-temporal data sets. Many clustering algorithms create clusters based on the distance between the elements. We extend this distance measure to be a function of the position history of the elements. We show through a series of experiments that the use of history-based distance measures greatly improves the performance of existing data clustering algorithms on spatio-temporal data sets. In random data sets we achieve up to a 90% improvement in cluster accuracy. To evaluate the clustering algorithms we created 102 spatio-temporal data sets. We also defined a set of metrics that are used to evaluate the performance of the clustering algorithms on the spatio-temporal data sets.
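A distance that depends on position history, rather than on current positions alone, can be sketched as follows. The abstract does not describe the adaptive filter, so this sketch assumes the simplest possible form: the mean Euclidean distance over the last `window` synchronized positions of two elements. The function name `history_distance` and the `window` parameter are hypothetical, and both tracks are assumed to be sampled at the same time steps with at least `window` positions each.

```python
import math


def history_distance(track_a, track_b, window=3):
    """Mean Euclidean distance over the most recent `window` positions.

    track_a, track_b: position sequences (tuples of coordinates), oldest
    first, sampled at the same time steps. A fixed, unweighted window is an
    assumption of this sketch, not the paper's adaptive filter.
    """
    recent = list(zip(track_a[-window:], track_b[-window:]))
    return sum(math.dist(p, q) for p, q in recent) / len(recent)


# Two elements from different clusters whose paths cross at (2, 0).
a = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
b = [(2.0, 2.0), (2.0, 1.0), (2.0, 0.0)]
current = math.dist(a[-1], b[-1])   # zero at the crossing point
with_history = history_distance(a, b)  # stays positive at the crossing
```

At the crossing point the instantaneous distance is zero, which would tempt a distance-based algorithm to merge the two elements, while the history-based value stays positive because the earlier positions were far apart.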

Citations: 25

Discovery of Internal and External Hyperclique Patterns in Complex Graph Databases

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.59

Tsubasa Yamamoto, Tomonobu Ozaki, T. Ohkawa

In some applications, the whole structure of the target data can be represented naturally as "multi-structured graphs", which are complex graphs whose vertices consist of a set of structured data such as itemsets, sequences and so on. To capture strong affinity relationships in multi-structured graphs, in this paper we propose an algorithm named HFMG to discover novel and meaningful frequent patterns whose components are highly correlated with each other. HFMG efficiently mines two kinds of meaningful patterns, depending on which relationships we focus on. The effectiveness of the proposed algorithm is confirmed through experiments with real and synthetic datasets.

Citations: 2
