2008 IEEE International Conference on Data Mining Workshops: Latest Articles

Temporal Evolution of the UK Web

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.88

Ilaria Bordino, P. Boldi, D. Donato, Massimo Santini, S. Vigna

Recently, a new temporal dataset has been made public: a series of twelve snapshots of the .uk domain, each of about 100 million pages. The Web graphs of the twelve snapshots have been merged into a single time-aware graph that provides constant-time access to temporal information. In this paper we present the first statistical analysis performed on this graph, with the goal of checking whether the information contained in the graph is reliable (i.e., whether it depends essentially on the appearance and disappearance of pages and links, or on the crawler behaviour). We perform a number of tests that show that the graph is actually reliable, and provide the first public data on the evolution of the Web that combine a large scale with significant diversity in the sites considered.

Citations: 32
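
The abstract of "Temporal Evolution of the UK Web" mentions a time-aware graph with constant-time access to temporal information. One way such access could work is sketched below, under the assumption that each edge carries a bitmask with one bit per snapshot. The function names (`merge_snapshots`, `present_in`, `lifetime`) and the toy data are hypothetical and are not taken from the paper or the dataset.

```python
from collections import defaultdict


def merge_snapshots(snapshots):
    """Merge per-snapshot edge lists into one graph whose edges carry a bitmask.

    Bit t of an edge's mask is set when the edge appears in snapshot t.
    (Assumed representation, for illustration only.)
    """
    merged = defaultdict(int)
    for t, edges in enumerate(snapshots):
        for edge in edges:
            merged[edge] |= 1 << t
    return dict(merged)


def present_in(merged, edge, t):
    """Constant-time test: was `edge` present in snapshot `t`?"""
    return bool((merged.get(edge, 0) >> t) & 1)


def lifetime(merged, edge):
    """Number of snapshots in which `edge` appears."""
    return bin(merged.get(edge, 0)).count("1")


# Toy data: three snapshots instead of twelve, pages named by letters.
snapshots = [
    [("a", "b"), ("b", "c")],
    [("a", "b")],
    [("a", "b"), ("c", "a")],
]
graph = merge_snapshots(snapshots)
print(present_in(graph, ("b", "c"), 0))  # True
print(present_in(graph, ("b", "c"), 1))  # False
print(lifetime(graph, ("a", "b")))       # 3
```

With twelve snapshots the mask fits in a small integer, so checking when a link appeared or disappeared is a single shift and mask per edge.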

Rules Extraction from Multiple Decisions Ordered Information Tables

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.75

Bin Shen, Min Yao, Zhaohui Wu

Ordered information tables are one of the most important research areas of granular computing. In this thesis, we introduce multiple decisions ordered information tables, based on the concept of ordered information tables. Multiple decisions ordered information tables are used to describe real situations that involve multiple decision attributes. We study the process of rule extraction from multiple decisions ordered information tables thoroughly, and propose and discuss several concepts related to this process. Finally, an example of a multiple decisions ordered information table is used to illustrate the basic ideas. These ideas and methods are quite useful for KDD, DM and GC.

Citations: 0

Kernels for the Investigation of Localized Spatiotemporal Transitions of Drought with Support Vector Machines

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.71

Matthew W. Collier, A. McGovern

We present and discuss several spatiotemporal kernels designed to mine real-life and simulated data in support of drought prediction. We implement and empirically validate these kernels for support vector machines. Issues related to the nature of geographic data such as autocorrelation and directionality are investigated.

Citations: 3

An Efficient Sequential Pattern Mining Algorithm Based on the 2-Sequence Matrix

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.82

C. Hsieh, Don-Lin Yang, Jungpin Wu

Sequential pattern mining has become more and more popular in recent years due to its wide applications and the fact that it can find more information than association rules. Two famous algorithms in sequential pattern mining are AprioriAll and PrefixSpan. These two algorithms not only need to scan a database or projected databases many times, but also require setting a minimal support threshold to prune infrequent data in order to obtain useful sequential patterns efficiently. In addition, they must rescan the database if new items or sequences are added. In this paper, we propose a novel algorithm called efficient sequential pattern enumeration (ESPE) to solve the above problems. Our method can also be applied in many settings, such as when itemsets appear at the same time in a sequence. In our experiments on various datasets, we show that ESPE performs better than the other two methods.

Citations: 7
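
The abstract above does not define the 2-sequence matrix, so the following is only an illustrative guess at the idea: a table that records, for every ordered pair of items (a, b), how many sequences contain a somewhere before b. The function name `two_sequence_counts`, the single-item (not itemset) sequences, and the toy database are assumptions made for this sketch, not the ESPE algorithm itself.

```python
from collections import Counter


def two_sequence_counts(database):
    """Count, for each ordered pair (a, b), the sequences where a precedes b.

    Each sequence contributes at most 1 to any pair, i.e. this is a
    support count over sequences, not an occurrence count.
    """
    counts = Counter()
    for sequence in database:
        pairs = set()  # pairs found in this sequence only
        seen = set()   # items encountered so far in this sequence
        for item in sequence:
            for earlier in seen:
                pairs.add((earlier, item))
            seen.add(item)
        counts.update(pairs)
    return counts


# Toy database of three sequences over single items (hypothetical data).
database = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
]
counts = two_sequence_counts(database)
print(counts[("a", "c")])  # 3
print(counts[("b", "c")])  # 2
print(counts[("c", "a")])  # 0
```

In this sketch a newly added sequence only increments the entries for its own pairs, which shows how a pair table can be maintained without rescanning older sequences; whether ESPE proceeds this way is not stated in the abstract.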

Characterizing Network Motifs to Identify Spam Comments

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.72

E. Kamaliha, Fatemeh Riahi, Vahed Qazvinian, Jafar Adibi

Personal blogs are one of the most interconnected and socially networked types of social media. The capability of placing "comments" on blog posts makes the blogosphere a rather complex environment. In this paper, we study the behavior of bloggers who place comments on others' posts and examine whether it is possible to detect spam comments. We look at the functionality of different network motif profiles in the comment network, and identify certain subgraphs that are associated with spam comments. We illustrate that some of these patterns and their statistical features could be exploited to classify comments as spam or non-spam and bloggers as spammers or non-spammers. Our preliminary results are encouraging and show reasonable performance on rich and dense blog networks.

Citations: 17

Word Sense Discovery for Web Information Retrieval

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.10

Tomasz Nykiel, H. Rybinski

Word meaning disambiguation has always been an important problem in many computer science tasks, such as information retrieval and extraction. One of the problems faced in automatic word sense discovery is the number of different senses a word can have. Often, senses are dominated by other, more frequent ones. Discovering such dominated meanings can significantly improve the quality of many text-related algorithms. In particular, Web search quality can be improved. In this paper, we present a novel approach for discovering word senses. The method is based on concise representations of frequent patterns. It attempts to discover not only word senses that are dominating, but also senses that are dominated and underrepresented in the repository.

Citations: 9

An Efficient Search Algorithm for Content-Based Image Retrieval with User Feedback

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.90

A. Leung, P. Auer

We propose a probabilistic model for the relevance feedback of users looking for target images. This model takes into account user errors and user uncertainty about distinguishing similarly relevant images. Based on this model, we have developed an algorithm, which selects images to be presented to the user for further relevance feedback until a satisfactory image is found. In each query session, the algorithm maintains weights on the images in the database which reflect the assumed relevance of the images. Relevance feedback is used to modify these weights. As a second ingredient, the algorithm uses a minimax principle to select images for presentation to the user: any response of the user will provide significant information about his query, such that relatively few feedback rounds are sufficient to find a satisfactory image. We have implemented this algorithm and have conducted experiments on both simulated data and real data which show promising results.

Citations: 9
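
The abstract above describes weights on database images that relevance feedback modifies. A minimal sketch of one possible update step follows, assuming one-dimensional image features and a multiplicative discount `beta`; the function name `update_weights`, the distance measure, and the data are all hypothetical. The minimax selection of which images to show is not covered here.

```python
def update_weights(weights, features, shown, chosen, beta=0.5):
    """Down-weight images that look closer to a rejected image than to the chosen one.

    `shown` must contain at least two images, one of which is `chosen`.
    beta in (0, 1) is an assumed discount: because it is not 0, an image
    stays recoverable if the user's feedback was a mistake.
    """
    new_weights = {}
    for image, w in weights.items():
        d_chosen = abs(features[image] - features[chosen])
        d_rejected = min(
            abs(features[image] - features[other])
            for other in shown
            if other != chosen
        )
        new_weights[image] = w if d_chosen <= d_rejected else w * beta
    total = sum(new_weights.values())
    # Renormalise so the weights still sum to 1.
    return {image: w / total for image, w in new_weights.items()}


# Toy database: four images with scalar features (hypothetical data).
features = {"img0": 0.0, "img1": 1.0, "img2": 2.0, "img3": 3.0}
weights = {image: 0.25 for image in features}

# The user is shown img0 and img3 and picks img3 as more relevant.
updated = update_weights(weights, features, shown=["img0", "img3"], chosen="img3")
print(updated)
```

After this round, `img2` and `img3` each hold weight 1/3 while `img0` and `img1` drop to 1/6, so later rounds would concentrate on images near the user's choice.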

A Case Study on Classification Reliability

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.97

H. Dai

The reliability of an induced classifier can be affected by several factors, including data-oriented factors and algorithm-oriented factors. In some cases, the reliability could also be affected by knowledge-oriented factors. In this paper, we analyze three special cases to examine the reliability of the discovered knowledge. Our case study results show that (1) when mining from low-quality data, a rough classification approach, which in general tolerates low-quality data, is more reliable than an exact approach; (2) without a sufficiently large amount of data, the reliability of the discovered knowledge decreases accordingly; (3) a point learning approach can easily be misled by noisy data. It will in most cases generate an unreliable interval and thus affect the reliability of the discovered knowledge. The study also reveals that the inexact field is a good learning strategy that can model the potentials and improve discovery reliability.

Citations: 5

Innovation Game as Workplace for Sensing Values in Design and Market

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.46

Y. Ohsawa, Y. Maeno, Akihiro Takaichi, Yoko Nishihara

The "value" in this paper can be dealt with as a new variable which business workers create from their interaction with the dynamic environment, on which they redesign products and the market sustainably. Here we first show how data mining and data visualization can provide useful tools for aiding marketers'/designers' sensitivity to the emerging values of consumers/users. By visualizing the data, people can find the relations between existing entities, and create new combinations of products via the relations found. The Innovation Game is then introduced as an environment for communication that elevates users' ability to combine existing values of products to create newly valuable products. The players called innovators present combinatorial ideas from prepared basic ideas, and sell the ideas to each other and their stocks to players called investors. As a result, latent business opportunities are revealed for the market of ideas and designs.

Citations: 8

Post-Processing of Discovered Association Rules Using Ontologies

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.87

Claudia Marinica, F. Guillet, H. Briand

In Data Mining, the usefulness of association rules is strongly limited by the huge number of delivered rules. In this paper we propose a new approach to prune and filter discovered rules. Using Domain Ontologies, we strengthen the integration of user knowledge in the post-processing task. Furthermore, an interactive and iterative framework is designed to assist the user throughout the analysis task. On the one hand, we represent user domain knowledge using a Domain Ontology over the database. On the other hand, a novel technique is suggested to prune and filter discovered rules. The proposed framework was applied successfully to the client database provided by Nantes Habitat.

Citations: 36
