2008 IEEE International Conference on Data Mining Workshops: Latest Articles


GRAPHITE: A Visual Query System for Large Graphs

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.99

Duen Horng Chau, C. Faloutsos, Hanghang Tong, Jason I. Hong, Brian Gallagher, Tina Eliassi-Rad

We present Graphite, a system that allows the user to visually construct a query pattern, finds both its exact and approximate matching subgraphs in large attributed graphs, and visualizes the matches. For example, in a social network where a person's occupation is an attribute, the user can draw a 'star' query for "finding a CEO who has interacted with a Secretary, a Manager, and an Accountant, or a structure very similar to this". Graphite uses the G-Ray algorithm to run the query against a user-chosen data graph, gaining all of its benefits, namely its high speed, scalability, and its ability to find both exact and near matches. Therefore, for the example above, Graphite tolerates indirect paths between, say, the CEO and the Accountant, when no direct path exists. Graphite uses fast algorithms to estimate node proximities when finding matches, enabling it to scale well with the graph database size. We demonstrate Graphite's usage and benefits using the DBLP author-publication graph, which consists of 356K nodes and 1.9M edges. A demo video of Graphite can be downloaded at http://www.cs.cmu.edu/~dchau/graphite/graphite.mov.



Citations: 54
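The idea of tolerating indirect paths in a star query can be illustrated with a small sketch. This is not the G-Ray algorithm, whose details the abstract does not give; it is a simplified stand-in that assumes plain breadth-first hop distance as the proximity measure. The graph, the node names, and the functions `bfs_distances` and `match_star` are all hypothetical.

```python
from collections import deque

def bfs_distances(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def match_star(graph, attrs, center_attr, leaf_attrs):
    """Rank candidate centers by total hop distance to the nearest node
    carrying each leaf attribute. A total equal to len(leaf_attrs) means
    every leaf is a direct neighbor, i.e. an exact match."""
    results = []
    for center, attr in attrs.items():
        if attr != center_attr:
            continue
        dist = bfs_distances(graph, center)
        leaves = {}
        for leaf_attr in leaf_attrs:
            candidates = [(d, n) for n, d in dist.items()
                          if n != center and attrs[n] == leaf_attr]
            if not candidates:
                break  # this center cannot reach the required attribute
            leaves[leaf_attr] = min(candidates)
        else:
            total = sum(d for d, _ in leaves.values())
            results.append((total, center, {a: n for a, (_, n) in leaves.items()}))
    return sorted(results, key=lambda r: (r[0], r[1]))

# Hypothetical toy social network (undirected adjacency lists).
graph = {
    "alice": ["bob", "carol", "dave"],
    "bob": ["alice", "erin"],
    "carol": ["alice", "erin"],
    "dave": ["alice", "frank"],
    "erin": ["bob", "carol", "frank"],
    "frank": ["erin", "dave"],
}
attrs = {
    "alice": "CEO", "bob": "Secretary", "carol": "Manager",
    "dave": "Accountant", "erin": "CEO", "frank": "Engineer",
}
matches = match_star(graph, attrs, "CEO", ["Secretary", "Manager", "Accountant"])
```

In this toy graph, `alice` is an exact match (all three leaves are direct neighbors), while `erin` is a near match because she reaches the Accountant only through an intermediate node.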

A New Method for Multi-view Face Clustering in Video Sequence

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.63

Panpan Huang, Yunhong Wang, Ming Shao

In the problem of face clustering with multiple views, the similarity between faces of different persons with similar pose is usually greater than the similarity between multi-view faces of the same person. This may have a large impact on the clustering result that is sent back to the user. To solve this problem, we first perform pose clustering and then, within each 'pose group', cluster the images of different individuals. Gabor filters are used to detect the eyes in the face image. The coordinates of the eyes are extracted as the input feature for the 'pose clustering'. After this step, images with similar pose fall into the same cluster. PCA/LBP and k-means algorithms are then used within each pose cluster to cluster different individuals. The precision of face classification with clustering is enhanced. The proposed clustering algorithms can be applied to face indexing or face recognition systems.


Citations: 16
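The two-stage scheme described in the abstract above (cluster by pose first, then cluster individuals within each pose group) can be sketched as follows. This is an illustrative sketch only: the 2-D pose and appearance vectors stand in for the eye coordinates and PCA/LBP features, the farthest-point initialization is an assumption made to keep the example deterministic, and the names `kmeans` and `two_stage_clusters` are hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means with deterministic farthest-point initialization.
    Returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from its nearest current center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels

def two_stage_clusters(faces, n_poses, n_people):
    """faces: list of (pose_feature, appearance_feature) tuples.
    Returns a (pose_label, person_label) pair per face."""
    pose_labels = kmeans([f[0] for f in faces], n_poses)
    result = [None] * len(faces)
    for pose in range(n_poses):
        idx = [i for i, label in enumerate(pose_labels) if label == pose]
        if not idx:
            continue
        # Cluster individuals only among faces that share this pose.
        person_labels = kmeans([faces[i][1] for i in idx], min(n_people, len(idx)))
        for i, label in zip(idx, person_labels):
            result[i] = (pose, label)
    return result

# Hypothetical data: two poses (frontal, profile) and two people (A, B).
faces = [
    ((0.0, 0.1), (0.0, 0.0)),    # A, frontal
    ((0.1, 0.0), (5.0, 5.0)),    # B, frontal
    ((10.0, 0.0), (0.2, 0.1)),   # A, profile
    ((10.1, 0.1), (5.1, 4.9)),   # B, profile
    ((0.0, 0.0), (0.1, 0.1)),    # A, frontal
    ((9.9, 0.0), (4.9, 5.0)),    # B, profile
]
clusters = two_stage_clusters(faces, n_poses=2, n_people=2)
```

Note that this sketch never compares faces across pose groups, so the same person seen in two poses ends up in two clusters; it only separates individuals within a pose.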

Ontology-Based Protein-Protein Interactions Extraction from Literature Using the Hidden Vector State Model

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.11

Yulan He, K. Nakata, Deyu Zhou

This paper proposes a novel framework for incorporating protein-protein interaction (PPI) ontology knowledge into PPI extraction from biomedical literature, in order to address the emerging challenges of deep natural language understanding. It builds upon existing work on relation extraction using the hidden vector state (HVS) model. The HVS model belongs to the category of statistical learning methods. It can be trained directly from un-annotated data in a constrained way while still capturing the underlying named entity relationships. However, it is difficult to incorporate background knowledge or non-local information into the HVS model. This paper proposes to represent the HVS model as a conditionally trained undirected graphical model in which non-local features derived from the PPI ontology through inference can be easily incorporated. The seamless fusion of ontology inference with statistical learning produces a new paradigm for information extraction.


Citations: 3

Text Knowledge Mining: An Alternative to Text Data Mining

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.57

D. Sánchez, M. Martín-Bautista, Ignacio J. Blanco, Consuelo Justicia de la Torre

In this paper we introduce an alternative view of text mining and review several alternative views proposed by different authors. We propose a classification of text mining techniques into two main groups: techniques based on inductive inference, which we call text data mining (TDM, comprising most of the existing proposals in the literature), and techniques based on deductive or abductive inference, which we call text knowledge mining (TKM). To our knowledge, the TKM view of text mining is new though, as we shall show, several existing techniques could be considered part of this group. We discuss the possibilities and challenges of TKM techniques. We also discuss the application of existing theories in possible future research in this field.


Citations: 46

Statistical Independence and Contingency Matrix

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.94

S. Tsumoto, S. Hirano

This paper shows the meaning of Pearson residuals as an indicator of statistical independence. While information granules of statistical independence of two variables can be viewed as determinants of 2×2 submatrices, those of three variables consist of several combinations of linear equations which will become residuals for odds ratio (outer products) when they are equal to 0. Interestingly, the residuals can be an expansion series of the product of marginal distributions and the residuals for odds ratio (outer products).


Citations: 3
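For the two-variable case mentioned in the abstract above, the link between Pearson residuals and 2×2 determinants can be checked numerically. The sketch below uses the standard definitions (expected count = row total × column total / N, residual = (observed − expected) / √expected); the example tables and the function names `pearson_residuals` and `det2x2` are illustrative and not taken from the paper.

```python
import math

def pearson_residuals(table):
    """Pearson residual (observed - expected) / sqrt(expected) for each cell,
    where expected is the count implied by independent margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / math.sqrt(expected))
        residuals.append(res_row)
    return residuals

def det2x2(table):
    """Determinant ad - bc of a 2x2 contingency table."""
    (a, b), (c, d) = table
    return a * d - b * c

independent = [[10, 20], [30, 60]]   # rows are proportional
dependent = [[30, 10], [10, 30]]

res_independent = pearson_residuals(independent)
res_dependent = pearson_residuals(dependent)
```

For a 2×2 table with cells a, b, c, d and total N, the difference between the observed and expected count in the first cell equals (ad − bc) / N, so all residuals vanish exactly when the determinant is 0.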

A Vector-Geometry Based Spatial kNN-Algorithm for Traffic Frequency Predictions

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.35

M. May, D. Hecker, Christine Kopp, S. Scheider, Daniel Schulz

We introduce s-kNN, a nearest-neighbor-based spatial data mining algorithm. It belongs to the class of vector-geometry-based algorithms that reason on complex spatial objects instead of point measurements. In contrast to most methods in this class, it performs on-the-fly spatial computations that cannot be replaced by a pre-processing step without sacrificing efficiency. The key is a partial evaluation scheme for efficient computations. The algorithm is fully integrated into an object-relational spatial database. It is the basis for traffic frequency predictions (vehicles and pedestrians) for all German cities with more than 50,000 inhabitants and is the basis for pricing of posters in Germany.


Citations: 31

Combining Behavioral and Social Network Data for Online Advertising

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.70

A. Bagherjeiran, R. Parekh

There are two main requirements for effective advertising in social networks. The first is that links in the social network are relevant to the targeted ads. The second is that social information can be easily incorporated with existing targeting methods to predict response rates. Our purpose in this paper is to investigate these requirements. We measure the relevance of a social network, the Yahoo! Instant Messenger graph, to classes of ads. We investigate the degree to which social network information complements existing user-profile information for targeting. We find significant evidence of homophily in our social network: links in the network indicate similar ad-relevant interests. We propose an ensemble classifier that combines existing user-only models with social network features to improve response predictions.


Citations: 46

Region Classification with Decision Trees

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.19

J. V. Prehn, E. Smirnov

The region-classification task is to construct class regions that contain the correct classes of the objects being classified with a given probability. To turn a point classifier into a region classifier, the conformal framework is used. However, applying the framework requires a non-conformity function. This function estimates the instances' non-conformity for the point classifier used. This paper studies how to turn decision trees into region classifiers. It considers two non-conformity functions. The first one is a general non-conformity function applicable to any point classifier. The second function is a specific non-conformity function for decision trees. Our main contribution is twofold. First, we show, contrary to prior work, that the general function outperforms the specific one for decision-tree region classifiers in terms of validity and efficiency of the class regions. Second, we show how the decision-tree complexity influences the quality of the class regions based on these two functions.


Citations: 1
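How a point classifier becomes a region classifier can be sketched with a minimal inductive conformal predictor. The abstract above does not specify either non-conformity function, so this sketch assumes a common generic choice, one minus the probability the point classifier assigns to a label. The calibration scores, the class probabilities, and the name `conformal_region` are hypothetical.

```python
def conformal_region(calibration_scores, label_probs, epsilon):
    """Return every label whose conformal p-value exceeds epsilon.

    calibration_scores: non-conformity scores of held-out examples,
        computed with their true labels.
    label_probs: probabilities from the point classifier (for example,
        a decision tree's leaf frequencies) for the new instance.
    epsilon: allowed error rate; the region targets coverage 1 - epsilon.
    """
    n = len(calibration_scores)
    region = []
    for label, prob in label_probs.items():
        score = 1.0 - prob  # assumed generic non-conformity function
        at_least = sum(1 for s in calibration_scores if s >= score)
        p_value = (at_least + 1) / (n + 1)
        if p_value > epsilon:
            region.append(label)
    return sorted(region)

# Hypothetical calibration scores and class probabilities.
calibration_scores = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6]
label_probs = {"cat": 0.75, "dog": 0.2, "bird": 0.05}

region_80 = conformal_region(calibration_scores, label_probs, epsilon=0.2)
region_95 = conformal_region(calibration_scores, label_probs, epsilon=0.05)
```

Lowering `epsilon` demands higher confidence and therefore widens the region: here the 80% region holds one label, while the 95% region holds all three. Smaller regions at the same error rate correspond to what the abstract calls efficiency.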

A Study on the Reliability of Case-Based Reasoning Systems

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.33

Ke Wang, J. Liu, Weimin Ma

Case-based reasoning (CBR) is a methodology for problem solving that suggests a solution to a new problem based on previously solved problems and their associated solutions. A key question in this methodology is whether we can always trust the solutions suggested by a case-based reasoning system. This paper first studies the reliability of CBR systems at an overall level. It discusses the factors affecting the reliability of a CBR system, especially whether its case library is compatible with the foundational assumption that "similar problems have similar solutions." After that, the reliability of an individual suggested solution is studied, and some existing approaches that can be employed to estimate the reliability of a single solution are compared. To illustrate these ideas, some experiments and their results are also discussed. It is shown that if a case library attains high compatibility, then a satisfactory result can be expected, and that the reliability of a CBR system at an overall level can be improved by identifying the reliable solutions.


Citations: 2

Mining Correlated Pairs of Patterns in Multidimensional Structured Databases

2008 IEEE International Conference on Data Mining Workshops

Pub Date : 2008-12-15 DOI: 10.1109/ICDMW.2008.25

Tomonobu Ozaki, T. Ohkawa

Structured data is becoming increasingly abundant in many application domains. In this paper, as a form of correlation mining, we propose new data mining problems of finding frequent and correlated pairs of patterns in structured databases. First, we consider the problem of finding all frequent and correlated pattern pairs in two-dimensional structured databases. Then, two kinds of top-k mining problems are studied. To solve these problems efficiently, we develop a series of algorithms with powerful pruning capabilities. We also discuss the applicability of the proposed algorithms to the discovery of pattern pairs in single and multidimensional structured databases. The effectiveness of the proposed algorithms is assessed through experiments with synthetic and real-world datasets.


Citations: 0
