
Sixth International Conference on Data Mining (ICDM'06): Latest Publications

Diverse Topic Phrase Extraction through Latent Semantic Analysis
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.61
Jilin Chen, Jun Yan, Benyu Zhang, Qiang Yang, Zheng Chen
We propose a novel algorithm for extracting diverse topic phrases in order to provide summaries of large corpora. Previous work often ignores the importance of diversity and thus extracts phrases crowded onto a few hot topics while failing to cover other less obvious but important topics. We solve this problem through document re-weighting and phrase diversification using latent semantic analysis (LSA). Experiments on various datasets show that our new algorithm improves both relevance and diversity across topics for topic phrase extraction problems.
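The LSA machinery the abstract refers to can be illustrated with a truncated SVD of a term-document matrix. A minimal sketch, assuming a toy matrix and rank (the paper's corpora, weighting scheme, and diversification criterion are not reproduced here):

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents); values assumed.
A = np.array([
    [3.0, 1.0, 0.0, 0.0],
    [1.0, 3.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
    [0.0, 0.0, 2.0, 1.0],
])

# Rank-k truncated SVD, A ~= U_k S_k V_k^T, is the core LSA decomposition.
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
S_k, Vt_k = np.diag(s[:k]), Vt[:k, :]

# Columns of S_k @ Vt_k place documents in latent topic space; documents whose
# dominant topic is already well covered could be down-weighted for diversity.
doc_topics = S_k @ Vt_k
dominant_topic = np.argmax(np.abs(doc_topics), axis=0)
print(dominant_topic.tolist())  # one latent topic index per document
```

Here documents 0-1 and 2-3 fall into two distinct latent topics, so a diversity-aware extractor would draw phrases from both groups rather than only the dominant one.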
Citations: 13
bitSPADE: A Lattice-based Sequential Pattern Mining Algorithm Using Bitmap Representation
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.28
S. Aseervatham, A. Osmani, E. Viennet
Sequential pattern mining allows the discovery of temporal relationships between items within a database. The patterns can then be used to generate association rules. When databases are very large, the execution speed and the memory usage of the mining algorithm become critical parameters. Previous research has focused on only one of these two parameters. In this paper, we present bitSPADE, a novel algorithm that combines the best features of SPAM, one of the fastest algorithms, and SPADE, one of the most memory-efficient algorithms. Moreover, we introduce a new pruning strategy that enables bitSPADE to reach high performance. Experimental evaluations showed that bitSPADE achieves an efficient trade-off between speed and memory usage, outperforming SPADE in both speed and memory usage by factors of more than 3.4, and SPAM in memory consumption by up to more than an order of magnitude.
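The bitmap idea can be sketched in a few lines: each item's occurrences within a customer sequence become a bitmap, and the "followed by" join reduces to a mask over later positions plus a bitwise AND. A minimal sketch under assumed toy data (bitSPADE's lattice search and pruning are omitted):

```python
# Each customer sequence is a list of itemsets; one bit per position.
# Hypothetical mini-database for illustration only.
sequences = [
    [{"a"}, {"b"}, {"a", "c"}],
    [{"b"}, {"a"}, {"c"}],
]

def item_bitmap(seq, item):
    """Bitmap with bit i set iff `item` occurs at position i of seq."""
    bits = 0
    for i, itemset in enumerate(seq):
        if item in itemset:
            bits |= 1 << i
    return bits

def after_mask(bits):
    """All positions strictly after the first set bit (sequence-extension step)."""
    if bits == 0:
        return 0
    first = (bits & -bits).bit_length()  # 1-based index just past the lowest set bit
    return ~((1 << first) - 1)

def support(pattern):
    """Count sequences containing the pattern <i1 -> i2 -> ...>."""
    count = 0
    for seq in sequences:
        bits = item_bitmap(seq, pattern[0])
        for item in pattern[1:]:
            bits = after_mask(bits) & item_bitmap(seq, item)
        if bits:
            count += 1
    return count

print(support(["a", "c"]))  # sequences where an "a" is later followed by a "c"
```

Both toy sequences contain an "a" followed later by a "c", so the printed support is 2, while the reversed pattern <c -> a> has support 0.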
Citations: 33
Belief Propagation in Large, Highly Connected Graphs for 3D Part-Based Object Recognition
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.26
F. DiMaio, J. Shavlik
We describe a part-based object-recognition framework, specialized to mining complex 3D objects from detailed 3D images. Objects are modeled as a collection of parts together with a pairwise potential function. An efficient inference algorithm, based on belief propagation (BP), finds the optimal layout of parts given some input image. We introduce AggBP, a message aggregation scheme for BP in which groups of messages are approximated as a single message. For objects consisting of N parts, we reduce CPU time and memory requirements from O(N²) to O(N). We apply AggBP to synthetic data as well as to a real-world task of identifying protein fragments in three-dimensional images. These experiments show that our improvements result in minimal loss of accuracy in significantly less time.
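The message-aggregation idea can be caricatured in the fully symmetric special case: when the N-2 messages arriving at a node are (approximately) identical, their product collapses to one message raised to a power, turning O(N²) message products into O(N) work per iteration. A minimal sketch with an assumed uniform pairwise potential on binary variables (not the paper's actual model or potentials):

```python
# Assumed setup: complete graph of N parts, binary states, one shared pairwise
# potential, so all incoming messages at a node are identical and can be
# aggregated -- the intuition behind AggBP's grouped messages.
N = 50
psi = [[1.0, 0.5], [0.5, 1.0]]  # pairwise potential (assumed)

def aggregated_update(m_in, power):
    """Aggregate `power` identical incoming messages, then pass through psi."""
    prod = [m_in[0] ** power, m_in[1] ** power]  # one exponentiation vs. `power` products
    out = [sum(psi[x][y] * prod[y] for y in range(2)) for x in range(2)]
    z = sum(out)
    return [v / z for v in out]

m = [0.6, 0.4]  # slightly informative initial message (assumed)
for _ in range(20):
    m = aggregated_update(m, N - 2)
print([round(v, 3) for v in m])
```

With this symmetric potential the aggregated iteration settles near [2/3, 1/3]; the point is that each update costs O(1) per group instead of O(N) per neighbor.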
Citations: 10
Getting the Most Out of Ensemble Selection
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.76
R. Caruana, Art Munson, Alexandru Niculescu-Mizil
We investigate four previously unexplored aspects of ensemble selection, a procedure for building ensembles of classifiers. First, we test whether adjusting model predictions to put them on a canonical scale makes the ensembles more effective. Second, we explore the performance of ensemble selection when different amounts of data are available for ensemble hillclimbing. Third, we quantify the benefit of ensemble selection's ability to optimize to arbitrary metrics. Fourth, we study the performance impact of pruning the number of models available for ensemble selection. Based on our results, we present improved ensemble selection methods that double the benefit of the original method.
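Ensemble selection itself is a simple greedy hillclimb: starting from an empty ensemble, repeatedly add (with replacement) the model that most improves a chosen metric on a held-out hillclimbing set. A minimal sketch with hypothetical model predictions and accuracy as the metric:

```python
# Hypothetical hillclimb set: true labels and per-model P(class=1) predictions.
hillclimb_labels = [1, 0, 1, 1, 0]
models = {
    "m1": [0.9, 0.4, 0.6, 0.7, 0.3],
    "m2": [0.6, 0.8, 0.2, 0.9, 0.1],
    "m3": [0.2, 0.1, 0.9, 0.4, 0.9],
}

def accuracy(probs):
    preds = [1 if p >= 0.5 else 0 for p in probs]
    return sum(p == y for p, y in zip(preds, hillclimb_labels)) / len(hillclimb_labels)

def select(n_steps):
    ensemble = []  # model names; duplicates allowed (selection with replacement)
    for _ in range(n_steps):
        # Pick the model whose addition maximizes hillclimb accuracy of the
        # averaged-probability ensemble.
        best = max(
            models,
            key=lambda name: accuracy([
                (sum(models[m][i] for m in ensemble) + models[name][i]) / (len(ensemble) + 1)
                for i in range(len(hillclimb_labels))
            ]),
        )
        ensemble.append(best)
    return ensemble

chosen = select(3)
print(chosen)
```

Selection with replacement lets a strong model be added repeatedly, which implicitly weights it; the paper's four questions (canonical prediction scales, hillclimb-set size, arbitrary metrics, library pruning) are refinements of this loop.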
Citations: 147
Object Identification with Constraints
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.117
Steffen Rendle, L. Schmidt-Thieme
Object identification aims at identifying different representations of the same object based on noisy attributes, such as descriptions of the same product in different online shops or references to the same paper in different publications. Numerous solutions have been proposed for this task, almost all of them based on similarity functions over pairs of objects. Although the similarity functions are nowadays learned from a set of labeled training data, the structural information given by the labeled data is not used. By formulating a generic model for object identification, we show how almost any proposed identification model can easily be extended to satisfy structural constraints. We therefore propose a model that, in addition to a learned similarity measure, uses structural information given as pairwise constraints to guide collective decisions about object identification. We show with empirical experiments on public and real-life data that combining structural information with attribute-based similarity enormously increases the overall performance on object identification tasks.
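One way to see how pairwise constraints guide collective decisions: merge candidate record pairs greedily by learned similarity, but reject any merge whose transitive closure would violate a cannot-link constraint. A minimal union-find sketch with hypothetical similarity scores and threshold (not the paper's actual model):

```python
# Hypothetical learned similarity scores between references p1..p4.
similarity = {
    ("p1", "p2"): 0.9,
    ("p2", "p3"): 0.8,
    ("p1", "p4"): 0.85,
}
cannot_link = {("p3", "p4")}  # pairwise constraint: known distinct objects

parent = {}

def find(x):
    """Union-find root with path compression."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def members(root):
    return {x for x in parent if find(x) == root}

def violates(a, b):
    """Would merging a's and b's clusters cross a cannot-link?"""
    ca, cb = members(find(a)) | {a}, members(find(b)) | {b}
    return any((x, y) in cannot_link or (y, x) in cannot_link for x in ca for y in cb)

# Greedily process candidate pairs from most to least similar.
for (a, b), s in sorted(similarity.items(), key=lambda kv: -kv[1]):
    if s >= 0.7 and find(a) != find(b) and not violates(a, b):
        parent[find(a)] = find(b)

clusters = {}
for x in ["p1", "p2", "p3", "p4"]:
    clusters.setdefault(find(x), []).append(x)
result = sorted(sorted(c) for c in clusters.values())
print(result)
```

Without the constraint, p3 would join the cluster through its similarity to p2; the cannot-link between p3 and p4 blocks that merge collectively, not just pairwise.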
Citations: 28
LOCI: Load Shedding through Class-Preserving Data Acquisition
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.100
Peng Wang, Haixun Wang, Wei Wang, Baile Shi, Philip S. Yu
An avalanche of data arriving in stream form is overstretching our ability to analyze it. In this paper, we propose a novel load shedding method that enables fast and accurate stream data classification. We transform the input data so that its class information concentrates on a few features, and we introduce a progressive classifier that makes predictions from partial input. We take advantage of stream data's temporal locality (for example, readings from a temperature sensor usually do not change dramatically over a short period of time) for load shedding. We first show that the temporal locality of the original data is preserved by our transform; we then utilize positive and negative knowledge about the data (which is of much smaller size than the data itself) for classification. We employ both analytical and empirical analysis to demonstrate the advantage of our approach.
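The role of temporal locality in load shedding can be sketched independently of the paper's transform: if consecutive readings rarely change much, the classifier need only run when the input drifts beyond a threshold. A minimal sketch with a hypothetical stream, threshold, and stand-in classifier:

```python
# Hypothetical sensor stream and drift threshold (assumed values).
stream = [20.1, 20.2, 20.1, 25.0, 25.1, 20.0]
threshold = 1.0

def classify(x):
    """Stand-in classifier; the paper's progressive classifier is not reproduced."""
    return "hot" if x > 22 else "cold"

labels, calls = [], 0
last_x = last_label = None
for x in stream:
    # Shed load: only re-classify when the reading drifts past the threshold.
    if last_x is None or abs(x - last_x) > threshold:
        last_label = classify(x)
        calls += 1
        last_x = x
    labels.append(last_label)
print(labels, calls)
```

Only 3 of the 6 readings trigger classification; the rest reuse the cached label, which is the load-shedding win temporal locality buys.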
Citations: 0
Turning Clusters into Patterns: Rectangle-Based Discriminative Data Description
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.163
Byron J. Gao, M. Ester
The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Yet not all data mining methods produce such readily understandable knowledge; for example, most clustering algorithms output sets of points as clusters. In this paper, we perform a systematic study of cluster description, which generates interpretable patterns from clusters. We introduce and analyze novel description formats with greater expressive power, and we motivate and define novel description problems specifying different trade-offs between interpretability and accuracy. We also present effective heuristic algorithms together with their empirical evaluations.
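The rectangle-based description idea in its simplest form: summarize a cluster by an axis-parallel bounding rectangle and check whether that rectangle already discriminates it from the other clusters' points. A minimal 2-D sketch with toy coordinates (the paper's description formats and trade-offs go well beyond this):

```python
# Toy 2-D points: one cluster to describe, and points from other clusters.
cluster = [(1, 1), (2, 3), (3, 2)]
others = [(6, 6), (7, 5)]

def bounding_rect(points):
    """Axis-parallel bounding rectangle (x_min, y_min, x_max, y_max)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def contains(rect, p):
    x0, y0, x1, y1 = rect
    return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

rect = bounding_rect(cluster)
# The description is discriminative if no other cluster's point falls inside.
discriminative = not any(contains(rect, p) for p in others)
print(rect, discriminative)
```

When the bounding rectangle does overlap other clusters, a discriminative description must be split into several smaller rectangles, which is exactly where the interpretability-versus-accuracy trade-off studied in the paper arises.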
Citations: 19
Semantic Smoothing for Model-based Document Clustering
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.142
Xiaodan Zhang, Xiaohua Zhou, Xiaohua Hu
A document is often full of class-independent "general" words and short of class-specific "core" words, which makes document clustering difficult. We argue that both problems are relieved by suitable smoothing of document models in agglomerative approaches and of cluster models in partitional approaches, thereby improving clustering quality. To the best of our knowledge, most model-based clustering approaches use Laplacian smoothing to prevent zero probabilities, while most similarity-based approaches employ the heuristic TF*IDF scheme to discount the effect of "general" words. Inspired by a series of statistical translation language models for text retrieval, we propose in this paper a novel smoothing method, referred to as context-sensitive semantic smoothing, for document clustering. Comparative experiments on three datasets show that model-based clustering approaches with semantic smoothing are effective in improving cluster quality.
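The baseline the paper contrasts against, Laplacian (add-one) smoothing of a cluster's unigram language model, fits in a few lines. A minimal sketch with a toy vocabulary and cluster (context-sensitive semantic smoothing itself is not reproduced here):

```python
from collections import Counter

# Toy vocabulary and a cluster of tokenized documents (assumed data).
vocabulary = ["data", "mining", "cluster", "kernel"]
cluster_docs = [["data", "mining", "data"], ["cluster", "data"]]

counts = Counter(w for doc in cluster_docs for w in doc)
total = sum(counts.values())
V = len(vocabulary)

def p_laplace(word):
    """Add-one smoothed unigram probability: no word gets zero probability."""
    return (counts[word] + 1) / (total + V)

print(round(p_laplace("data"), 3), round(p_laplace("kernel"), 3))
```

The unseen word "kernel" receives a small non-zero probability instead of zero; the paper's semantic smoothing goes further by redistributing mass using word context rather than uniformly.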
Citations: 26
Adaptive Kernel Principal Component Analysis with Unsupervised Learning of Kernels
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.14
Daoqiang Zhang, Zhi-Hua Zhou, Songcan Chen
Choosing an appropriate kernel is one of the key problems in kernel-based methods. Most existing kernel selection methods require that the class labels of the training examples be known. In this paper, we propose an adaptive kernel selection method for kernel principal component analysis that can effectively learn kernels when the class labels of the training examples are not available. By iteratively optimizing a novel criterion, the proposed method achieves nonlinear feature extraction and unsupervised kernel learning simultaneously. Moreover, a non-iterative approximate algorithm is developed. The effectiveness of the proposed algorithms is validated on UCI datasets and the COIL-20 object recognition database.
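For reference, standard kernel PCA, the procedure being made adaptive here, amounts to eigen-decomposing a centered kernel matrix. A minimal sketch with an RBF kernel and an assumed width (the paper's kernel-learning criterion is not reproduced):

```python
import numpy as np

# Toy data and an assumed RBF kernel width.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
gamma = 0.5

# Kernel matrix K_ij = exp(-gamma * ||x_i - x_j||^2).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-gamma * sq)

# Center the kernel matrix in feature space: Kc = J K J with J = I - 11^T/n.
n = len(X)
J = np.eye(n) - np.ones((n, n)) / n
Kc = J @ K @ J

# Eigen-decomposition; the top eigenvectors give the nonlinear components.
vals, vecs = np.linalg.eigh(Kc)  # ascending order
top = vals[::-1][:2]
print(np.round(top, 4))
```

Projections onto the leading eigenvectors are the nonlinear principal components; the paper's contribution is to additionally adapt the kernel itself without class labels.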
Citations: 14
How Bayesians Debug
Pub Date: 2006-12-18 DOI: 10.1109/ICDM.2006.83
Chao Liu, Zeng Lian, Jiawei Han
Manual debugging is expensive, and the high cost has motivated extensive research on automated fault localization in both the software engineering and data mining communities. Fault localization aims at automatically locating likely fault locations, and hence assists manual debugging. A number of fault localization algorithms have been developed in recent years, which prove effective when multiple failing and passing cases are available. However, we notice that what is more commonly encountered in practice is the two-sample debugging problem, where only one failing and one passing case are available. This problem has been either overlooked or insufficiently tackled in previous studies. In this paper, we develop a new fault localization algorithm, named BayesDebug, which simulates some manual debugging principles through a Bayesian approach. Unlike existing approaches that base fault analysis on multiple passing and failing cases, BayesDebug requires only one passing and one failing case. We reason about why BayesDebug fits the two-sample debugging problem and why other approaches do not. Finally, an experiment with the real-world program grep-2.2 is conducted, exemplifying the effectiveness of BayesDebug.
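The two-sample setting can be made concrete with a toy instrumentation trace: given predicate outcomes from exactly one passing and one failing run, the predicates true only in the failing run are the natural suspects. A minimal sketch with hypothetical predicates (BayesDebug's Bayesian scoring is not reproduced):

```python
# Hypothetical predicate outcomes recorded by instrumentation in two runs.
passing_run = {"x > 0": True, "ptr == NULL": False, "len < cap": True}
failing_run = {"x > 0": True, "ptr == NULL": True, "len < cap": False}

# Rank as suspicious the predicates that held in the failing run but not in
# the passing run -- the raw signal available with only one run per class.
suspicious = [p for p, v in failing_run.items() if v and not passing_run[p]]
print(suspicious)
```

With many runs per class, statistical rankings over predicate truth rates become possible; with one run of each, such simple differencing is about all the raw data supports, which is why the paper argues a Bayesian treatment of the two-sample case is needed.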
Citations: 29