Proceedings of the 23rd international conference on Machine learning: latest articles

Graph model selection using maximum likelihood

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143858

Ivona Bezáková, A. Kalai, R. Santhanam

In recent years, there has been a proliferation of theoretical graph models, e.g., preferential attachment and small-world models, motivated by real-world graphs such as the Internet topology. To address the natural question of which model is best for a particular data set, we propose a model selection criterion for graph models. Since each model is in fact a probability distribution over graphs, we suggest using Maximum Likelihood to compare graph models and select their parameters. Interestingly, for the case of graph models, computing likelihoods is a difficult algorithmic task. However, we design and implement MCMC algorithms for computing the maximum likelihood for four popular models: a power-law random graph model, a preferential attachment model, a small-world model, and a uniform random graph model. We hope that this novel use of ML will objectify comparisons between graph models.
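
As a minimal illustration of likelihood-based comparison, the sketch below scores one observed graph under an Erdős–Rényi G(n, p) model, where the likelihood has a closed form. This is an assumption made for illustration only: the abstract's "uniform random graph model" may be defined differently, and the models in the paper need MCMC rather than a formula. The function names are hypothetical.

```python
import math

def gnp_log_likelihood(n, m, p):
    """Log-likelihood of a simple undirected graph with n nodes and m edges under G(n, p)."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    pairs = n * (n - 1) // 2  # number of possible edges
    return m * math.log(p) + (pairs - m) * math.log(1.0 - p)

def gnp_mle(n, m):
    """Maximum-likelihood estimate of p: the observed edge density."""
    return m / (n * (n - 1) // 2)

# Toy graph: 5 nodes, 4 edges.
n, m = 5, 4
p_hat = gnp_mle(n, m)
best = gnp_log_likelihood(n, m, p_hat)
```

Two candidate models can then be compared on the same graph by their maximized log-likelihoods, which is the selection criterion the abstract proposes.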

Citations: 41

A note on mixtures of experts for multiclass responses: approximation rate and Consistent Bayesian Inference

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143886

Yang Ge, Wenxin Jiang

We report that mixtures of m multinomial logistic regressions can be used to approximate a class of 'smooth' probability models for multiclass responses. With bounded second derivatives of log-odds, the approximation rate is $O(m^{-2/s})$ in Hellinger distance or $O(m^{-4/s})$ in Kullback-Leibler divergence. Here $s = \dim(x)$ is the dimension of the input space (or the number of predictors). With the availability of training data of size n, we also show that 'consistency' in multiclass regression and classification can be achieved, simultaneously for all classes, when posterior based inference is performed in a Bayesian framework. Loosely speaking, such 'consistency' refers to performance being often close to the best possible for large n. Consistency can be achieved either by taking $m = m_n$, or by taking m to be uniformly distributed among $\{1, \ldots, m_n\}$ according to the prior, where $1 \prec m_n \prec n^a$ in order as n grows, for some $a \in (0, 1)$.

Citations: 4

Fast transpose methods for kernel learning on sparse data

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143893

P. Haffner

Kernel-based learning algorithms, such as Support Vector Machines (SVMs) or Perceptron, often rely on sequential optimization where a few examples are added at each iteration. Updating the kernel matrix usually requires matrix-vector multiplications. We propose a new method based on transposition to speed up this computation on sparse data. Instead of dot-products over sparse feature vectors, our computation incrementally merges lists of training examples and minimizes access to the data. Caching and shrinking are also optimized for sparsity. On very large natural language tasks (tagging, translation, text classification) with sparse feature representations, a 20 to 80-fold speedup over LIBSVM is observed using the same SMO algorithm. Theory and experiments explain what type of sparsity structure is needed for this approach to work, and why its adaptation to Maxent sequential optimization is inefficient.
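
The idea of replacing per-example dot products with a pass over a transposed (feature-to-examples) index can be sketched as follows. This is a simplified illustration for a linear kernel, not the paper's implementation; the dictionary-based data layout and all function names are assumptions.

```python
from collections import defaultdict

def build_transpose(examples):
    """Map each feature id to the list of (example index, value) pairs that contain it."""
    transpose = defaultdict(list)
    for i, x in enumerate(examples):
        for feature, value in x.items():
            transpose[feature].append((i, value))
    return transpose

def kernel_column_naive(examples, query):
    """One sparse dot product per training example."""
    return [sum(v * x.get(f, 0.0) for f, v in query.items()) for x in examples]

def kernel_column_transposed(transpose, n_examples, query):
    """Accumulate contributions feature by feature, touching only matching examples."""
    column = [0.0] * n_examples
    for feature, q_value in query.items():
        for i, value in transpose.get(feature, ()):
            column[i] += q_value * value
    return column

# Sparse vectors stored as {feature id: value}.
examples = [{0: 1.0, 3: 2.0}, {1: 1.0}, {0: 0.5, 1: 4.0, 2: 1.0}]
query = {0: 2.0, 1: 1.0}
transpose = build_transpose(examples)
naive = kernel_column_naive(examples, query)
fast = kernel_column_transposed(transpose, len(examples), query)
```

Both functions return the same kernel column. The transposed version only visits training examples that share at least one nonzero feature with the query, which is where sparsity can pay off.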

Citations: 7

Online multiclass learning by interclass hypothesis sharing

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143884

Michael Fink, S. Shalev-Shwartz, Y. Singer, S. Ullman

We describe a general framework for online multiclass learning based on the notion of hypothesis sharing. In our framework sets of classes are associated with hypotheses. Thus, all classes within a given set share the same hypothesis. This framework includes as special cases commonly used constructions for multiclass categorization such as allocating a unique hypothesis for each class and allocating a single common hypothesis for all classes. We generalize the multiclass Perceptron to our framework and derive a unifying mistake bound analysis. Our construction naturally extends to settings where the number of classes is not known in advance but, rather, is revealed along the online learning process. We demonstrate the merits of our approach by comparing it to previous methods on both synthetic and natural datasets.
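
For orientation, here is a minimal sketch of one special case named above, the multiclass Perceptron with a separate hypothesis (weight vector) per class. It is the textbook update, not the paper's shared-hypothesis generalization, and the function names and toy data are assumptions.

```python
def predict(weights, x):
    """Return the class whose weight vector scores x highest (ties go to the lowest index)."""
    scores = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    return max(range(len(weights)), key=lambda k: scores[k])

def perceptron_update(weights, x, y):
    """On a mistake, promote the true class and demote the predicted one.

    Returns True if a mistake was made, False otherwise.
    """
    y_hat = predict(weights, x)
    if y_hat == y:
        return False
    for j, x_j in enumerate(x):
        weights[y][j] += x_j
        weights[y_hat][j] -= x_j
    return True

# Toy stream: three classes, each aligned with one coordinate axis, seen three times.
stream = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)] * 3
weights = [[0.0] * 3 for _ in range(3)]
mistakes = sum(perceptron_update(weights, x, y) for x, y in stream)
```

The mistake count accumulated over the stream is the quantity that a mistake bound analysis controls.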

Citations: 65

Iterative RELIEF for feature weighting

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143959

Yijun Sun, Jian Li

We propose a series of new feature weighting algorithms, all stemming from a new interpretation of RELIEF as an online algorithm that solves a convex optimization problem with a margin-based objective function. The new interpretation explains the simplicity and effectiveness of RELIEF, and enables us to identify some of its weaknesses. We offer an analytic solution to mitigate these problems. We extend the newly proposed algorithm to handle multiclass problems by using a new multiclass margin definition. To reduce computational costs, an online learning algorithm is also developed. Convergence theorems of the proposed algorithms are presented. Some experiments based on the UCI and microarray datasets are performed to demonstrate the effectiveness of the proposed algorithms.
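
As background, the classic RELIEF weight update that the abstract reinterprets can be sketched as below. This is the original nearest-hit/nearest-miss rule, not the iterative variants proposed in the paper; the L1 distance, the function name, and the toy data are assumptions.

```python
def relief_weights(X, y):
    """Classic RELIEF pass over all examples.

    A feature gains weight when it differs on the nearest miss (other class)
    and loses weight when it differs on the nearest hit (same class).
    Assumes every class has at least two examples.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d

    def dist(a, b):
        return sum(abs(a_j - b_j) for a_j, b_j in zip(a, b))

    for i in range(n):
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += abs(X[i][j] - X[miss][j]) - abs(X[i][j] - X[hit][j])
    return w

# Feature 0 separates the classes; feature 1 is uninformative.
X = [[0.0, 0.3], [0.1, 0.9], [1.0, 0.2], [0.9, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
```

On this toy set the informative feature ends up with a positive weight and the uninformative one with a negative weight.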

Citations: 162

Convex optimization techniques for fitting sparse Gaussian graphical models

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143856

O. Banerjee, L. Ghaoui, A. d’Aspremont, G. Natsoulis

We consider the problem of fitting a large-scale covariance matrix to multivariate Gaussian data in such a way that the inverse is sparse, thus providing model selection. Beginning with a dense empirical covariance matrix, we solve a maximum likelihood problem with an $\ell_1$-norm penalty term added to encourage sparsity in the inverse. For models with tens of nodes, the resulting problem can be solved using standard interior-point algorithms for convex optimization, but these methods scale poorly with problem size. We present two new algorithms aimed at solving problems with a thousand nodes. The first, based on Nesterov's first-order algorithm, yields a rigorous complexity estimate for the problem, with a much better dependence on problem size than interior-point methods. Our second algorithm uses block coordinate descent, updating rows/columns of the covariance matrix sequentially. Experiments with genomic data show that our method is able to uncover biologically interpretable connections among genes.
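
The penalized maximum likelihood problem described in this abstract is commonly written in the following form. The symbols are assumptions not given in the abstract: S is the empirical covariance matrix, X the estimate of the inverse covariance, and ρ > 0 the penalty weight; additive and multiplicative constants of the Gaussian log-likelihood are dropped.

```latex
\hat{X} \;=\; \arg\max_{X \succ 0} \;\; \log\det X \;-\; \operatorname{tr}(S X) \;-\; \rho \, \|X\|_1,
\qquad
\|X\|_1 \;=\; \sum_{i,j} |X_{ij}|
```

Larger values of ρ push more entries of the estimate to exactly zero, and each zero entry corresponds to a missing edge in the Gaussian graphical model.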

Citations: 192

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143891

Alex Graves, Santiago Fernández, F. Gomez, J. Schmidhuber

Many real-world sequence learning tasks require the prediction of sequences of labels from noisy, unsegmented input data. In speech recognition, for example, an acoustic signal is transcribed into words or sub-word units. Recurrent neural networks (RNNs) are powerful sequence learners that would seem well suited to such tasks. However, because they require pre-segmented training data, and post-processing to transform their outputs into label sequences, their applicability has so far been limited. This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems. An experiment on the TIMIT speech corpus demonstrates its advantages over both a baseline HMM and a hybrid HMM-RNN.
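
The abstract does not spell out how per-frame network outputs become a label sequence. The sketch below shows the collapsing step usually associated with CTC (merge repeated outputs, then drop a special blank symbol), combined with a greedy per-frame choice. The blank index, the function names, and the toy probabilities are assumptions, and greedy decoding is only one possible decoding strategy.

```python
BLANK = 0  # index reserved for the "no label" output (an assumption of this sketch)

def collapse(path):
    """Merge consecutive repeats, then drop blanks."""
    labels = []
    previous = None
    for symbol in path:
        if symbol != previous and symbol != BLANK:
            labels.append(symbol)
        previous = symbol
    return labels

def best_path_decode(frame_probs):
    """Pick the most probable output per frame, then collapse the resulting path."""
    path = [max(range(len(p)), key=lambda k: p[k]) for p in frame_probs]
    return collapse(path)

frame_probs = [
    [0.1, 0.8, 0.1],    # label 1
    [0.2, 0.7, 0.1],    # label 1 again (repeat is merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.6, 0.3],    # label 1 (kept, because a blank separates it)
    [0.1, 0.2, 0.7],    # label 2
]
decoded = best_path_decode(frame_probs)
```

Because many frame-level paths collapse to the same labelling, no frame-by-frame segmentation of the training targets is needed.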

Citations: 4767

CN = CPCN

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143935

L. Ralaivola, François Denis, C. Magnan

We address the issue of the learnability of concept classes under three classification noise models in the probably approximately correct framework. After introducing the Class-Conditional Classification Noise (CCCN) model, we investigate the problem of the learnability of concept classes under this particular setting and we show that concept classes that are learnable under the well-known uniform classification noise (CN) setting are also CCCN-learnable, which gives CN = CCCN. We then use this result to prove the equality between the set of concept classes that are CN-learnable and the set of concept classes that are learnable in the Constant Partition Classification Noise (CPCN) setting, or, in other words, we show that CN = CPCN.

Citations: 13

Maximum margin planning

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143936

Ashesh Jain, Michael Hu, Nathan D. Ratliff, Drew Bagnell, Martin A Zinkevich

Imitation learning of sequential, goal-directed behavior by standard supervised techniques is often difficult. We frame learning such behaviors as a maximum margin structured prediction problem over a space of policies. In this approach, we learn mappings from features to cost so an optimal policy in an MDP with these costs mimics the expert's behavior. Further, we demonstrate a simple, provably efficient approach to structured maximum margin learning, based on the subgradient method, that leverages existing fast algorithms for inference. Although the technique is general, it is particularly relevant in problems where A* and dynamic programming approaches make learning policies tractable in problems beyond the limitations of a QP formulation. We demonstrate our approach applied to route planning for outdoor mobile robots, where the behavior a designer wishes a planner to execute is often clear, while specifying cost functions that engender this behavior is a much more difficult task.

Citations: 681

On Bayesian bounds

Proceedings of the 23rd international conference on Machine learning

Pub Date : 2006-06-25 DOI: 10.1145/1143844.1143855

A. Banerjee

We show that several important Bayesian bounds studied in machine learning, both in the batch as well as the online setting, arise by an application of a simple compression lemma. In particular, we derive (i) PAC-Bayesian bounds in the batch setting, (ii) Bayesian log-loss bounds and (iii) Bayesian bounded-loss bounds in the online setting using the compression lemma. Although every setting has different semantics for prior, posterior and loss, we show that the core bound argument is the same. The paper simplifies our understanding of several important and apparently disparate results, as well as brings to light a powerful tool for developing similar arguments for other methods.
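
The abstract does not state the lemma itself. A change-of-measure inequality of the following form is what is usually meant by a compression lemma in this context; that it matches the paper's exact statement is an assumption. Here P and Q are distributions over hypotheses h (with Q absolutely continuous with respect to P), and φ is any measurable function for which the expectations exist.

```latex
\mathbb{E}_{h \sim Q}\big[\varphi(h)\big]
\;\le\;
\mathrm{KL}(Q \,\|\, P) \;+\; \log \mathbb{E}_{h \sim P}\big[e^{\varphi(h)}\big]
```

Reading P as a prior, Q as a posterior, and φ as a suitably scaled loss gap gives the general shape shared by the three families of bounds listed in the abstract.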

Citations: 76
