Jiong Zhang, Ian E H Yen, Pradeep Ravikumar, Inderjit S Dhillon
Multiple Sequence Alignment (MSA) is one of the fundamental tasks in biological sequence analysis, underlying applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve because its constraint is the intersection of two atomic-norm balls; the existing algorithm can only handle sequences of length up to 50, and its iteration complexity depends on constants whose relation to the natural parameters of MSA is unknown. In this work, we propose an accelerated dual decomposition algorithm that exploits entropy regularization to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm whose iteration complexity is linear in the problem size (the total length of all sequences). The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.
{"title":"Scalable Convex Multiple Sequence Alignment via Entropy-Regularized Dual Decomposition.","authors":"Jiong Zhang, Ian E H Yen, Pradeep Ravikumar, Inderjit S Dhillon","doi":"","DOIUrl":"","url":null,"abstract":"<p><p><i>Multiple Sequence Alignment (MSA)</i> is one of the fundamental tasks in biological sequence analysis that underlies applications such as phylogenetic trees, profiles, and structure prediction. The task, however, is NP-hard, and the current practice resorts to heuristic and local-search methods. Recently, a convex optimization approach for MSA was proposed based on the concept of atomic norm [23], which demonstrates significant improvement over existing methods in the quality of alignments. However, the convex program is challenging to solve due to the constraint given by the intersection of two atomic-norm balls, for which the existing algorithm can only handle sequences of length up to 50, with an iteration complexity subject to constants of unknown relation to the natural parameters of MSA. In this work, we propose an <i>accelerated dual decomposition</i> algorithm that exploits <i>entropy regularization</i> to induce closed-form solutions for each atomic-norm-constrained subproblem, giving a single-loop algorithm of iteration complexity linear to the problem size (total length of all sequences). 
The proposed algorithm gives significantly better alignments than existing methods on sequences of length up to hundreds, where the existing convex programming method fails to converge in one day.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"54 ","pages":"1514-1522"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581665/pdf/nihms896524.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35472764","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
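The abstract above credits entropy regularization with inducing closed-form subproblem solutions. A minimal sketch of that general mechanism follows, worked on the probability simplex rather than on the paper's atomic-norm constraint sets. The function name, `costs`, and `mu` are illustrative assumptions and not the authors' code. Minimizing a linear cost plus `mu` times the negative entropy over the simplex has a softmax solution, so no inner iterative solver is needed.

```python
import math


def entropy_regularized_simplex_min(costs, mu):
    """Minimize <costs, x> + mu * sum_i x_i * log(x_i) over the probability simplex.

    The minimizer has the closed form x_i proportional to exp(-costs_i / mu).
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    # Shift by the smallest cost for numerical stability; the shift cancels
    # when the weights are normalized.
    shift = min(costs)
    weights = [math.exp(-(c - shift) / mu) for c in costs]
    total = sum(weights)
    return [w / total for w in weights]


costs = [1.0, 0.2, 0.7]
smooth = entropy_regularized_simplex_min(costs, mu=1.0)   # strongly regularized
sharp = entropy_regularized_simplex_min(costs, mu=0.01)   # weakly regularized
print(smooth)
print(sharp)
```

As `mu` shrinks, the solution concentrates on the lowest-cost coordinate, which is the vertex the unregularized linear program would pick.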
Xiangru Huang, Qixing Huang, Ian E H Yen, Pradeep Ravikumar, Ruohan Zhang, Inderjit S Dhillon
Maximum-a-Posteriori (MAP) inference lies at the heart of Graphical Models and Structured Prediction. Despite the intractability of exact MAP inference, approximate methods based on LP relaxations have exhibited superior performance across a wide range of applications. Yet for problems involving large output domains (i.e., the state space for each variable is large), standard LP relaxations can easily give rise to a number of variables and constraints beyond the limit of existing optimization algorithms. In this paper, we introduce an effective MAP inference method for problems with large output domains. The method, the Greedy Direction Method of Multiplier (GDMM), builds upon alternating minimization of an Augmented Lagrangian and exploits the sparsity of messages through greedy optimization techniques. A key feature of our greedy approach is that it introduces variables on demand, using a pre-built data structure over local factors. This results in a single-loop algorithm with sublinear cost per iteration and O(log(1/ε))-type iteration complexity to achieve ε sub-optimality. In addition, we introduce a variant of GDMM for binary MAP inference problems with a large number of factors. Empirically, the proposed algorithms demonstrate orders-of-magnitude speedup over state-of-the-art MAP inference techniques on problems including Segmentation, Protein Folding, Graph Matching, and Multilabel prediction with pairwise interaction.
{"title":"Greedy Direction Method of Multiplier for MAP Inference of Large Output Domain.","authors":"Xiangru Huang, Qixing Huang, Ian E H Yen, Pradeep Ravikumar, Ruohan Zhang, Inderjit S Dhillon","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Maximum-a-Posteriori (MAP) inference lies at the heart of Graphical Models and Structured Prediction. Despite the intractability of exact MAP inference, approximate methods based on LP relaxations have exhibited superior performance across a wide range of applications. Yet for problems involving large output domains (i.e., the state space for each variable is large), standard LP relaxations can easily give rise to a large number of variables and constraints which are beyond the limit of existing optimization algorithms. In this paper, we introduce an effective MAP inference method for problems with large output domains. The method builds upon alternating minimization of an Augmented Lagrangian that exploits the sparsity of messages through greedy optimization techniques. A key feature of our greedy approach is to introduce variables in an on-demand manner with a pre-built data structure over local factors. This results in a single-loop algorithm of sublinear cost per iteration and <i>O</i>(log(1<i>/ε</i>))-type iteration complexity to achieve <i>ε</i> sub-optimality. In addition, we introduce a variant of GDMM for binary MAP inference problems with a large number of factors. 
Empirically, the proposed algorithms demonstrate orders of magnitude speedup over state-of-the-art MAP inference techniques on MAP inference problems including Segmentation, Protein Folding, Graph Matching, and Multilabel prediction with pairwise interaction.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"54 ","pages":"1550-1559"},"PeriodicalIF":0.0,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5581664/pdf/nihms896523.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35472765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart, Jimeng Sun
Leveraging large historical data in electronic health records (EHR), we developed Doctor AI, a generic predictive model that covers observed medical conditions and medication uses. Doctor AI is a temporal model using recurrent neural networks (RNN) and was developed and applied to longitudinal time-stamped EHR data from 260K patients over 8 years. Encounter records (e.g., diagnosis codes, medication codes, or procedure codes) were input to the RNN to predict all the diagnosis and medication categories for a subsequent visit. Doctor AI assesses the history of patients to make multilabel predictions (one label for each diagnosis or medication category). Based on a separate blind test set evaluation, Doctor AI can perform differential diagnosis with up to 79% recall@30, significantly higher than several baselines. Moreover, we demonstrate great generalizability of Doctor AI by adapting the resulting models from one institution to another without losing substantial accuracy.
{"title":"Doctor AI: Predicting Clinical Events via Recurrent Neural Networks.","authors":"Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F Stewart, Jimeng Sun","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Leveraging large historical data in electronic health record (EHR), we developed Doctor AI, a generic predictive model that covers observed medical conditions and medication uses. Doctor AI is a temporal model using recurrent neural networks (RNN) and was developed and applied to longitudinal time stamped EHR data from 260K patients over 8 years. Encounter records (e.g. diagnosis codes, medication codes or procedure codes) were input to RNN to predict (all) the diagnosis and medication categories for a subsequent visit. Doctor AI assesses the history of patients to make multilabel predictions (one label for each diagnosis or medication category). Based on separate blind test set evaluation, Doctor AI can perform differential diagnosis with up to 79% recall@30, significantly higher than several baselines. Moreover, we demonstrate great generalizability of Doctor AI by adapting the resulting models from one institution to another without losing substantial accuracy.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"56 ","pages":"301-318"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5341604/pdf/nihms-845642.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34806178","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
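The abstract above reports performance as recall@30. A minimal sketch of one common way to compute recall@k for a single multilabel prediction is given below: the fraction of the true labels that appear among the k highest-scoring labels. The function name and the toy scores are hypothetical, and the paper's exact evaluation protocol is not stated in the abstract.

```python
def recall_at_k(scores, true_labels, k):
    """Fraction of the true labels found among the k highest-scoring labels.

    scores: one predicted score per label, indexed by label id.
    true_labels: set of label ids that actually occurred.
    """
    if not true_labels:
        raise ValueError("true_labels must be non-empty")
    # Rank label ids from highest to lowest predicted score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top_k = set(ranked[:k])
    hits = sum(1 for label in true_labels if label in top_k)
    return hits / len(true_labels)


# Toy example with five labels, two of which are true.
scores = [0.1, 0.9, 0.3, 0.8, 0.05]
true_labels = {1, 2}
print(recall_at_k(scores, true_labels, k=2))
print(recall_at_k(scores, true_labels, k=3))
```

Averaging this quantity over all visits in a test set would give a dataset-level recall@k.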
Existing score-based causal model search algorithms such as GES (and a sped-up version, FGS) are asymptotically correct, fast, and reliable, but make the unrealistic assumption that the true causal graph does not contain any unmeasured confounders. There are several constraint-based causal search algorithms (e.g., RFCI, FCI, or FCI+) that are asymptotically correct without assuming that there are no unmeasured confounders, but they often perform poorly on small samples. We describe a combined score and constraint-based algorithm, GFCI, that we prove is asymptotically correct. On synthetic data, GFCI is only slightly slower than RFCI but more accurate than FCI, RFCI, and FCI+.
{"title":"A Hybrid Causal Search Algorithm for Latent Variable Models.","authors":"Juan Miguel Ogarrio, Peter Spirtes, Joe Ramsey","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Existing score-based causal model search algorithms such as <i>GES</i> (and a speeded up version, <i>FGS</i>) are asymptotically correct, fast, and reliable, but make the unrealistic assumption that the true causal graph does not contain any unmeasured confounders. There are several constraint-based causal search algorithms (e.g <i>RFCI, FCI</i>, or <i>FCI</i>+) that are asymptotically correct without assuming that there are no unmeasured confounders, but often perform poorly on small samples. We describe a combined score and constraint-based algorithm, <i>GFCI</i>, that we prove is asymptotically correct. On synthetic data, <i>GFCI</i> is only slightly slower than <i>RFCI</i> but more accurate than <i>FCI, RFCI</i> and <i>FCI</i>+.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"52 ","pages":"368-379"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5325717/pdf/nihms845582.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34766374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Marzyeh Ghassemi, Zeeshan Syed, Daryush D Mehta, Jarrad H Van Stan, Robert E Hillman, John V Guttag
Voice disorders affect an estimated 14 million working-age Americans, and many more worldwide. We present the first large-scale study of vocal misuse based on long-term ambulatory data collected by an accelerometer placed on the neck. We investigate an unsupervised data mining approach to uncovering latent information about voice misuse. We segment signals from over 253 days of data from 22 subjects into over a hundred million single glottal pulses (closures of the vocal folds), cluster segments into symbols, and use symbolic mismatch to uncover differences between patients and matched controls, and between patients pre- and post-treatment. Our results show significant behavioral differences between patients and controls, as well as between some pre- and post-treatment patients. Our proposed approach provides an objective basis for helping diagnose behavioral voice disorders, and is a first step towards a more data-driven understanding of the impact of voice therapy.
{"title":"Uncovering Voice Misuse Using Symbolic Mismatch.","authors":"Marzyeh Ghassemi, Zeeshan Syed, Daryush D Mehta, Jarrad H Van Stan, Robert E Hillman, John V Guttag","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Voice disorders affect an estimated 14 million working-aged Americans, and many more worldwide. We present the first large scale study of vocal misuse based on long-term ambulatory data collected by an accelerometer placed on the neck. We investigate an unsupervised data mining approach to uncovering latent information about voice misuse. We segment signals from over 253 days of data from 22 subjects into over a hundred million single glottal pulses (closures of the vocal folds), cluster segments into symbols, and use symbolic mismatch to uncover differences between patients and matched controls, and between patients pre- and post-treatment. Our results show significant behavioral differences between patients and controls, as well as between some pre- and post-treatment patients. Our proposed approach provides an objective basis for helping diagnose behavioral voice disorders, and is a first step towards a more data-driven understanding of the impact of voice therapy.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"56 ","pages":"239-252"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8693775/pdf/nihms-1069009.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39871655","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Antti Hyttinen, Sergey Plis, Matti Järvisalo, Frederick Eberhardt, David Danks
This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.
{"title":"Causal Discovery from Subsampled Time Series Data by Constraint Optimization.","authors":"Antti Hyttinen, Sergey Plis, Matti Järvisalo, Frederick Eberhardt, David Danks","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>This paper focuses on causal structure estimation from time series data in which measurements are obtained at a coarser timescale than the causal timescale of the underlying system. Previous work has shown that such subsampling can lead to significant errors about the system's causal structure if not properly taken into account. In this paper, we first consider the search for the system timescale causal structures that correspond to a given measurement timescale structure. We provide a constraint satisfaction procedure whose computational performance is several orders of magnitude better than previous approaches. We then consider finite-sample data as input, and propose the first constraint optimization approach for recovering the system timescale causal structure. This algorithm optimally recovers from possible conflicts due to statistical errors. More generally, these advances allow for a robust and non-parametric estimation of system timescale causal structures from subsampled time series data.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"52 ","pages":"216-227"},"PeriodicalIF":0.0,"publicationDate":"2016-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5305170/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140195246","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
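To make the subsampling problem in the abstract above concrete, here is a hedged sketch of the forward direction only: given a system-timescale graph, which directed edges appear when every u-th time step is observed. It assumes the convention from the subsampling literature that an edge i → j appears at rate u exactly when the system-timescale graph has a directed path of length u from i to j. The abstract does not state this definition, the sketch omits the bidirected edges induced by unobserved intermediate steps, and the function name is hypothetical.

```python
def subsample_directed_edges(adj, u):
    """Directed edges of the graph observed at every u-th time step.

    adj[i][j] is True when variable i at time t causes variable j at time t + 1.
    Returns a matrix whose [i][j] entry is True when adj has a directed path of
    length exactly u from i to j.
    """
    if u < 1:
        raise ValueError("u must be at least 1")
    n = len(adj)
    reach = [row[:] for row in adj]  # paths of length exactly 1
    for _ in range(u - 1):
        # Extend every path by one more system-timescale step.
        reach = [
            [any(reach[i][k] and adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
    return reach


# Chain 0 -> 1 -> 2 at the system timescale.
chain = [
    [False, True, False],
    [False, False, True],
    [False, False, False],
]
observed = subsample_directed_edges(chain, u=2)
print(observed)
```

The inverse problem the abstract addresses, recovering the system-timescale graph from the observed one, is the hard direction, since many system-timescale graphs can map to the same subsampled graph.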
David I Inouye, Pradeep Ravikumar, Inderjit S Dhillon
We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models (Yang et al., 2015) did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive dependencies. For example, the airport delay time in New York (modeled as an exponential distribution) is positively related to the delay time in Boston. With this motivation, we give an example of our model class derived from the univariate exponential distribution that allows for almost arbitrary positive and negative dependencies with only a mild condition on the parameter matrix, a condition akin to the positive definiteness of the Gaussian covariance matrix. Our Poisson generalization allows for both positive and negative dependencies without any constraints on the parameter values. We also develop parameter estimation methods using node-wise regressions with ℓ1 regularization and likelihood approximation methods using sampling. Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times.
{"title":"Square Root Graphical Models: Multivariate Generalizations of Univariate Exponential Families that Permit Positive Dependencies.","authors":"David I Inouye, Pradeep Ravikumar, Inderjit S Dhillon","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We develop Square Root Graphical Models (SQR), a novel class of parametric graphical models that provides multivariate generalizations of univariate exponential family distributions. Previous multivariate graphical models (Yang et al., 2015) did not allow positive dependencies for the exponential and Poisson generalizations. However, in many real-world datasets, variables clearly have positive dependencies. For example, the airport delay time in New York-modeled as an exponential distribution-is positively related to the delay time in Boston. With this motivation, we give an example of our model class derived from the univariate exponential distribution that allows for almost arbitrary positive and negative dependencies with only a mild condition on the parameter matrix-a condition akin to the positive definiteness of the Gaussian covariance matrix. Our Poisson generalization allows for both positive and negative dependencies without any constraints on the parameter values. We also develop parameter estimation methods using node-wise regressions with <i>ℓ</i><sub>1</sub> regularization and likelihood approximation methods using sampling. 
Finally, we demonstrate our exponential generalization on a synthetic dataset and a real-world dataset of airport delay times.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"48 ","pages":"2445-2453"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4995108/pdf/nihms808904.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34338585","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, Bernhard Schölkopf
Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let X and Y denote the features and target, respectively. Previous work on domain adaptation mainly considers the covariate shift situation, where the distribution of the features P(X) changes across domains while the conditional distribution P(Y∣X) stays the same. To reduce domain discrepancy, recent methods try to find invariant components [Formula: see text] that have similar [Formula: see text] on different domains by explicitly minimizing a distribution discrepancy measure. However, it is not clear whether [Formula: see text] in different domains is also similar when P(Y∣X) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where P(X∣Y) and P(Y) both change in a causal system in which Y is the cause of X. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution [Formula: see text] is invariant after proper location-scale (LS) transformations, and simultaneously identify how P(Y) changes between domains. We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.
{"title":"Domain Adaptation with Conditional Transferable Components.","authors":"Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, Bernhard Schölkopf","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Domain adaptation arises in supervised learning when the training (source domain) and test (target domain) data have different distributions. Let <i>X</i> and <i>Y</i> denote the features and target, respectively, previous work on domain adaptation mainly considers the covariate shift situation where the distribution of the features <i>P</i>(<i>X</i>) changes across domains while the conditional distribution <i>P</i>(<i>Y</i>∣<i>X</i>) stays the same. To reduce domain discrepancy, recent methods try to find invariant components [Formula: see text] that have similar [Formula: see text] on different domains by explicitly minimizing a distribution discrepancy measure. However, it is not clear if [Formula: see text] in different domains is also similar when <i>P</i>(<i>Y</i>∣<i>X</i>) changes. Furthermore, transferable components do not necessarily have to be invariant. If the change in some components is identifiable, we can make use of such components for prediction in the target domain. In this paper, we focus on the case where <i>P</i>(<i>X</i>∣<i>Y</i>) and <i>P</i>(<i>Y</i>) both change in a causal system in which <i>Y</i> is the cause for <i>X</i>. Under appropriate assumptions, we aim to extract conditional transferable components whose conditional distribution [Formula: see text] is invariant after proper location-scale (LS) transformations, and identify how <i>P</i>(<i>Y</i>) changes between domains simultaneously. 
We provide theoretical analysis and empirical evaluation on both synthetic and real-world data to show the effectiveness of our method.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"48 ","pages":"2839-2848"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5321138/pdf/nihms-846268.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34766373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
We introduce anytime Explore-m, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-m arms at every time step. Anytime Explore-m is more practical than fixed-budget or fixed-confidence formulations of the top-m problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-m. Our analysis shows that the sample complexity of AT-LUCB is competitive with anytime variants of existing algorithms. Moreover, our empirical evaluation of AT-LUCB shows that it performs as well as or better than state-of-the-art baseline methods for anytime Explore-m.
{"title":"Anytime Exploration for Multi-armed Bandits using Confidence Information.","authors":"Kwang-Sung Jun, Robert Nowak","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>We introduce anytime Explore-<i>m</i>, a pure exploration problem for multi-armed bandits (MAB) that requires making a prediction of the top-<i>m</i> arms at every time step. Anytime Explore-<i>m</i> is more practical than fixed budget or fixed confidence formulations of the top-<i>m</i> problem, since many applications involve a finite, but unpredictable, budget. However, the development and analysis of anytime algorithms present many challenges. We propose AT-LUCB (AnyTime Lower and Upper Confidence Bound), the first nontrivial algorithm that provably solves anytime Explore-<i>m</i>. Our analysis shows that the sample complexity of AT-LUCB is competitive to anytime variants of existing algorithms. Moreover, our empirical evaluation on AT-LUCB shows that AT-LUCB performs as well as or better than state-of-the-art baseline methods for anytime Explore-<i>m</i>.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"48 ","pages":"974-982"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5846129/pdf/nihms894213.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"35915066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
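The abstract above builds on lower and upper confidence bounds for top-m prediction. The sketch below shows one round of the standard LUCB-style selection rule from the fixed-confidence literature: predict the m arms with the highest empirical means, then pick the arm in that set with the lowest lower bound and the arm outside it with the highest upper bound as the next arms to sample. The Hoeffding-style radius, its constants, and all names are assumptions for illustration. How AT-LUCB adapts the confidence level over time is not described in the abstract and is not reproduced here.

```python
import math


def lucb_step(means, counts, m, t, delta=0.1):
    """One LUCB-style round: returns (top_m, weakest_top, strongest_rest).

    means[i] and counts[i] are the empirical mean and pull count of arm i;
    every arm is assumed to have been pulled at least once, and t >= 1.
    """
    n = len(means)
    if not 0 < m < n:
        raise ValueError("m must satisfy 0 < m < number of arms")
    # Hypothetical Hoeffding-style confidence radius; the constants are an
    # assumption, not taken from the paper.
    log_term = math.log(4 * n * t * t / delta)
    radius = [math.sqrt(log_term / (2 * c)) for c in counts]
    # Current prediction: the m arms with the highest empirical means.
    order = sorted(range(n), key=lambda i: means[i], reverse=True)
    top, rest = order[:m], order[m:]
    # Most doubtful member of the predicted set and strongest challenger.
    weakest_top = min(top, key=lambda i: means[i] - radius[i])
    strongest_rest = max(rest, key=lambda i: means[i] + radius[i])
    return top, weakest_top, strongest_rest


means = [0.9, 0.8, 0.5, 0.2]
counts = [10, 10, 2, 10]
top, weakest_top, strongest_rest = lucb_step(means, counts, m=2, t=32)
print(top, weakest_top, strongest_rest)
```

Because a prediction (`top`) is available after every round, a rule of this shape can be stopped at any time, which is the property the anytime formulation asks for.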
Sathya N Ravi, Vamsi K Ithapu, Sterling C Johnson, Vikas Singh
Budget-constrained optimal design of experiments is a well-studied problem. Although the literature is very mature, not many strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high-dimensional machine learning. In this work, we study this budget-constrained design where the underlying regression model involves an ℓ1-regularized linear function. We propose two novel strategies: the first is motivated geometrically, whereas the second is algebraic in nature. We obtain tractable algorithms for this problem which also hold for a more general class of sparse linear models. We perform a detailed set of experiments, on benchmarks and a large neuroimaging study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the future.
{"title":"Experimental Design on a Budget for Sparse Linear Models and Applications.","authors":"Sathya N Ravi, Vamsi K Ithapu, Sterling C Johnson, Vikas Singh","doi":"","DOIUrl":"","url":null,"abstract":"<p><p>Budget constrained optimal design of experiments is a well studied problem. Although the literature is very mature, not many strategies are available when these design problems appear in the context of sparse linear models commonly encountered in high dimensional machine learning. In this work, we study this budget constrained design where the underlying regression model involves a <i>ℓ</i><sub>1</sub>-regularized linear function. We propose two novel strategies: the first is motivated geometrically whereas the second is algebraic in nature. We obtain tractable algorithms for this problem which also hold for a more general class of sparse linear models. We perform a detailed set of experiments, on benchmarks and a large neuroimaging study, showing that the proposed models are effective in practice. The latter experiment suggests that these ideas may play a small role in informing enrollment strategies for similar scientific studies in the future.</p>","PeriodicalId":89793,"journal":{"name":"JMLR workshop and conference proceedings","volume":"48 ","pages":"583-592"},"PeriodicalIF":0.0,"publicationDate":"2016-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5415092/pdf/nihms855967.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"34974183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
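The abstract above assumes an ℓ1-regularized linear regression model. As a reminder of what fitting such a model involves, here is a minimal coordinate-descent sketch for the lasso objective (1/(2n))·‖y − Xw‖² + λ‖w‖₁, built on the soft-thresholding operator. This illustrates the underlying sparse model only and is not the paper's design-selection strategies; the function names and toy data are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold, clipping at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1 / (2n)) * ||y - X w||^2 + lam * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(sweeps):
        for j in range(d):
            rho = 0.0  # correlation of column j with the partial residual
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # Closed-form minimizer over coordinate j with the others fixed.
            w[j] = soft_threshold(rho / n, lam) / (z / n) if z > 0 else 0.0
    return w


# Toy data: the first feature carries signal, the second is nearly irrelevant.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1, 2.0, 0.1]
w = lasso_coordinate_descent(X, y, lam=0.5)
print(w)
```

The penalty shrinks the strong coefficient and sets the weak one exactly to zero, which is the sparsity that a budget-constrained design would need to account for.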