Etienne van de Bijl, Jan Klein, Joris Pries, Sandjai Bhulai, Mark Hoogendoorn, Rob van der Mei
Novel prediction methods should always be compared to a baseline to determine their performance. Without this frame of reference, the performance score of a model is essentially meaningless: what does it mean when a model achieves an $F_1$ of 0.8 on a test set? A proper baseline is therefore required to evaluate the ‘goodness’ of a performance score. Comparing results with the latest state-of-the-art model is usually insightful. However, the state of the art is a moving target, as newer models are continuously developed. Instead of an advanced model, one can also use a simple dummy classifier; however, such a model may be beaten too easily, making the comparison less valuable. Furthermore, most existing baselines are stochastic and must be computed repeatedly to obtain a reliable expected performance, which can be computationally expensive. We present a universal baseline method for all binary classification models, named the Dutch Draw (DD). This approach weighs simple classifiers and determines the best one to use as a baseline. We theoretically derive the DD baseline for many commonly used evaluation measures and show that in most situations it reduces to (almost) always predicting either zero or one. In summary, the DD baseline is general, as it is applicable to any binary classification problem; simple, as it can be quickly determined without training or parameter tuning; and informative, as insightful conclusions can be drawn from the results. The DD baseline serves two purposes. First, it is a robust and universal baseline that enables comparisons across research papers. Second, it provides a sanity check during the development of a prediction model: a model that does not outperform the DD baseline is a major warning sign.
The Dutch Draw: constructing a universal baseline for binary classification problems. Etienne van de Bijl, Jan Klein, Joris Pries, Sandjai Bhulai, Mark Hoogendoorn, Rob van der Mei. Journal of Applied Probability, published 2024-09-19. DOI: 10.1017/jpr.2024.52 (https://doi.org/10.1017/jpr.2024.52).
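The following sketch illustrates the Dutch Draw idea by computing a DD-style baseline exactly for small test sets. It rests on our own reading, not the authors' code. We assume the DD classifier labels exactly N of the M test samples as 1, chosen uniformly at random without replacement, so TP is hypergeometric. We also assume the baseline is the N with the highest expected score. `expected_score`, `dutch_draw_baseline`, `accuracy` and `f1` are hypothetical helper names. For $F_1$ the denominator $2\,TP+FP+FN = P+N$ is deterministic, so the expected score is $2NP/(M(P+N))$. This increases in N, so the baseline is "always predict 1", which matches the reduction described in the abstract.

```python
from math import comb
from typing import Callable, Tuple

# A score maps (TP, TN, FP, FN) to a number; higher is better.
Score = Callable[[int, int, int, int], float]


def expected_score(score: Score, M: int, P: int, N: int) -> float:
    """Exact E[score] when N of the M samples (P of them positive) are labelled 1
    uniformly at random without replacement; TP is then hypergeometric."""
    total = 0.0
    for tp in range(max(0, N - (M - P)), min(P, N) + 1):
        prob = comb(P, tp) * comb(M - P, N - tp) / comb(M, N)
        fp, fn = N - tp, P - tp
        tn = M - N - fn
        total += prob * score(tp, tn, fp, fn)
    return total


def dutch_draw_baseline(score: Score, M: int, P: int) -> Tuple[int, float]:
    """Return the number N of predicted positives maximising the expected score,
    together with that expected score (hypothetical helper)."""
    best_n, best_val = 0, float("-inf")
    for n in range(M + 1):
        val = expected_score(score, M, P, n)
        if val > best_val:
            best_n, best_val = n, val
    return best_n, best_val


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def f1(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0  # undefined case mapped to 0 (our choice)


print(dutch_draw_baseline(f1, 10, 3))        # N = 10: always predict 1
print(dutch_draw_baseline(accuracy, 10, 3))  # N = 0: always predict 0
```

For accuracy the expected score is linear in N. The optimum is therefore again at N = 0 or N = M, depending on whether positives are a minority.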
We consider two continuous-time generalizations of the conservative random walks introduced in Englander and Volkov (2022): an orthogonal one and a spherically symmetric one. The latter model is also known as random flights. For both models, we show that the walks are transient when $d\ge 2$ and the rate of direction changing follows either a power law $t^{-\alpha}$ with $0<\alpha\le 1$, or the law $(\ln t)^{-\beta}$ with $\beta>2$.
Transience of continuous-time conservative random walks. Satyaki Bhattacharya, Stanislav Volkov. Journal of Applied Probability, published 2024-09-18. DOI: 10.1017/jpr.2024.46 (https://doi.org/10.1017/jpr.2024.46).
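A minimal simulation sketch gives a feel for the orthogonal model. The details are our assumptions, not taken from the paper:

- the walker moves at unit speed along one of the 2d coordinate directions;
- direction changes occur at the points of an inhomogeneous Poisson process with intensity $t^{-\alpha}$ for $t\ge 1$;
- at each change, a new direction is drawn uniformly from all 2d directions.

Event times are generated by inverting the cumulative intensity $\Lambda(t)=\int_1^t s^{-\alpha}\,ds$. The $(\ln t)^{-\beta}$ regime is not covered. A simulation of this kind illustrates the model but proves nothing about transience. `next_change` and `simulate_orthogonal` are hypothetical names.

```python
import math
import random


def next_change(t: float, alpha: float, e: float) -> float:
    """Next event time after t for a Poisson process with intensity s**(-alpha), s >= 1,
    obtained by inverting the cumulative intensity; e is an Exp(1) draw."""
    if alpha == 1.0:
        return t * math.exp(e)            # Lambda(s) = ln s
    a = 1.0 - alpha
    return (t ** a + a * e) ** (1.0 / a)  # Lambda(s) = (s**a - 1) / a


def simulate_orthogonal(d: int, alpha: float, T: float, rng: random.Random) -> list:
    """Position at time T of a unit-speed walker started at the origin at time 1
    (assumed orthogonal model: axis-parallel motion, uniform new direction)."""
    pos = [0.0] * d
    t = 1.0
    axis, sign = rng.randrange(d), rng.choice((-1.0, 1.0))
    while True:
        t_next = next_change(t, alpha, rng.expovariate(1.0))
        step = min(t_next, T) - t         # distance travelled before the change (or T)
        pos[axis] += sign * step
        if t_next >= T:
            return pos
        t = t_next
        axis, sign = rng.randrange(d), rng.choice((-1.0, 1.0))


print(simulate_orthogonal(2, 0.5, 1000.0, random.Random(0)))
```

Repeating the simulation over many seeds and increasing T gives an informal picture of how far the walker tends to drift.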
We review criteria for comparing the efficiency of Markov chain Monte Carlo (MCMC) methods with respect to the asymptotic variance of estimates of expectations of functions of state, and show how such criteria can justify ways of combining improvements to MCMC methods. We say that a chain on a finite state space with transition matrix P efficiency-dominates one with transition matrix Q if for every function of state it has lower (or equal) asymptotic variance. We give elementary proofs of some previous results regarding efficiency dominance, leading to a self-contained demonstration that a reversible chain with transition matrix P efficiency-dominates a reversible chain with transition matrix Q if and only if none of the eigenvalues of $Q-P$ are negative. This allows us to conclude that modifying a reversible MCMC method to improve its efficiency will also improve the efficiency of a method that randomly chooses either this or some other reversible method, and to conclude that improving the efficiency of a reversible update for one component of state (as in Gibbs sampling) will improve the overall efficiency of a reversible method that combines this and other updates. It also explains how antithetic MCMC can be more efficient than independent and identically distributed sampling. We also establish conditions that can guarantee that a method is not efficiency-dominated by any other method.
Efficiency of reversible MCMC methods: elementary derivations and applications to composite methods. Radford M. Neal, Jeffrey S. Rosenthal. Journal of Applied Probability, published 2024-09-18. DOI: 10.1017/jpr.2024.48 (https://doi.org/10.1017/jpr.2024.48).
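For small finite chains, the eigenvalue criterion can be checked numerically. The sketch below uses NumPy; the function names are ours.

- **Dominance test.** It tests whether $Q-P$ has a negative eigenvalue after symmetrizing it as $D^{1/2}(Q-P)D^{-1/2}$ with $D=\operatorname{diag}(\pi)$. This is valid when both chains are reversible with respect to the same $\pi$.
- **Asymptotic variance.** It uses the standard fundamental-matrix formula $v(f,P)=2\langle \bar f, Z\bar f\rangle_\pi-\langle \bar f,\bar f\rangle_\pi$, where $Z=(I-P+\mathbf 1\pi^\top)^{-1}$ and $\bar f=f-\pi(f)$.

The two-state example contrasts i.i.d. sampling with an antithetic chain that tends to switch state.

```python
import numpy as np


def asymptotic_variance(P: np.ndarray, pi: np.ndarray, f: np.ndarray) -> float:
    """Asymptotic variance of the ergodic average of f under transition matrix P
    (stationary distribution pi), via Z = (I - P + 1 pi^T)^{-1}:
    v = 2 <fbar, Z fbar>_pi - <fbar, fbar>_pi."""
    n = len(pi)
    fbar = f - pi @ f
    Z = np.linalg.inv(np.eye(n) - P + np.outer(np.ones(n), pi))
    return float(2 * pi @ (fbar * (Z @ fbar)) - pi @ (fbar * fbar))


def efficiency_dominates(P: np.ndarray, Q: np.ndarray, pi: np.ndarray, tol: float = 1e-12) -> bool:
    """True if no eigenvalue of Q - P is negative, for chains reversible w.r.t. pi.
    Q - P is self-adjoint in L^2(pi), so D^{1/2} (Q - P) D^{-1/2} is symmetric
    and has the same real eigenvalues."""
    d = np.sqrt(pi)
    S = d[:, None] * (Q - P) / d[None, :]
    S = (S + S.T) / 2  # remove rounding asymmetry before eigvalsh
    return bool(np.linalg.eigvalsh(S).min() >= -tol)


pi = np.array([0.5, 0.5])
Q = np.full((2, 2), 0.5)                  # i.i.d. sampling from pi
P = np.array([[0.1, 0.9], [0.9, 0.1]])    # antithetic: tends to switch state
f = np.array([1.0, 0.0])
print(efficiency_dominates(P, Q, pi))     # True
print(asymptotic_variance(Q, pi, f), asymptotic_variance(P, pi, f))
```

Here P efficiency-dominates Q. For this f, the antithetic chain's asymptotic variance is about 0.028, versus 0.25 for i.i.d. sampling.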
Previous approaches to modelling interval-censored data have often relied on assumptions of homogeneity in the sense that the censoring mechanism, the underlying distribution of occurrence times, or both, are assumed to be time-invariant. In this work, we introduce a model which allows for non-homogeneous behaviour in both cases. In particular, we outline a censoring mechanism based on a non-homogeneous alternating renewal process in which interval generation is assumed to be time-dependent, and we propose a Markov point process model for the underlying occurrence time distribution. We prove the existence of this process and derive the conditional distribution of the occurrence times given the intervals. We provide a framework within which the process can be accurately modelled, and subsequently compare our model to the homogeneous approach through a number of illustrative examples.
A non-homogeneous alternating renewal process model for interval censoring. M. N. M. van Lieshout, R. L. Markwitz. Journal of Applied Probability, published 2024-09-16. DOI: 10.1017/jpr.2024.54 (https://doi.org/10.1017/jpr.2024.54).
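The censoring mechanism can be illustrated with a simplified simulation. This is our own reading rather than the authors' model:

- an alternating renewal process partitions [0, T] into consecutive intervals of two alternating types;
- each interval's length is exponential, with a mean that depends on its type and start time (the non-homogeneous part);
- an occurrence time is reported only through the interval that contains it.

`alternating_renewal_bounds`, `censor` and `mean_length` are hypothetical names. The paper's Markov point process model for the occurrence times is not reproduced.

```python
import bisect
import random
from typing import Callable, List, Tuple


def alternating_renewal_bounds(T: float, mean_length: Callable[[int, float], float],
                               rng: random.Random) -> List[float]:
    """Partition [0, T] into consecutive intervals whose types alternate 0, 1, 0, ...
    An interval of type k starting at time t has an exponential length with mean
    mean_length(k, t), so interval generation depends on time (assumed form)."""
    bounds, kind = [0.0], 0
    while bounds[-1] < T:
        t = bounds[-1]
        bounds.append(min(T, t + rng.expovariate(1.0 / mean_length(kind, t))))
        kind = 1 - kind
    return bounds


def censor(x: float, bounds: List[float]) -> Tuple[float, float]:
    """Report an occurrence time x in [0, T) only through the interval containing it."""
    if not bounds[0] <= x < bounds[-1]:
        raise ValueError("x must lie in [0, T)")
    i = bisect.bisect_right(bounds, x) - 1
    return bounds[i], bounds[i + 1]


# Example: type-0 intervals get longer over time, type-1 intervals do not.
rng = random.Random(0)
bounds = alternating_renewal_bounds(10.0, lambda k, t: 1.0 + 0.1 * t if k == 0 else 0.5, rng)
print(censor(4.2, bounds))
```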
The system signature is a useful tool for studying coherent systems. For a given coherent system, various methods have been proposed in the literature to compute its signature. However, the literature does not address the converse problem: given a signature, how to construct the corresponding coherent system(s). In this article we propose an algorithm to fill this research gap. The algorithm first checks whether a given probability vector qualifies as a signature; if it does, the algorithm then generates the corresponding coherent system(s). To illustrate its applicability, we consider all three- and four-dimensional probability vectors and verify whether they are signatures. We obtain 5 and 20 coherent systems, respectively, in agreement with the literature (Shaked and Suarez-Llorens 2003).
An algorithm to construct coherent systems using signatures. T. V. Rao, Sameen Naqvi. Journal of Applied Probability, published 2024-09-16. DOI: 10.1017/jpr.2024.60 (https://doi.org/10.1017/jpr.2024.60).
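The forward direction, from structure function to signature, is easy to brute-force for small n. It also gives a naive way to test whether a vector is a signature. The sketch below is not the authors' algorithm. It assumes i.i.d. continuous component lifetimes, so all n! failure orders are equally likely. It then:

1. enumerates every Boolean structure function on n components;
2. keeps the monotone ones in which every component is relevant;
3. groups them by signature.

The cost grows like $2^{2^n}$, so it is only practical for n ≤ 4. `signature` and `coherent_signatures` are hypothetical names.

```python
from fractions import Fraction
from itertools import permutations


def signature(phi, n):
    """Signature (s_1, ..., s_n): s_i = P(system fails at the i-th component failure),
    assuming all n! failure orders are equally likely.
    phi maps a frozenset of working components to True (system works) or False."""
    counts = [0] * n
    orders = list(permutations(range(n)))
    for order in orders:
        working = set(range(n))
        for i, c in enumerate(order):
            working.discard(c)
            if not phi(frozenset(working)):
                counts[i] += 1
                break
    return tuple(Fraction(c, len(orders)) for c in counts)


def coherent_signatures(n):
    """Brute force over all 2**(2**n) truth tables (indexed by the bitmask of working
    components); keep monotone tables in which every component is relevant and
    group them by signature. Returns {signature: [truth tables]}."""
    size = 1 << n
    found = {}
    for table in range(1 << size):
        bit = lambda m: (table >> m) & 1
        monotone = all(bit(m) <= bit(m | (1 << j)) for m in range(size) for j in range(n))
        relevant = all(
            any(bit(m) != bit(m | (1 << j)) for m in range(size) if not m & (1 << j))
            for j in range(n)
        )
        if monotone and relevant:
            phi = lambda S, t=table: bool((t >> sum(1 << c for c in S)) & 1)
            found.setdefault(signature(phi, n), []).append(table)
    return found


sigs = coherent_signatures(3)
print(len(sigs))  # 5
print((Fraction(1, 3), Fraction(2, 3), Fraction(0)) in sigs)  # x1 AND (x2 OR x3): True
```

For n = 3 the sketch finds five distinct signatures: series, parallel, 2-out-of-3, and the two mixed series–parallel systems. A candidate vector is a signature exactly when it appears as a key of the returned dictionary.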
We propose a method for cutting down a random recursive tree that focuses on its higher-degree vertices. Enumerate the vertices of a random recursive tree of size $n$ according to the decreasing order of their degrees; namely, let $(v^{(i)})_{i=1}^{n}$ be such that $\deg(v^{(1)}) \geq \cdots \geq \deg(v^{(n)})$. The targeted vertex-cutting process is performed by sequentially removing vertices $v^{(1)}, v^{(2)}, \ldots, v^{(n)}$ and keeping only the subtree containing the root after each removal. The algorithm ends when the root is picked to be removed. The total number of steps for this procedure, $K_n$, is upper bounded by $Z_{\geq D}$, which denotes the number of vertices that have degree at least as large as the degree of the root. We prove that $\ln Z_{\geq D}$ grows as $\ln n$ asymptotically and obtain its limiting behavior in probability. Moreover, we obtain that the $k$th moment of $\ln Z_{\geq D}$ is proportional to
Quenched worst-case scenario for root deletion in targeted cutting of random recursive trees. Laura Eslava, Sergio I. López, Marco L. Ortiz. Journal of Applied Probability, published 2024-09-03. DOI: 10.1017/jpr.2024.40 (https://doi.org/10.1017/jpr.2024.40).
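A small simulation sketch of the targeted cutting procedure follows. The assumptions are ours:

- a random recursive tree is grown by attaching vertex i to a uniformly chosen earlier vertex;
- "degree" means the number of children;
- ties in degree are broken so that the root comes last among vertices of its degree, a worst case for the root;
- vertices already discarded along with a removed subtree are skipped without counting a step.

The function returns $K_n$ together with $Z_{\geq D}$, so the bound $K_n \le Z_{\geq D}$ can be checked empirically. `random_recursive_tree` and `targeted_cut` are hypothetical names.

```python
import random
from typing import List, Optional, Tuple


def random_recursive_tree(n: int, rng: random.Random) -> List[Optional[int]]:
    """parent[i] is uniform on {0, ..., i-1} for i >= 1; vertex 0 is the root."""
    return [None] + [rng.randrange(i) for i in range(1, n)]


def targeted_cut(parent: List[Optional[int]]) -> Tuple[int, int]:
    """Run the targeted vertex-cutting process; return (K_n, Z_{>=D})."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v in range(1, n):
        children[parent[v]].append(v)
    deg = [len(c) for c in children]
    # Decreasing degree; among equal degrees the root goes last (worst case).
    order = sorted(range(n), key=lambda v: (-deg[v], v == 0))
    alive = [True] * n
    steps = 0
    for v in order:
        if not alive[v]:
            continue  # already cut away together with an ancestor
        steps += 1
        if v == 0:
            break     # the root is picked: the process ends
        stack = [v]   # remove v and its subtree, keeping the root component
        while stack:
            u = stack.pop()
            alive[u] = False
            stack.extend(children[u])
    z = sum(1 for v in range(n) if deg[v] >= deg[0])
    return steps, z


rng = random.Random(0)
print(targeted_cut(random_recursive_tree(1000, rng)))
```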
Julio Backhoff, Joaquin Fontbona, Gonzalo Rios, Felipe Tobar
We present and study a novel algorithm for the computation of 2-Wasserstein population barycenters of absolutely continuous probability measures on Euclidean space. The proposed method can be seen as a stochastic gradient descent procedure in the 2-Wasserstein space, as well as a manifestation of a law of large numbers therein. The algorithm aims to find a Karcher mean or critical point in this setting, and can be implemented ‘online’, sequentially using independent and identically distributed random measures sampled from the population law. We provide natural sufficient conditions for this algorithm to almost surely converge in the Wasserstein space towards the population barycenter, and we introduce a novel, general condition which ensures uniqueness of Karcher means and, moreover, allows us to obtain explicit, parametric convergence rates for the expected optimality gap. We also study the mini-batch version of this algorithm, and discuss examples of families of population laws to which our method and results can be applied. This work expands and deepens ideas and results introduced in an early version of Backhoff-Veraguas et al. (2022), in which a statistical application (and numerical implementation) of this method is developed in the context of Bayesian learning.
Stochastic gradient descent for barycenters in Wasserstein space. Julio Backhoff, Joaquin Fontbona, Gonzalo Rios, Felipe Tobar. Journal of Applied Probability, published 2024-09-03. DOI: 10.1017/jpr.2024.39 (https://doi.org/10.1017/jpr.2024.39).
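For one-dimensional Gaussian measures, each step of the stochastic gradient iteration has a closed form, which allows a compact sketch. We assume the update takes the form $\mu_{k+1}=\big((1-\gamma_k)\,\mathrm{Id}+\gamma_k T_k\big)_{\#}\mu_k$, where $T_k$ is the optimal transport map from $\mu_k$ to the sampled measure $m_k$. Between $N(a,s^2)$ and $N(b,t^2)$ this map is $x\mapsto b+(t/s)(x-a)$, so the pushforward is again Gaussian. With $\gamma_k=1/(k+1)$, the iterates are running averages of the sampled means and standard deviations. This is a small instance of the law-of-large-numbers view mentioned in the abstract. In one dimension, the population barycenter of such Gaussians is $N(\mathbb E[b], (\mathbb E[t])^2)$. Function names and the example population law are hypothetical.

```python
import random
from typing import Callable, Tuple

Gaussian = Tuple[float, float]  # (mean, standard deviation) of a 1-D normal law


def sgd_step(mean: float, std: float, target_mean: float, target_std: float,
             gamma: float) -> Gaussian:
    """One step mu <- ((1 - gamma) Id + gamma T)_# mu, where
    T(x) = target_mean + (target_std / std) * (x - mean) is the optimal transport map
    from N(mean, std^2) to N(target_mean, target_std^2). The result is again normal."""
    return ((1 - gamma) * mean + gamma * target_mean,
            (1 - gamma) * std + gamma * target_std)


def sgd_barycenter(sample_measure: Callable[[random.Random], Gaussian], n_steps: int,
                   rng: random.Random, init: Gaussian = (0.0, 1.0)) -> Gaussian:
    """Online iteration using i.i.d. random measures drawn from the population law."""
    mean, std = init
    for k in range(n_steps):
        gamma = 1.0 / (k + 1)
        target_mean, target_std = sample_measure(rng)
        mean, std = sgd_step(mean, std, target_mean, target_std, gamma)
    return mean, std


# Population law (hypothetical): mean ~ U(-1, 1), std ~ U(1, 3); barycenter N(0, 2^2).
law = lambda r: (r.uniform(-1, 1), r.uniform(1, 3))
print(sgd_barycenter(law, 20000, random.Random(0)))  # close to (0, 2)
```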
We consider the estimation of rare-event probabilities using sample proportions output by naive Monte Carlo or collected data. Unlike estimators based on variance reduction techniques, this naive estimator does not have an a priori relative efficiency guarantee. On the other hand, due to the recent surge of sophisticated rare-event problems arising in safety evaluations of intelligent systems, efficiency-guaranteed variance reduction may face implementation challenges. Together with the availability of computation or data collection power, this motivates the use of such a naive estimator. In this paper we study uncertainty quantification for rare-event probabilities using only sample proportions, namely the construction, coverage validity, and tightness of confidence intervals. We investigate the known normality, Wilson, and exact intervals, and compare them with two new intervals derived from Chernoff’s inequality and the Berry–Esseen theorem. Moreover, we generalize our results to the natural situation where sampling stops once a target number of rare-event hits is reached. Our findings show that the normality and Wilson intervals are not always valid, but they are close to the newly developed valid intervals in terms of half-width. In contrast, the exact interval is conservative, but safely guarantees the attainment of the nominal confidence level.
Uncertainty quantification and confidence intervals for naive rare-event estimators. Yuanlu Bai, Henry Lam. Journal of Applied Probability, published 2024-09-02. DOI: 10.1017/jpr.2024.43 (https://doi.org/10.1017/jpr.2024.43).
Our new intervals, while more conservative than the exact interval, provide useful insights into the tightness of the considered intervals.
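The three classical intervals mentioned in the abstract can be computed from the hit count x and the sample size n alone. The sketch below uses the textbook Wald ("normality"), Wilson and Clopper–Pearson ("exact") formulas, with the exact bounds found by bisection on the binomial CDF. The two new Chernoff and Berry–Esseen intervals are not reproduced, since their formulas are not given here. Clipping the Wald interval to [0, 1] is our choice, and `intervals` is a hypothetical name. With zero hits, the Wald interval collapses to a single point, one simple way in which it can fail to be valid.

```python
import math
from statistics import NormalDist


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Bin(n, p), summed in log space (cheap when k is small)."""
    if k < 0:
        return 0.0
    if k >= n or p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_norm = math.lgamma(n + 1)
    return sum(
        math.exp(log_norm - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                 + i * math.log(p) + (n - i) * math.log1p(-p))
        for i in range(k + 1)
    )


def _solve(f, target: float, increasing: bool, lo: float = 0.0, hi: float = 1.0,
           iters: int = 100) -> float:
    """Bisection for f(p) = target on [lo, hi], with f monotone in p."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def intervals(x: int, n: int, conf: float = 0.95) -> dict:
    """Wald, Wilson and Clopper-Pearson intervals for x rare-event hits in n samples."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    half = z * math.sqrt(p * (1 - p) / n)
    wald = (max(0.0, p - half), min(1.0, p + half))
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    w_half = z / denom * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    wilson = (centre - w_half, centre + w_half)
    lower = 0.0 if x == 0 else _solve(lambda q: 1 - _binom_cdf(x - 1, n, q), alpha / 2, True)
    upper = 1.0 if x == n else _solve(lambda q: _binom_cdf(x, n, q), alpha / 2, False)
    return {"wald": wald, "wilson": wilson, "exact": (lower, upper)}


print(intervals(0, 100))
print(intervals(5, 10_000))
```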
The binary contact path process (BCPP) introduced in Griffeath (1983) describes the spread of an epidemic on a graph and serves as an auxiliary model for improving upper bounds on the critical value of the contact process. In this paper, we are concerned with limit theorems for the occupation time of a normalized version of the BCPP (NBCPP) on a lattice. We first consider the case where the dimension of the lattice is at least 3, the infection rate of the model is sufficiently large, and the initial state of the NBCPP is distributed according to a particular invariant distribution. In this case, we show that the law of large numbers of the occupation time process is driven by the identity function. Then we show that the centered occupation time process of the NBCPP converges in finite-dimensional distributions to a Brownian motion. This holds when the dimension of the lattice and the infection rate of the model are sufficiently large and the initial state of the NBCPP is distributed according to the aforementioned invariant distribution.
Limit theorems of occupation times of normalized binary contact path processes on lattices. Xiaofeng Xue. Journal of Applied Probability, 2024-08-27. https://doi.org/10.1017/jpr.2024.41
We consider a Markov control model with a Borel state space, a metric compact action space, and transitions assumed to have a density function with respect to some probability measure satisfying certain continuity conditions. We study the optimization problem of maximizing the probability of visiting a given subset of the state space infinitely often, and we show that there exists an optimal stationary Markov policy for this problem. We endow the set of stationary Markov policies and the family of strategic probability measures with suitable topologies (namely, the narrow topology for Young measures and the $ws^\infty$-topology, respectively) to obtain compactness and continuity properties, which allow us to obtain our main results.
Maximizing the probability of visiting a set infinitely often for a Markov decision process with Borel state and action spaces. François Dufour, Tomás Prieto-Rumeau. Journal of Applied Probability, 2024-08-22. https://doi.org/10.1017/jpr.2024.25