Bayesian mixture models are increasingly used for model-based clustering and for follow-up analysis of the clusters identified. As such, they are of particular interest for analyzing cytometry data, where unsupervised clustering and association studies are often part of the scientific questions. Cytometry data are large quantitative data measured in a multidimensional space that typically ranges from a few dimensions to several dozen, and whose dimensionality keeps increasing due to innovative high-throughput biotechnologies. We present several recent parametric and nonparametric Bayesian mixture modeling approaches and describe the advantages and limitations of these models under different research contexts for cytometry data analysis. We also acknowledge current computational challenges associated with the use of Bayesian mixture models for analyzing cytometry data, and we draw attention to recent developments in advanced numerical algorithms for estimating large Bayesian mixture models, which we believe have the potential to make Bayesian mixture models more applicable to new types of single-cell data with higher dimensions.
{"title":"Bayesian mixture models for cytometry data analysis","authors":"Lin Lin, B. Hejblum","doi":"10.1002/wics.1535","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-10-16"}
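As a concrete illustration of the kind of model this article surveys, the following is a minimal sketch (ours, not from the article) of Gibbs sampling for a two-component univariate Gaussian mixture with known component variance; all function and parameter names are illustrative.

```python
import numpy as np

def gibbs_gmm(x, n_iter=500, sigma=1.0, alpha=1.0, tau=10.0, seed=0):
    """Gibbs sampler for a 2-component Gaussian mixture with known
    component variance sigma^2, a Dirichlet(alpha) prior on the weights,
    and a N(0, tau^2) prior on the component means."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu = np.array([x.min(), x.max()])   # spread the initial means apart
    pi = np.array([0.5, 0.5])
    mu_draws = []
    for _ in range(n_iter):
        # 1) sample cluster labels given means and weights
        log_p = np.log(pi) - 0.5 * ((x[:, None] - mu[None, :]) / sigma) ** 2
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)
        z = (rng.random(n) < p[:, 1]).astype(int)
        # 2) sample means from their conjugate normal full conditionals
        for k in (0, 1):
            xk = x[z == k]
            prec = len(xk) / sigma**2 + 1.0 / tau**2
            mean = (xk.sum() / sigma**2) / prec
            mu[k] = rng.normal(mean, 1.0 / np.sqrt(prec))
        # 3) sample weights from their Dirichlet full conditional
        counts = np.bincount(z, minlength=2)
        pi = rng.dirichlet(alpha + counts)
        mu_draws.append(mu.copy())
    return np.array(mu_draws)
```

On well-separated simulated data, the posterior draws of the component means concentrate near the true cluster locations, which is the clustering behavior the abstract refers to.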
Item response theory (IRT) is a class of latent variable models used to develop educational and psychological tests (e.g., standardized tests, personality tests, and tests for licensure and certification). We review the theory and practice of IRT across two articles. In Part 1, we cover a broad range of topics, such as the foundations of educational measurement, the basics of IRT, and applications of IRT using R. We focus particularly on the topics that the mirt package covers: unidimensional and multidimensional IRT models for dichotomous and polytomous items with continuous and discrete factors, confirmatory and multigroup analysis in IRT, and estimation algorithms. In Part 2, we turn to more practical aspects of IRT, namely scoring, scaling, and equating.
{"title":"Item response theory and its applications in educational measurement Part I: Item response theory and its implementation in R","authors":"Kazuki Hori, Hirotaka Fukuhara, Tsuyoshi Yamada","doi":"10.1002/wics.1531","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-10-13"}
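To make the model family concrete, here is an illustrative sketch (ours; the abstract's mirt examples are in R) of the two-parameter logistic (2PL) item response function and expected a posteriori (EAP) ability scoring; function names are our own.

```python
import numpy as np

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct response
    given ability theta, discrimination a, and difficulty b."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def eap_score(responses, a, b, grid=np.linspace(-4, 4, 81)):
    """Expected a posteriori ability estimate under a N(0,1) prior,
    computed by numerical integration over a grid of theta values."""
    prior = np.exp(-0.5 * grid**2)
    like = np.ones_like(grid)
    for u, ai, bi in zip(responses, a, b):
        p = p_correct(grid, ai, bi)
        like *= p if u == 1 else (1.0 - p)
    post = prior * like
    return float(np.sum(grid * post) / np.sum(post))
```

A response pattern with more correct answers yields a higher EAP ability estimate, as expected under the model.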
Since the first detection of gravitational waves from the coalescence of two black holes in 2015, Bayesian statistical methods have been routinely applied by LIGO and Virgo to extract the signal from noisy interferometric measurements, obtain point estimates of the physical parameters responsible for producing the signal, and rigorously quantify their uncertainties. Different computational techniques have been devised depending on the source of the gravitational radiation and the gravitational waveform model used. Prominent sources of gravitational waves are binary black hole and neutron star mergers, the only objects that have been observed by detectors to date. Gravitational waves from core-collapse supernovae, rapidly rotating neutron stars, and the stochastic gravitational-wave background also lie in the sensitivity band of ground-based interferometers and are expected to be observable in future observing runs. As nonlinearities of the complex waveforms and the high-dimensional parameter spaces preclude analytic evaluation of the posterior distribution, posterior inference for all these sources relies on computer-intensive simulation techniques such as Markov chain Monte Carlo methods. We review state-of-the-art Bayesian parameter estimation methods for researchers in this cross-disciplinary area of gravitational wave data analysis.
{"title":"Computational techniques for parameter estimation of gravitational wave signals","authors":"R. Meyer, M. Edwards, P. Maturana-Russel, N. Christensen","doi":"10.1002/wics.1532","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-09-20"}
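As a toy analog of the posterior inference described above (not an actual gravitational-wave analysis), the following sketch runs a Metropolis sampler for the amplitude and frequency of a sinusoid in Gaussian noise; the flat-prior bounds, step sizes, and names are illustrative, and the chain is initialized near values a search pipeline would supply, as is common practice.

```python
import numpy as np

def log_posterior(params, t, y, sigma):
    """Flat-prior log posterior for a sinusoid A*sin(2*pi*f*t) observed
    in Gaussian noise of known standard deviation sigma."""
    A, f = params
    if A <= 0 or not (0.05 < f < 0.5):
        return -np.inf
    resid = y - A * np.sin(2 * np.pi * f * t)
    return -0.5 * np.sum((resid / sigma) ** 2)

def metropolis(t, y, sigma, n_steps=20000, step=(0.1, 0.0005), seed=0):
    """Random-walk Metropolis sampler over (amplitude, frequency)."""
    rng = np.random.default_rng(seed)
    cur = np.array([1.0, 0.1])        # start near search-pipeline values
    cur_lp = log_posterior(cur, t, y, sigma)
    chain = []
    for _ in range(n_steps):
        prop = cur + rng.normal(0.0, step)
        lp = log_posterior(prop, t, y, sigma)
        if np.log(rng.random()) < lp - cur_lp:   # accept/reject step
            cur, cur_lp = prop, lp
        chain.append(cur.copy())
    return np.array(chain)
```

The post-burn-in sample means recover the injected amplitude and frequency, mirroring (in miniature) how posterior summaries are reported for compact binary parameters.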
While Poisson regression serves as a standard tool for modeling the association between a count response variable and explanatory variables, it is well documented that this approach is limited by the Poisson model's assumption of data equi-dispersion. The Conway–Maxwell–Poisson (COM-Poisson) distribution has proven a viable alternative for real count data that exhibit over- or under-dispersion, and thus COM-Poisson regression can flexibly model associations involving a discrete count response variable and covariates. This work surveys ongoing developments in COM-Poisson regression, introducing the reader to the underlying model (and its various reparametrizations) and related regression constructs, including zero-inflated models and longitudinal studies. It further introduces readers to the computing tools available for performing COM-Poisson and related regressions.
{"title":"Conway–Maxwell–Poisson regression models for dispersed count data","authors":"Kimberly F. Sellers, Bailey Premeaux","doi":"10.1002/wics.1533","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-09-13"}
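The COM-Poisson probability mass function underlying this regression family can be evaluated directly; the following is an illustrative sketch (function names are ours) that computes it in log space with a truncated normalizing constant, and demonstrates that nu = 1 recovers the Poisson while nu > 1 and nu < 1 give under- and over-dispersion, respectively.

```python
import math

def com_poisson_pmf(y, lam, nu, max_y=100):
    """COM-Poisson pmf P(Y=y) = lam^y / (y!)^nu / Z(lam, nu),
    computed in log space; Z is truncated at max_y terms."""
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1)
                 for j in range(max_y + 1)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(y * math.log(lam) - nu * math.lgamma(y + 1) - log_z)

def mean_var(lam, nu, max_y=100):
    """Mean and variance of the (truncated) COM-Poisson distribution."""
    ps = [com_poisson_pmf(y, lam, nu, max_y) for y in range(max_y + 1)]
    mean = sum(y * p for y, p in enumerate(ps))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(ps))
    return mean, var
```

With nu = 2 the variance falls below the mean (under-dispersion), and with nu = 0.7 it exceeds the mean (over-dispersion), which is exactly the flexibility the abstract highlights.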
The task of writing on data analysis on nonstandard spaces is quite substantial, with a huge body of literature to cover, from parametrics to nonparametrics and from shape spaces to Wasserstein spaces. In this survey we convey simple ideas (e.g., Fréchet means) and more complicated ones (e.g., empirical process theory) that are common to many approaches, with a focus on their interaction with one another. Indeed, this field is growing fast, and it is imperative to develop a mathematical viewpoint that draws power and diversity from a higher level of abstraction, for example, by introducing generalized Fréchet means. While many problems have found ingenious solutions (e.g., Procrustes analysis for principal component analysis [PCA] extensions on shape spaces, and diffusion on the frame bundle to mimic anisotropic Gaussians), more problems emerge, often more difficult ones (e.g., topology and geometry influencing limiting rates and defining generic intrinsic PCA extensions). Along the way, we point out open problems that will, it seems, keep mathematicians, statisticians, and computer and data scientists busy for a while.
{"title":"Data analysis on nonstandard spaces","authors":"S. Huckemann, B. Eltzner","doi":"10.1002/wics.1526","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-09-08"}
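As a small worked example of one of the simple ideas mentioned above, the following sketch (ours, not from the survey) computes a Fréchet mean on the circle, the minimizer of the sum of squared intrinsic (arc-length) distances, by grid search.

```python
import numpy as np

def arc_dist(a, b):
    """Intrinsic (arc-length) distance between angles in radians."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def frechet_mean_circle(angles, grid_size=3600):
    """Fréchet mean on the circle: the angle minimizing the Fréchet
    function (sum of squared intrinsic distances), via grid search."""
    grid = np.linspace(0, 2 * np.pi, grid_size, endpoint=False)
    costs = [np.sum(arc_dist(g, np.asarray(angles)) ** 2) for g in grid]
    return grid[int(np.argmin(costs))]
```

For data straddling the cut point, e.g. angles 0.1 and 2π − 0.1, the Fréchet mean is near 0, whereas a naive arithmetic mean of the angle values would land near π, illustrating why the intrinsic distance matters on nonstandard spaces.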
This article presents an overview of statistical methods for the analysis of discrete failure times with competing events. We describe the most commonly used modeling approaches for this type of data, including discrete versions of the cause‐specific hazards model and the subdistribution hazard model. In addition to discussing the characteristics of these methods, we present approaches to nonparametric estimation and model validation. Our literature review suggests that discrete competing‐risks analysis has gained substantial interest in the research community and is used regularly in econometrics, biostatistics, and educational research.
{"title":"Competing risks analysis for discrete time‐to‐event data","authors":"M. Schmid, M. Berger","doi":"10.1002/wics.1529","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-09-08"}
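To illustrate the quantities involved, here is a small sketch (ours, not the article's) of nonparametric discrete-time estimation of cause-specific hazards and cumulative incidence functions, assuming no censoring; function names are illustrative.

```python
import numpy as np

def cumulative_incidence(times, causes, n_causes, horizon):
    """Nonparametric discrete-time estimates of the cause-specific
    hazards lambda_j(t) = P(T = t, cause = j | T >= t) and the
    cumulative incidence functions F_j(t) = sum_{s<=t} lambda_j(s) S(s-1),
    where S is the overall survival function. Assumes no censoring."""
    times = np.asarray(times)
    causes = np.asarray(causes)
    surv = 1.0                               # S(t-1), survival so far
    cif = np.zeros((n_causes, horizon + 1))
    for t in range(1, horizon + 1):
        cif[:, t] = cif[:, t - 1]
        at_risk = np.sum(times >= t)
        if at_risk == 0:
            continue
        total_hazard = 0.0
        for j in range(n_causes):
            haz_j = np.sum((times == t) & (causes == j)) / at_risk
            cif[j, t] += surv * haz_j        # hazard weighted by survival
            total_hazard += haz_j
        surv *= 1.0 - total_hazard
    return cif
```

With no censoring, the cumulative incidences at the final horizon reduce to the observed proportions of events of each cause, a useful sanity check on the estimator.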
In today's world of engineering evolution, the need for optimized design has led to the development of a plethora of optimization algorithms. From hardware engineering design problems that need optimization of design parameters to software applications that require reduction of data sets, optimization algorithms play a vital role. These algorithms are based either on statistical measures or on heuristics. Traditional optimization algorithms use statistical methods in which the solution found may not be the global minimum, and these standard techniques are application specific, demanding different parameter sets for different applications. In contrast, bio-inspired meta-heuristic algorithms act like black boxes, enabling multiple applications while searching for global optimal solutions. This review gives an insight into various bio-inspired optimization algorithms, including the dragonfly algorithm, the whale optimization algorithm, the gray wolf optimizer, the moth-flame optimization algorithm, the cuckoo optimization algorithm, the artificial bee colony algorithm, ant colony optimization, the grasshopper optimization algorithm, the binary bat algorithm, the salp swarm algorithm, and the ant lion optimizer. We discuss in detail the biological behaviors that inspired these algorithms, study the parametric settings of each algorithm, and review their evaluation on benchmark test functions, as well as their application to real-world engineering design problems. Based on these characteristics, we discuss the possibility of extending these algorithms to data set optimization and feature set reduction.
{"title":"Critical review of bio‐inspired optimization techniques","authors":"Anita Christaline Johnvictor, Vaishali Durgamahanthi, Ramya Meghana Pariti Venkata, Nishtha Jethi","doi":"10.1002/wics.1528","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-08-27"}
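As one concrete instance of the algorithms listed above, the following is a minimal sketch of the gray wolf optimizer (our own simplified variant, which takes the three best wolves of the current population as alpha, beta, and delta rather than tracking best-so-far leaders), applied to the sphere benchmark function.

```python
import numpy as np

def gwo(objective, dim, n_wolves=30, n_iter=200, lb=-5.0, ub=5.0, seed=0):
    """Minimal gray wolf optimizer: each wolf moves toward positions
    derived from the three best wolves (alpha, beta, delta) while the
    exploration parameter a decays linearly from 2 to 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, size=(n_wolves, dim))
    for it in range(n_iter):
        fitness = np.array([objective(x) for x in X])
        leaders = X[np.argsort(fitness)[:3]]          # alpha, beta, delta
        a = 2.0 * (1.0 - it / n_iter)                 # linear decay 2 -> 0
        new_X = np.zeros_like(X)
        for leader in leaders:
            A = a * (2.0 * rng.random(X.shape) - 1.0)  # A in [-a, a]
            C = 2.0 * rng.random(X.shape)              # C in [0, 2]
            D = np.abs(C * leader - X)                 # encircling distance
            new_X += leader - A * D
        X = np.clip(new_X / 3.0, lb, ub)               # average of 3 pulls
    fitness = np.array([objective(x) for x in X])
    return X[np.argmin(fitness)], float(fitness.min())

# Example: minimize the 5-dimensional sphere function f(x) = sum(x_i^2)
best_x, best_f = gwo(lambda x: float(np.sum(x**2)), dim=5)
```

The decaying parameter a governs the exploration/exploitation trade-off the review discusses: large |A| early on drives wolves away from the leaders (exploration), while small |A| late in the run pulls the pack onto the current best region (exploitation).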
Fisher's classical likelihood has become the standard procedure for making inference about fixed unknown parameters. Recently, inference about unobservable random variables, such as random effects, factors, and missing values, has become important in statistical analysis. Because Fisher's likelihood cannot accommodate such unobservable random variables, the full Bayesian method had been the only approach available for their inference. An alternative likelihood approach was proposed by Lee and Nelder. In the context of Fisher likelihood, the likelihood principle means that the likelihood function carries all relevant information regarding the fixed unknown parameters. Bjørnstad extended this to the extended likelihood principle: all information in the observed data about the fixed unknown parameters and the unobservables is contained in an extended likelihood, such as the h-likelihood. However, it turns out that using the extended likelihood for inference is not as straightforward as using the Fisher likelihood. In this paper, we describe how to extract information from the data using the h-likelihood. This provides a new way of conducting statistical inference across the fields of statistical science.
{"title":"A review of h‐likelihood and hierarchical generalized linear model","authors":"Shaobo Jin, Youngjo Lee","doi":"10.1002/wics.1527","journal":"Wiley Interdisciplinary Reviews-Computational Statistics","publicationDate":"2020-08-25"}
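In standard notation (ours, summarizing the Lee–Nelder construction described above), the h-likelihood augments the log-likelihood of the data y given the unobservables v with the log-density of v, and the fixed parameters are typically estimated from an adjusted profile version with v profiled out at its maximizer:

```latex
% h-likelihood: joint log-density of the data y and the unobservables v
h(\beta, \theta, v) = \ell(\beta, \theta;\, y \mid v) + \ell(\theta;\, v)

% adjusted profile h-likelihood for inference on the fixed parameters
p_v(h) = \left[\, h - \tfrac{1}{2} \log \det\!\left\{ \frac{D(h, v)}{2\pi} \right\} \right]_{v = \hat{v}},
\qquad D(h, v) = -\,\frac{\partial^2 h}{\partial v \, \partial v^{\top}}
```

Maximizing h over v yields predictions of the unobservables, while the Laplace-type adjustment in p_v(h) approximates the marginal likelihood used for the fixed parameters.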