Christian Palmes, Tobias Bluhmki, Benedikt Funke, E. Bluhmki
Abstract The two one-sided t-tests (TOST) method is the most popular statistical equivalence test and has many areas of application, e.g., in the pharmaceutical industry. Proper sample size calculation is needed in order to show equivalence with a certain power. Here, the crucial problem of choosing a suitable mean-difference in TOST sample size calculations is addressed. As an alternative concept, it is assumed that the mean-difference follows an a-priori distribution. Special interest is given to the uniform and some centered triangle a-priori distributions. Using a newly developed asymptotic theory, a helpful analogy principle is found: every a-priori distribution corresponds to a point mean-difference, which we call its Schuirmann-constant. This constant does not depend on the standard deviation and aims to support the investigator in finding a well-considered mean-difference for proper sample size calculations in complex data situations. In addition to the proposed concept, we demonstrate that well-known sample size approximation formulas in the literature are in fact biased and state their unbiased corrections as well. Moreover, an R package is provided for immediate application of our newly developed concepts.
{"title":"Asymptotic properties of the two one-sided t-tests – new insights and the Schuirmann-constant","authors":"Christian Palmes, Tobias Bluhmki, Benedikt Funke, E. Bluhmki","doi":"10.1515/IJB-2020-0057","DOIUrl":"https://doi.org/10.1515/IJB-2020-0057","journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"19 - 38"},"publicationDate":"2021-01-08","publicationTypes":"Journal Article"}
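The idea of replacing a single assumed mean-difference by an a-priori distribution can be illustrated with a minimal Python sketch. It is not the authors' R package and does not reproduce their asymptotic theory or the Schuirmann-constant. It assumes a two-arm parallel design with known standard deviation, symmetric equivalence margins, and the usual normal approximation to TOST power. All function names and numbers are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for quantiles and CDF values


def tost_power(n_per_arm, delta, sigma, margin, alpha=0.05):
    """Normal-approximation TOST power for margins (-margin, +margin)."""
    se = sigma * (2.0 / n_per_arm) ** 0.5  # standard error of the mean difference
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    upper = STD_NORMAL.cdf((margin - delta) / se - z)
    lower = STD_NORMAL.cdf((margin + delta) / se - z)
    return max(0.0, upper + lower - 1.0)  # empty rejection region gives power 0


def expected_power_uniform(n_per_arm, half_width, sigma, margin, alpha=0.05, grid=400):
    """Average the point power over a uniform prior on [-half_width, half_width]."""
    step = 2.0 * half_width / grid
    deltas = (-half_width + (k + 0.5) * step for k in range(grid))  # midpoint rule
    return sum(tost_power(n_per_arm, d, sigma, margin, alpha) for d in deltas) / grid


def smallest_n(power_at_n, target=0.8, n_max=100_000):
    """Return the first per-arm sample size whose power reaches the target."""
    for n in range(2, n_max + 1):
        if power_at_n(n) >= target:
            return n
    raise ValueError("target power not reached within n_max")


# Hypothetical inputs: sigma = 1, margin = 0.5, prior half-width = 0.25.
n_point = smallest_n(lambda n: tost_power(n, 0.0, 1.0, 0.5))
n_prior = smallest_n(lambda n: expected_power_uniform(n, 0.25, 1.0, 0.5))
print(n_point, n_prior)
```

Under these assumptions the prior-averaged power never exceeds the power at a mean-difference of zero, so the sample size based on the prior is at least as large as the one based on the optimistic point assumption.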
Abstract Semi-Markov models are widely used for survival analysis and reliability analysis. In general, there are two competing parameterizations and each entails its own interpretation and inference properties. On the one hand, a semi-Markov process can be defined based on the distribution of sojourn times, often via hazard rates, together with transition probabilities of an embedded Markov chain. On the other hand, intensity transition functions may be used, often referred to as the hazard rates of the semi-Markov process. We summarize and contrast these two parameterizations both from a probabilistic and an inference perspective, and we highlight relationships between the two approaches. In general, the intensity-transition-based approach allows the likelihood to be split into likelihoods of two-state models having fewer parameters, allowing efficient computation and the use of many survival analysis tools. Nevertheless, in certain cases the sojourn-time-based approach is natural and has been exploited extensively in applications. In contrasting the two approaches and the relevant contemporary R packages used for inference, we use two real datasets highlighting the probabilistic and inference properties of each approach. This analysis is accompanied by an R vignette.
{"title":"Estimation of semi-Markov multi-state models: a comparison of the sojourn times and transition intensities approaches","authors":"A. Asanjarani, B. Liquet, Y. Nazarathy","doi":"10.1515/IJB-2020-0083","DOIUrl":"https://doi.org/10.1515/IJB-2020-0083","journal":{"name":"International Journal of Biostatistics","volume":"18 1","pages":"243 - 262"},"publicationDate":"2020-05-29","publicationTypes":"Journal Article"}
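The two parameterizations contrasted in this abstract can be sketched in Python. The sketch assumes a hypothetical three-state illness-death model with Weibull sojourn times; none of the states or numbers come from the paper or its vignette. `simulate_path` uses the sojourn-time view (embedded chain plus conditional sojourn distributions). `transition_intensity` converts to the intensity view with the identity intensity(i, j, t) = p_ij f_ij(t) / sum_k p_ik S_ik(t), which is assumed here to match the paper's definition.

```python
import math
import random

# Hypothetical illness-death model: 0 = healthy, 1 = ill, 2 = dead (absorbing).
CHAIN = {0: {1: 0.7, 2: 0.3}, 1: {0: 0.4, 2: 0.6}}
# Weibull (scale, shape) of the sojourn time in i given that the next state is j.
SOJOURN = {(0, 1): (2.0, 1.5), (0, 2): (5.0, 1.0), (1, 0): (1.0, 0.8), (1, 2): (3.0, 2.0)}


def simulate_path(start, horizon, chain, sojourn, rng):
    """Simulate (time, state) pairs: pick the next state, then the sojourn time."""
    time, state, path = 0.0, start, [(0.0, start)]
    while state in chain:  # absorbing states have no outgoing row
        targets = list(chain[state])
        nxt = rng.choices(targets, weights=[chain[state][j] for j in targets])[0]
        scale, shape = sojourn[(state, nxt)]
        time += rng.weibullvariate(scale, shape)
        if time > horizon:
            break  # right-censored at the horizon
        path.append((time, nxt))
        state = nxt
    return path


def transition_intensity(i, j, t, chain, sojourn):
    """Intensity of an i -> j transition after a sojourn of length t > 0 in i."""
    def density(scale, shape):
        return (shape / scale) * (t / scale) ** (shape - 1) * math.exp(-((t / scale) ** shape))

    def survival(scale, shape):
        return math.exp(-((t / scale) ** shape))

    still_in_i = sum(p * survival(*sojourn[(i, k)]) for k, p in chain[i].items())
    return chain[i][j] * density(*sojourn[(i, j)]) / still_in_i


path = simulate_path(0, 10.0, CHAIN, SOJOURN, random.Random(42))
print(path)
print(transition_intensity(0, 1, 1.0, CHAIN, SOJOURN))
```

When every sojourn time out of a state is exponential with the same rate, the intensity reduces to the transition probability times that rate, which gives a quick sanity check of the conversion.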
Abstract Infectious disease transmission between individuals in a heterogeneous population is often best modelled through a contact network. However, such contact network data are often unobserved. Such missing data can be accounted for in a Bayesian data augmented framework using Markov chain Monte Carlo (MCMC). Unfortunately, fitting models in such a framework can be highly computationally intensive. We investigate the fitting of network-based infectious disease models with completely unknown contact networks using approximate Bayesian computation population Monte Carlo (ABC-PMC) methods. This is done in the context of both simulated data, and data from the UK 2001 foot-and-mouth disease epidemic. We show that ABC-PMC is able to obtain reasonable approximations of the underlying infectious disease model with huge savings in computation time when compared to a full Bayesian MCMC analysis.
{"title":"Incorporating Contact Network Uncertainty in Individual Level Models of Infectious Disease using Approximate Bayesian Computation","authors":"Waleed Almutiry, R. Deardon","doi":"10.1515/ijb-2017-0092","DOIUrl":"https://doi.org/10.1515/ijb-2017-0092","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2019-12-10","publicationTypes":"Journal Article"}
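The likelihood-free idea behind approximate Bayesian computation can be shown with a toy Python sketch. It uses plain ABC rejection rather than the population Monte Carlo variant studied in the paper, and a homogeneously mixing Reed-Frost chain-binomial epidemic rather than a network-based individual-level model. The observed final size, prior range, and tolerance are hypothetical.

```python
import random


def reed_frost_final_size(p, n_pop, rng):
    """Simulate a Reed-Frost epidemic from one initial case; return the final size."""
    susceptible, infectious, total = n_pop - 1, 1, 1
    while infectious > 0 and susceptible > 0:
        # Probability that a susceptible is infected by at least one infective.
        q = 1.0 - (1.0 - p) ** infectious
        new_cases = sum(rng.random() < q for _ in range(susceptible))
        susceptible -= new_cases
        total += new_cases
        infectious = new_cases  # infectives are removed after one generation
    return total


def abc_rejection(observed, n_pop, n_draws, tolerance, rng, prior_max=0.1):
    """Keep prior draws whose simulated final size is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0.0, prior_max)  # uniform prior on the contact probability
        simulated = reed_frost_final_size(p, n_pop, rng)
        if abs(simulated - observed) <= tolerance:
            accepted.append(p)
    return accepted


rng = random.Random(2019)
posterior_sample = abc_rejection(observed=40, n_pop=50, n_draws=2000, tolerance=3, rng=rng)
print(len(posterior_sample), sum(posterior_sample) / len(posterior_sample))
```

A population Monte Carlo scheme would replace the single fixed tolerance with a decreasing sequence of tolerances and reweight the accepted particles at each stage, which this sketch omits.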
Marta Tallarita, M. De Iorio, A. Guglielmi, J. Malone‐Lee
Abstract We propose autoregressive Bayesian semi-parametric models for gap times between recurrent events. The aim is two-fold: inference on the effect of possibly time-varying covariates on the gap times and clustering of individuals based on the time trajectory of the recurrent event. Time-dependency between gap times is taken into account through the specification of an autoregressive component for the frailty parameters influencing the response at different times. The order of the autoregression may be assumed unknown and is an object of inference. We consider two alternative approaches to perform model selection under this scenario. Covariates may be easily included in the regression framework and censoring and missing data are easily accounted for. As the proposed methodologies lie within the class of Dirichlet process mixtures, posterior inference can be performed through efficient MCMC algorithms. We illustrate the approach through simulations and medical applications involving recurrent hospitalizations of cancer patients and successive urinary tract infections.
{"title":"Bayesian Autoregressive Frailty Models for Inference in Recurrent Events","authors":"Marta Tallarita, M. De Iorio, A. Guglielmi, J. Malone‐Lee","doi":"10.1515/ijb-2018-0088","DOIUrl":"https://doi.org/10.1515/ijb-2018-0088","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2019-11-21","publicationTypes":"Journal Article"}
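To make the notion of time-dependent frailty concrete, the following Python sketch simulates gap times for one individual whose log-frailty follows a first-order autoregression. It is a hypothetical parametric toy with exponential gap times and a frailty that multiplies the event rate. The paper's models are semi-parametric, are based on Dirichlet process mixtures, and allow an unknown autoregressive order, none of which is reproduced here.

```python
import math
import random


def simulate_gap_times(n_events, rho, noise_sd, baseline_rate, rng):
    """Simulate gap times whose event rate is scaled by an AR(1) log-frailty."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    # Start from the stationary distribution of the AR(1) process.
    log_frailty = rng.gauss(0.0, noise_sd / math.sqrt(1.0 - rho ** 2))
    gaps = []
    for _ in range(n_events):
        rate = baseline_rate * math.exp(log_frailty)  # higher frailty, shorter gaps
        gaps.append(rng.expovariate(rate))
        log_frailty = rho * log_frailty + rng.gauss(0.0, noise_sd)
    return gaps


gaps = simulate_gap_times(n_events=8, rho=0.8, noise_sd=0.5, baseline_rate=0.2,
                          rng=random.Random(7))
print([round(g, 2) for g in gaps])
```

With `rho` close to 1, consecutive gap times of the same individual tend to be similar, which is the kind of dependence the autoregressive component is meant to capture.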
A. Chambaz, Alan Hubbard, Alexander R. Luedtke, M. J. Laan
{"title":"Biostatistics in Africa 2019: A Special Issue of The International Journal of Biostatistics","authors":"A. Chambaz, Alan Hubbard, Alexander R. Luedtke, M. J. Laan","doi":"10.1515/ijb-2019-0125","DOIUrl":"https://doi.org/10.1515/ijb-2019-0125","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2019-11-01","publicationTypes":"Journal Article"}
Abstract Biostatistical applications often require collecting and analyzing massive amounts of data. Hence, it has become necessary to consider new statistical paradigms that perform well in characterizing complex data. Nonparametric Bayesian methods provide a widely used framework that offers the key advantages of a fully model-based probabilistic framework, while being highly flexible and adaptable. The goal of this paper is to motivate Bayesian nonparametrics (BNP) through a particular biomedical application, namely Positron Emission Tomography (PET) image reconstruction.
{"title":"Bayesian Nonparametrics and Biostatistics: The Case of PET Imaging","authors":"Mame Diarra Fall","doi":"10.1515/ijb-2017-0099","DOIUrl":"https://doi.org/10.1515/ijb-2017-0099","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2019-11-01","publicationTypes":"Journal Article"}
Valérie Garès, C. Dimeglio, G. Guernec, Romain Fantin, B. Lepage, M. Kosorok, N. Savy
Abstract Merging databases is a strategy of paramount interest, especially in medical research. A common problem in this context arises when a variable is not coded on the same scale in the two databases to be merged. This paper considers the problem of finding a relevant way to recode the variable in order to merge these two databases. To address this issue, an algorithm based on optimal transportation theory is proposed. Optimal transportation theory provides a map from the measure associated with the variable in database A to the measure associated with the same variable in database B. To do so, a cost function has to be introduced and an allocation rule has to be defined. Such a function and such a rule are proposed, using the information contained in the covariates. In this paper, the method is compared to multiple imputation by chained equations and to a statistical learning method, and it demonstrates better average accuracy in many situations. Applications to both simulated and real datasets show that the efficiency of the proposed merging algorithm depends on how the covariates are linked with the variable of interest.
{"title":"On the Use of Optimal Transportation Theory to Recode Variables and Application to Database Merging","authors":"Valérie Garès, C. Dimeglio, G. Guernec, Romain Fantin, B. Lepage, M. Kosorok, N. Savy","doi":"10.1515/ijb-2018-0106","DOIUrl":"https://doi.org/10.1515/ijb-2018-0106","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2019-09-14","publicationTypes":"Journal Article"}
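The transport step described in this abstract can be illustrated with a simplified Python sketch. It assumes that the variable is ordinal in both databases and that the cost of moving mass from category i on scale A to category j on scale B is |i - j|. For such a convex cost on ordered categories, the monotone coupling built by the north-west corner rule is an optimal transport plan. The paper's cost function and allocation rule use covariate information, which is deliberately left out. The category proportions are hypothetical.

```python
def monotone_coupling(mass_a, mass_b, tol=1e-12):
    """North-west corner rule: joint plan with the given row and column marginals."""
    remaining_a, remaining_b = list(mass_a), list(mass_b)
    plan = [[0.0] * len(mass_b) for _ in mass_a]
    i = j = 0
    while i < len(mass_a) and j < len(mass_b):
        moved = min(remaining_a[i], remaining_b[j])
        plan[i][j] += moved
        remaining_a[i] -= moved
        remaining_b[j] -= moved
        if remaining_a[i] <= tol:  # category i of scale A is fully allocated
            i += 1
        if j < len(mass_b) and remaining_b[j] <= tol:  # category j of B is full
            j += 1
    return plan


def recoding_probabilities(plan):
    """Row-normalize the plan: P(code on scale B | code on scale A)."""
    return [[cell / sum(row) for cell in row] for row in plan]


# Hypothetical proportions: 3 ordered categories in A, 4 ordered categories in B.
plan = monotone_coupling([0.5, 0.3, 0.2], [0.25, 0.25, 0.25, 0.25])
for row in recoding_probabilities(plan):
    print([round(p, 3) for p in row])
```

Each printed row gives, for one category of scale A, the share of individuals that would be recoded to each category of scale B. An allocation rule is still needed to decide which individuals receive which code, and that is where covariates enter in the paper.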
Mor Absa Loum, Marie-Anne Poursat, A. Sow, A. Sall, C. Loucoubar, E. Gassiat
Abstract In tropical regions, populations continue to suffer morbidity and mortality from malaria and arboviral diseases. In Kedougou (Senegal), these illnesses are all endemic due to the climate and the region's geographical position. The co-circulation of malaria parasites and arboviruses can explain the observation of coinfected cases. Indeed, the symptoms of these diseases strongly resemble one another, which makes targeted medical care of coinfected cases difficult, because the origin of the illness is not obvious. Some cases could be immune to one or the other of the pathogens, immunity typically being acquired with age and exposure, as is usual in endemic areas. Thus, coinfection needs to be better diagnosed. Using data collected from patients in the Kedougou region from 2009 to 2013, we fitted a multinomial logistic model and selected the variables relevant for explaining coinfection status. We observed specific sets of variables explaining each of the diseases exclusively and the coinfection. We tested the independence between arboviral and malaria infections and derived coinfection probabilities from the fitted model. When the coinfection probability exceeds a threshold value to be calibrated on the data, long duration of illness and age are mostly indicative of arboviral disease, while high body temperature and the presence of nausea or vomiting during the rainy season are mostly indicative of malaria.
{"title":"Multinomial Logistic Model for Coinfection Diagnosis Between Arbovirus and Malaria in Kedougou","authors":"Mor Absa Loum, Marie-Anne Poursat, A. Sow, A. Sall, C. Loucoubar, E. Gassiat","doi":"10.1515/ijb-2017-0015","DOIUrl":"https://doi.org/10.1515/ijb-2017-0015","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2018-01-12","publicationTypes":"Journal Article"}
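How coinfection probabilities follow from a fitted multinomial logistic model can be sketched in Python. The four outcome labels, the three covariates, the coefficients, and the threshold below are all hypothetical and are not estimates from the Kedougou data. The independence check relies on a standard fact about 2x2 tables: malaria and arboviral infection are independent for a given patient exactly when the odds ratio equals 1. Whether the paper tests independence this way is an assumption.

```python
import math


def class_probabilities(features, coefs):
    """Softmax probabilities with 'neither' as the reference category."""
    scores = {"neither": 0.0}
    for label, (intercept, slopes) in coefs.items():
        scores[label] = intercept + sum(b * x for b, x in zip(slopes, features))
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def log_odds_ratio(probs):
    """Log odds ratio of the malaria-by-arbovirus table; 0 means independence."""
    return math.log(probs["coinfection"] * probs["neither"]
                    / (probs["malaria_only"] * probs["arbovirus_only"]))


# Hypothetical coefficients: (intercept, (age in years, degrees C above 37, rainy season)).
COEFS = {
    "malaria_only": (-1.0, (-0.02, 0.9, 0.8)),
    "arbovirus_only": (-1.5, (0.03, 0.2, 0.1)),
    "coinfection": (-3.0, (0.01, 1.0, 0.9)),
}
THRESHOLD = 0.2  # hypothetical; the abstract says it must be calibrated on data

patient = (25.0, 2.0, 1.0)
probs = class_probabilities(patient, COEFS)
print({label: round(p, 3) for label, p in probs.items()})
print("flag for coinfection work-up:", probs["coinfection"] > THRESHOLD)
print("log odds ratio:", round(log_odds_ratio(probs), 3))
```

With "neither" as reference, the log odds ratio equals the coinfection score minus the two single-infection scores, so independence corresponds to the coinfection linear predictor being the sum of the other two.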
Abstract We present new methods for cell line classification using multivariate time series bioimpedance data obtained from electric cell-substrate impedance sensing (ECIS) technology. The ECIS technology, which monitors the attachment and spreading of mammalian cells in real time through the collection of electrical impedance data, has historically been used to study one cell line at a time. However, we show that if applied to data from multiple cell lines, ECIS can be used to classify unknown or potentially mislabeled cells, factors which have previously been associated with the reproducibility crisis in the biological literature. We assess a range of approaches to this new problem, testing different classification methods and deriving a dictionary of 29 features to characterize ECIS data. Most notably, our analysis enriches the current field by making use of simultaneous multi-frequency ECIS data, where previous studies have focused on only one frequency; using classification methods to distinguish multiple cell lines, rather than simple statistical tests that compare only two cell lines; and assessing a range of features derived from ECIS data based on their classification performance. In classification tests on fifteen mammalian cell lines, we obtain very high out-of-sample predictive accuracy. These preliminary findings provide a baseline for future large-scale studies in this field.
{"title":"Cell Line Classification Using Electric Cell-Substrate Impedance Sensing (ECIS)","authors":"Megan L. Gelsinger, Laura L. Tupper, D. Matteson","doi":"10.1515/ijb-2018-0083","DOIUrl":"https://doi.org/10.1515/ijb-2018-0083","journal":{"name":"International Journal of Biostatistics","volume":"16 1","pages":""},"publicationDate":"2017-10-26","publicationTypes":"Journal Article"}
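The pipeline described in this abstract (features derived from multi-frequency time series, followed by a multi-class classifier) can be sketched in Python with a toy example. The three features per frequency and the nearest-centroid classifier are hypothetical stand-ins: the paper's dictionary of 29 features and the classification methods it compares are not specified in the abstract. The impedance values are invented.

```python
import math


def extract_features(run):
    """Turn {frequency_hz: [impedance over time]} into a flat feature vector."""
    features = []
    for freq in sorted(run):  # fixed frequency order keeps vectors comparable
        values = run[freq]
        features.append(max(values))                # peak impedance
        features.append(values[-1])                 # end-of-run impedance
        features.append(sum(values) / len(values))  # mean impedance
    return features


def fit_centroids(runs, labels):
    """Average the feature vectors of the training runs of each cell line."""
    sums, counts = {}, {}
    for run, label in zip(runs, labels):
        feats = extract_features(run)
        if label not in sums:
            sums[label], counts[label] = [0.0] * len(feats), 0
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(run, centroids):
    """Assign the cell line whose centroid is closest in Euclidean distance."""
    feats = extract_features(run)
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


# Invented training runs for two cell lines measured at two frequencies.
train_runs = [
    {4000: [1.0, 2.0, 3.0], 64000: [1.0, 1.5, 2.0]},
    {4000: [1.0, 4.0, 8.0], 64000: [1.0, 3.0, 5.0]},
]
centroids = fit_centroids(train_runs, ["line_A", "line_B"])
unknown = {4000: [1.0, 2.2, 3.1], 64000: [1.0, 1.4, 2.1]}
print(classify(unknown, centroids))
```

Using several frequencies simply lengthens the feature vector, which is one way to exploit simultaneous multi-frequency measurements instead of a single frequency.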
Abstract: We introduce combinatorial mixtures – a flexible class of models for inference on mixture distributions whose components have multidimensional parameters. The key idea is to allow each element of the component-specific parameter vectors to be shared by a subset of other components. This approach allows for mixtures that range from very flexible to very parsimonious and unifies inference on component-specific parameters with inference on the number of components. We develop Bayesian inference and computational approaches for this class of distributions, and illustrate them in an application. This work was originally motivated by the analysis of cancer subtypes: in terms of biological measures of interest, subtypes may be characterized by differences in location, scale, correlations, or any combination of these. We illustrate our approach using publicly available data on molecular subtypes of lung and prostate cancers.
{"title":"Combinatorial Mixtures of Multiparameter Distributions: An Application to Bivariate Data","authors":"V. Edefonti, G. Parmigiani","doi":"10.1515/ijb-2015-0064","DOIUrl":"https://doi.org/10.1515/ijb-2015-0064","journal":{"name":"International Journal of Biostatistics"},"publicationDate":"2017-02-16","publicationTypes":"Journal Article"}
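The parameter-sharing idea can be made concrete by enumerating sharing patterns in Python. The sketch assumes that, for each element of the parameter vector (for example location and scale), the components are split into blocks that share a common value, so a pattern is one set partition per parameter element. This reading of the abstract, and the function names, are assumptions. The paper's prior over patterns and its computational approach are not reproduced.

```python
from itertools import product


def set_partitions(n):
    """Yield all partitions of n components as block-label tuples."""
    def extend(labels):
        if len(labels) == n:
            yield tuple(labels)
            return
        # A new component joins an existing block or opens the next new block.
        for block in range(max(labels, default=-1) + 2):
            yield from extend(labels + [block])
    yield from extend([])


def sharing_patterns(n_components, n_parameters):
    """Yield (pattern, number of distinct components) for every sharing pattern."""
    partitions = list(set_partitions(n_components))
    for pattern in product(partitions, repeat=n_parameters):
        # Two components coincide when they share a block in every parameter.
        distinct = len(set(zip(*pattern)))
        yield pattern, distinct


counts = {}
for pattern, distinct in sharing_patterns(n_components=3, n_parameters=2):
    counts[distinct] = counts.get(distinct, 0) + 1
print(counts)
```

Patterns in which two components share every parameter collapse those components into one, which is how a single model space can cover both the parameter values and the effective number of components.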