
Statistical Modeling: Latest Articles

Mixture of linear mixed models for clustering gene expression profiles from repeated microarray experiments
Pub Date : 2005-10-01 DOI: 10.1191/1471082X05st096oa
G. Celeux, O. Martin, C. Lavergne
Data variability can be important in microarray data analysis. Thus, when clustering gene expression profiles, it could be judicious to make use of repeated data. In this paper, the problem of analysing repeated data in the model-based cluster analysis context is considered. Linear mixed models are chosen to take into account data variability and mixtures of these models are considered. This leads to a large range of possible models depending on the assumptions made on both the covariance structure of the observations and the mixture model. The maximum likelihood estimation of this family of models through the EM algorithm is presented. The problem of selecting a particular mixture of linear mixed models is considered using penalized likelihood criteria. Illustrative Monte Carlo experiments are presented and an application to the clustering of gene expression profiles is detailed. All those experiments highlight the interest of linear mixed model mixtures to take into account data variability in a cluster analysis context.
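The EM-plus-penalized-likelihood workflow described in this abstract can be illustrated with a deliberately simplified sketch. It assumes a two-component univariate Gaussian mixture rather than the paper's mixtures of linear mixed models, uses made-up data, and scores the fit with BIC as one example of a penalized likelihood criterion. All function names are hypothetical and are not taken from the paper.

```python
import math


def normal_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def log_likelihood(data, weights, means, variances):
    """Observed-data log-likelihood of a univariate Gaussian mixture."""
    return sum(
        math.log(sum(w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)))
        for x in data
    )


def em_two_gaussians(data, n_iter=50):
    """Plain EM for two Gaussian components (no safeguards against collapse)."""
    n = len(data)
    grand_mean = sum(data) / n
    start_var = sum((x - grand_mean) ** 2 for x in data) / n
    # Crude start: equal weights, means at the extremes, pooled variance.
    weights = [0.5, 0.5]
    means = [min(data), max(data)]
    variances = [start_var, start_var]
    trace = [log_likelihood(data, weights, means, variances)]
    for _ in range(n_iter):
        # E step: posterior probability that each point belongs to each component.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: weighted updates of the mixing weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            variances[k] = sum(r[k] * (x - means[k]) ** 2
                               for r, x in zip(resp, data)) / nk
        trace.append(log_likelihood(data, weights, means, variances))
    return weights, means, variances, trace


def bic(loglik, n_params, n_obs):
    """BIC in the 'smaller is better' convention: -2 log L + k log n."""
    return -2.0 * loglik + n_params * math.log(n_obs)


# Hypothetical data: two well-separated groups.
data = [0.1, -0.3, 0.2, 0.0, -0.1, 4.9, 5.2, 5.0, 4.8, 5.1]
weights, means, variances, trace = em_two_gaussians(data)
# Two means, two variances and one free mixing weight give 5 parameters.
print(means, bic(trace[-1], 5, len(data)))
```

In a model-selection setting, the same criterion would be computed for each candidate mixture (different numbers of components or covariance assumptions) and the candidates compared.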
Citations: 103
A latent variable scorecard for neonatal baby frailty
Pub Date : 2005-07-01 DOI: 10.1191/1471082X05st093oa
J. Bowden, J. Whittaker
A latent variable frailty model is built for data coming from a neonatal study conducted to investigate whether the presence of a particular hospital service given to families with premature babies has a positive effect on their care requirements within the first year of life. The predicted value of the latent frailty term from information obtained from the family in advance of the birth furnishes an overall measure of the quality of health of the baby. This identifies families at risk. Maximum likelihood and Bayesian approaches are used to estimate the effect of the variables on the value of the latent baby frailty and for prediction of health complications. It is found that these give much the same estimates of regression coefficients, but that the variance components are the more difficult to estimate. We indicate how the findings from the model may be presented as a scorecard for predicting frailty, and so be useful to doctors working in hospital neonatal units. New information about a baby is automatically combined with the current score to provide an up-to-date score, so that rapid decisions for taking appropriate action are made more possible. A diagnostic procedure is proposed to assess how well the independence assumptions of the model are met in fitting to this data. It is concluded that the frailty model provides an informative summary of the data from this neonatal study.
Citations: 0
Measuring customer quality in retail banking
Pub Date : 2005-07-01 DOI: 10.1191/1471082X05st092oa
D. Hand, M. Crowder
The retail banking sector makes heavy use of statistical models to predict various aspects of customer behaviour. These models are built using data from earlier customers, but have several weaknesses. An alternative approach, widely used in social measurement, but apparently not yet applied in the retail banking sector, is to use latent-variable techniques to measure the underlying key aspect of customer behaviour. This paper describes such a model that separates the observed variables for a customer into primary characteristics on the one hand, and indicators of previous behaviour on the other, and links the two via a latent variable that we identify as ‘customer quality’. We describe how to estimate the conditional distribution of customer quality, given the observed values of primary characteristics and past behaviour.
Citations: 21
The role of perturbation in compositional data analysis
Pub Date : 2005-07-01 DOI: 10.1191/1471082X05st091oa
J. Aitchison, K. Ng
In standard multivariate statistical analysis, common hypotheses of interest concern changes in mean vectors and subvectors. In compositional data analysis it is now well established that compositional change is most readily described in terms of the simplicial operation of perturbation and that subcompositions replace the marginal concept of subvectors. Against the background of two motivating experimental studies in the food industry, involving the compositions of cow’s milk and chicken carcasses, this paper emphasizes the importance of recognizing this fundamental operation of change in the associated simplex sample space. Well-defined hypotheses about the nature of any compositional effect can be expressed, for example, in terms of perturbation values and subcompositional stability and testing procedures developed. These procedures are applied to lattices of such hypotheses in the two practical situations. We identify the two problems as being the counterpart of the analysis of paired comparison or split plot experiments and of separate sample comparative experiments in the jargon of standard multivariate analysis.
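The perturbation operation named in this abstract can be made concrete with a short sketch. It uses the standard simplex definitions (closure, componentwise multiplication followed by closure, and subcomposition by re-closing selected parts); the three-part compositions are made-up numbers and the function names are hypothetical.

```python
def closure(parts):
    """Rescale positive parts so that they sum to 1."""
    total = sum(parts)
    return [p / total for p in parts]


def perturb(x, p):
    """Perturbation of composition x by p: close the componentwise product."""
    return closure([xi * pi for xi, pi in zip(x, p)])


def perturbation_difference(x, y):
    """The perturbation p that carries x to y, so perturb(x, p) equals y."""
    return closure([yi / xi for xi, yi in zip(x, y)])


def subcomposition(x, indices):
    """Subcomposition: keep the selected parts and re-close them."""
    return closure([x[i] for i in indices])


# Hypothetical compositions before and after a treatment.
before = [0.2, 0.3, 0.5]
after = [0.1, 0.3, 0.6]
change = perturbation_difference(before, after)
print(change)
print(perturb(before, change))         # recovers `after` up to rounding
print(subcomposition(before, [0, 1]))  # first two parts, re-closed
```

Describing change as a perturbation keeps the result inside the simplex, which a plain difference of proportions does not.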
Citations: 33
Graphical chain models for the analysis of complex genetic diseases: an application to hypertension
Pub Date : 2005-07-01 DOI: 10.1191/1471082X05st088oa
C. Serio, Paola Vicard
A crucial task in modern genetic medicine is the understanding of complex genetic diseases. The main complicating features are that a combination of genetic and environmental risk factors is involved, and the phenotype of interest may be complex. Traditional statistical techniques based on lod-scores fail when the disease is no longer monogenic and the underlying disease transmission model is not defined. Different kinds of association tests have been proved to be an appropriate and powerful statistical tool to detect a ‘candidate gene’ for a complex disorder. However, statistical techniques able to investigate direct and indirect influences among phenotypes, genotypes and environmental risk factors, are required to analyse the association structure of complex diseases. In this paper, we propose graphical models as a natural tool to analyse the multifactorial structure of complex genetic diseases. An application of this model to primary hypertension data set is illustrated.
Citations: 3
The practical utility of incorporating model selection uncertainty into prognostic models for survival data
Pub Date : 2005-07-01 DOI: 10.1191/1471082X05st089oa
N. Augustin, W. Sauerbrei, M. Schumacher
Predictions of disease outcome in prognostic factor models are usually based on one selected model. However, often several models fit the data equally well, but these models might differ substantially in terms of included explanatory variables and might lead to different predictions for individual patients. For survival data, we discuss two approaches to account for model selection uncertainty in two data examples, with the main emphasis on variable selection in a proportional hazard Cox model. The main aim of our investigation is to establish the ways in which either of the two approaches is useful in such prognostic models. The first approach is Bayesian model averaging (BMA) adapted for the proportional hazard model, termed ‘approx. BMA’ here. As a new approach, we propose a method which averages over a set of possible models using weights estimated from bootstrap resampling as proposed by Buckland et al., but in addition, we perform an initial screening of variables based on the inclusion frequency of each variable to reduce the set of variables and corresponding models. For some necessary parameters of the procedure, investigations concerning sensible choices are still required. The main objective of prognostic models is prediction, but the interpretation of single effects is also important and models should be general enough to ensure transportability to other clinical centres. In the data examples, we compare predictions of our new approach with approx. BMA, with ‘conventional’ predictions from one selected model and with predictions from the full model. Confidence intervals are compared in one example. Comparisons are based on the partial predictive score and the Brier score. We conclude that the two model averaging methods yield similar results and are especially useful when there is a high number of potential prognostic factors, most likely some of them without influence in a multivariable context. Although the method based on bootstrap resampling lacks formal justification and requires some ad hoc decisions, it has the additional positive effect of achieving model parsimony by reducing the number of explanatory variables and dealing with correlated variables in an automatic fashion.
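Two ingredients mentioned in this abstract, bootstrap-based model weights and the Brier score, can be sketched in a few lines. This is an illustration under strong assumptions: the weights are taken to be the share of bootstrap replicates in which each model was selected, the Brier score is the plain version for a binary outcome at a fixed time point with no censoring adjustment, and the model labels, predictions and outcomes are invented. None of the names come from the paper.

```python
def bootstrap_weights(selected_models):
    """Weight each model by the share of bootstrap replicates that selected it."""
    counts = {}
    for label in selected_models:
        counts[label] = counts.get(label, 0) + 1
    n = len(selected_models)
    return {label: count / n for label, count in counts.items()}


def averaged_prediction(predictions, weights):
    """Weighted average of per-model predictions for one patient."""
    return sum(weights[label] * predictions[label] for label in weights)


def brier_score(predicted, observed):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical: model "A" was selected in 3 of 4 bootstrap replicates.
weights = bootstrap_weights(["A", "A", "B", "A"])
# Hypothetical predicted survival probabilities for one patient.
print(averaged_prediction({"A": 0.8, "B": 0.4}, weights))
# Hypothetical predictions against observed event status.
print(brier_score([0.9, 0.2, 0.6], [1, 0, 1]))
```

A lower Brier score indicates predictions closer to the observed outcomes; with censored survival data, a weighted version of the score would be needed.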
Citations: 44
Efficient models for correlated data via convolutions of intrinsic processes
Pub Date : 2005-04-01 DOI: 10.1191/1471082X05st085oa
Herbert K. H. Lee, D. Higdon, Catherine A. Calder, C. Holloman
Gaussian processes (GP) have proven to be useful and versatile stochastic models in a wide variety of applications including computer experiments, environmental monitoring, hydrology and climate modeling. A GP model is determined by its mean and covariance functions. In most cases, the mean is specified to be a constant, or some other simple linear function, whereas the covariance function is governed by a few parameters. A Bayesian formulation is attractive as it allows for formal incorporation of uncertainty regarding the parameters governing the GP. However, estimation of these parameters can be problematic. Large datasets, posterior correlation and inverse problems can all lead to difficulties in exploring the posterior distribution. Here, we propose an alternative model which is quite tractable computationally - even with large datasets or indirectly observed data - while still maintaining the flexibility and adaptiveness of traditional GP models. This model is based on convolving simple Markov random fields with a smoothing kernel. We consider applications in hydrology and aircraft prototype testing.
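The idea of convolving a simple Markov random field with a smoothing kernel can be sketched in one dimension. The example below assumes a first-order random walk on a coarse grid of knots as the latent field and an unnormalized Gaussian kernel; the knot spacing, bandwidth and step size are arbitrary illustrative choices, and the function names are hypothetical rather than the authors' implementation.

```python
import math
import random


def random_walk(n, step_sd, rng):
    """First-order random walk: a simple one-dimensional intrinsic field."""
    values = [0.0]
    for _ in range(n - 1):
        values.append(values[-1] + rng.gauss(0.0, step_sd))
    return values


def gaussian_kernel(distance, bandwidth):
    """Unnormalized Gaussian smoothing kernel."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)


def convolve_process(knots, latent, sites, bandwidth):
    """Smooth process at each site: kernel-weighted sum of the latent values."""
    return [
        sum(gaussian_kernel(s - u, bandwidth) * x for u, x in zip(knots, latent))
        for s in sites
    ]


rng = random.Random(0)                  # fixed seed for reproducibility
knots = [float(j) for j in range(10)]   # coarse latent grid
latent = random_walk(len(knots), 1.0, rng)
sites = [0.25 * i for i in range(37)]   # finer grid of prediction sites
smooth = convolve_process(knots, latent, sites, bandwidth=1.0)
print(smooth[:5])
```

Because the smooth process is a linear function of a small number of latent values, computation scales with the number of knots rather than with the number of observation sites.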
Citations: 41
Two-component mixtures of generalized linear mixed effects models for cluster correlated data
Pub Date : 2005-04-01 DOI: 10.1191/1471082X05st090oa
D. Hall, Lihua Wang
Finite mixtures of generalized linear mixed effect models are presented to handle situations where within-cluster correlation and heterogeneity (subpopulations) exist simultaneously. For this class of model, we consider maximum likelihood (ML) as our main approach to estimation. Owing to the complexity of the marginal loglikelihood of this model, the EM algorithm is employed to facilitate computation. The major obstacle in this procedure is to integrate over the random effects’ distribution to evaluate the expectation in the E step. When assuming normally distributed random effects, we consider adaptive Gaussian quadrature to perform this integration numerically. We also discuss nonparametric ML estimation under a relaxation of the normality assumption on the random effects. Two real data sets are analysed to compare our proposed model with other existing models and illustrate our estimation methods.
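The integration over the random effects' distribution that the abstract identifies as the main obstacle can be illustrated with ordinary (non-adaptive) Gauss-Hermite quadrature. The sketch assumes a single normally distributed random intercept, a logistic model for binary responses within one cluster, and a fixed three-point rule; the adaptive variant used in the paper would recentre and rescale the nodes for each cluster, which is not shown here. Function names and inputs are hypothetical.

```python
import math

# Three-point Gauss-Hermite rule for the weight function exp(-x^2).
NODES = [-math.sqrt(1.5), 0.0, math.sqrt(1.5)]
WEIGHTS = [math.sqrt(math.pi) / 6.0,
           2.0 * math.sqrt(math.pi) / 3.0,
           math.sqrt(math.pi) / 6.0]


def gauss_hermite_expectation(f, sigma):
    """Approximate E[f(b)] for b ~ N(0, sigma^2) via the substitution b = sqrt(2)*sigma*x."""
    total = sum(w * f(math.sqrt(2.0) * sigma * x) for x, w in zip(NODES, WEIGHTS))
    return total / math.sqrt(math.pi)


def cluster_marginal_likelihood(y, eta, sigma):
    """Marginal likelihood of one cluster's binary responses.

    y     : 0/1 responses in the cluster
    eta   : fixed-effect linear predictors, one per response
    sigma : standard deviation of the random intercept
    """
    def conditional(b):
        lik = 1.0
        for yi, ei in zip(y, eta):
            p = 1.0 / (1.0 + math.exp(-(ei + b)))
            lik *= p if yi == 1 else 1.0 - p
        return lik

    return gauss_hermite_expectation(conditional, sigma)


# Hypothetical cluster with three binary responses.
print(cluster_marginal_likelihood([1, 0, 1], [0.2, -0.1, 0.4], sigma=0.8))
```

A three-point rule is coarse; in practice more nodes would be used, and within an EM algorithm the same quadrature would be applied to the expectations required in the E step.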
Citations: 30
Random effect models for repeated measures of zero-inflated count data
Pub Date : 2005-04-01 DOI: 10.1191/1471082X05st084oa
Yongyi Min, A. Agresti
For count responses, the situation of excess zeros (relative to what standard models allow) often occurs in biomedical and sociological applications. Modeling repeated measures of zero-inflated count data presents special challenges. This is because in addition to the problem of extra zeros, the correlation between measurements upon the same subject at different occasions needs to be taken into account. This article discusses random effect models for repeated measurements on this type of response variable. A useful model is the hurdle model with random effects, which separately handles the zero observations and the positive counts. In maximum likelihood model fitting, we consider both a normal distribution and a nonparametric approach for the random effects. A special case of the hurdle model can be used to test for zero inflation. Random effects can also be introduced in a zero-inflated Poisson or negative binomial model, but such a model may encounter fitting problems if there is zero deflation at any settings of the explanatory variables. A simple alternative approach adapts the cumulative logit model with random effects, which has a single set of parameters for describing effects. We illustrate the proposed methods with examples.
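The hurdle structure described in this abstract, which handles zeros and positive counts separately, can be written down directly. The sketch below assumes a Poisson hurdle model without random effects: a probability of a zero, and a zero-truncated Poisson for the positive counts. Parameter values are made up and the function names are hypothetical.

```python
import math


def poisson_pmf(y, lam):
    """Poisson probability, computed on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))


def hurdle_poisson_pmf(y, p_zero, lam):
    """P(Y = y) under a Poisson hurdle model.

    p_zero : probability of observing a zero (the 'hurdle' part)
    lam    : rate of the zero-truncated Poisson for positive counts (lam > 0)
    """
    if y == 0:
        return p_zero
    truncated = poisson_pmf(y, lam) / (1.0 - math.exp(-lam))
    return (1.0 - p_zero) * truncated


def hurdle_poisson_mean(p_zero, lam):
    """Mean count: probability of clearing the hurdle times the truncated mean."""
    return (1.0 - p_zero) * lam / (1.0 - math.exp(-lam))


# Hypothetical parameters: 60% zeros, which exceeds the Poisson share exp(-2.5).
for count in range(5):
    print(count, hurdle_poisson_pmf(count, p_zero=0.6, lam=2.5))
```

Because `p_zero` is a free parameter, the same form covers both more zeros and fewer zeros than a Poisson model with rate `lam` would produce, which is why the hurdle formulation does not run into the zero-deflation difficulty the abstract mentions for zero-inflated models.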
Citations: 351
Software reliability modelling and prediction with hidden Markov chains
Pub Date : 2005-04-01 DOI: 10.1191/1471082X05st087oa
Jean-Baptiste Durand, O. Gaudoin
The purpose of this paper is to use the framework of hidden Markov chains (HMCs) for the modelling of the failure and debugging process of software, and the prediction of software reliability. The model parameters are estimated using the forward-backward expectation maximization algorithm, and model selection is done with the Bayesian information criterion. The advantages and drawbacks of this approach, with respect to usual modelling, are analysed. Comparison is also done on real software failure data. The main contribution of HMC modelling is that it highlights the existence of homogeneous periods in the debugging process, which allow one to identify major corrections or version updates. In terms of reliability predictions, the HMC model performs well, on average, with respect to usual models, especially when the reliability is not regularly growing.
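To make the likelihood and model selection steps in the abstract more concrete, here is a minimal sketch of the forward pass of a hidden Markov chain together with a BIC computation. It assumes exponentially distributed times between failures in each hidden state, which is an illustrative choice rather than something stated in the abstract. The function names and the parameter count are likewise assumptions. The backward pass and the EM parameter updates are omitted.

```python
import math


def forward_loglik(obs, init, trans, rates):
    """Log-likelihood of obs under a hidden Markov chain (scaled forward pass).

    obs:   observed inter-failure times (positive floats).
    init:  initial state probabilities.
    trans: transition matrix, trans[j][k] = P(next state k | current state j).
    rates: exponential rate per hidden state (assumed emission model).
    """
    n_states = len(init)

    def density(k, x):
        return rates[k] * math.exp(-rates[k] * x)

    alpha = [init[k] * density(k, obs[0]) for k in range(n_states)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate one step through the chain, then weight by the emission.
            alpha = [
                sum(alpha[j] * trans[j][k] for j in range(n_states))
                * density(k, obs[t])
                for k in range(n_states)
            ]
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def bic(loglik, n_states, n_obs):
    """BIC in the 'smaller is better' convention: -2 log L + k log n."""
    n_params = n_states * (n_states - 1) + (n_states - 1) + n_states
    return -2.0 * loglik + n_params * math.log(n_obs)
```

In this sketch, one would fit models with different numbers of hidden states and keep the one with the smallest BIC. Some texts define BIC with the opposite sign, in which case the largest value is preferred.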
Citations: 34