Journal of the Japanese Society of Computational Statistics: Latest Articles

CONSTRUCTION OF REGRESSION TREES ON INTERVAL-VALUED SYMBOLIC VARIABLES

Journal of the Japanese Society of Computational Statistics

Pub Date : 2014-12-20 DOI: 10.5183/JJSCS.1405001_211

Asanao Shimokawa, Y. Kawasaki, E. Miyaoka

Analysis based on interval-valued symbolic variables, which are given as p-dimensional hyperrectangles in R^p, is considered appropriate in some scenarios. However, methods for analyzing these variables are not as well studied as those for classical variables, which are given as single points in R^p. The regression tree constructed using the CART algorithm is one such example, and we consider it in this paper. Several models can be considered for constructing a regression tree based on interval-valued symbolic variables. Our proposed model differs from the others in that a concept can be included in several terminal nodes of a tree. Constructing a regression tree with the proposed model raises several problems, such as how to represent the predictive model in each node and how to search for an optimal splitting point within interval values. We address these problems and present an application of the model to data on HIV-1-infected patients.
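
The idea that one concept can fall into several terminal nodes can be illustrated with a minimal sketch. It is not the authors' model: it assumes, purely for illustration, that an interval straddling a cut point contributes to both children in proportion to the share of its length on each side (a uniform-mass assumption), and that a split is scored by the weighted sum of squared errors. The names `left_weight`, `weighted_sse`, and `best_split` are hypothetical.

```python
def left_weight(lo, hi, cut):
    """Share of the interval [lo, hi] lying at or below cut (uniform-mass assumption)."""
    if hi == lo:
        return 1.0 if lo <= cut else 0.0
    return min(max((cut - lo) / (hi - lo), 0.0), 1.0)


def weighted_sse(ys, ws):
    """Weighted sum of squared deviations of ys from their weighted mean."""
    total = sum(ws)
    if total == 0.0:
        return 0.0
    mean = sum(w * y for y, w in zip(ys, ws)) / total
    return sum(w * (y - mean) ** 2 for y, w in zip(ys, ws))


def best_split(intervals, ys):
    """Scan interval endpoints as candidate cuts; return (cut, cost) or None."""
    candidates = sorted({end for interval in intervals for end in interval})
    best = None
    for cut in candidates:
        w_left = [left_weight(lo, hi, cut) for lo, hi in intervals]
        w_right = [1.0 - w for w in w_left]
        if sum(w_left) == 0.0 or sum(w_right) == 0.0:
            continue  # skip cuts that leave one child empty
        cost = weighted_sse(ys, w_left) + weighted_sse(ys, w_right)
        if best is None or cost < best[1]:
            best = (cut, cost)
    return best


# Toy data (invented): one interval-valued predictor and a numeric response.
intervals = [(0.0, 1.0), (0.5, 1.5), (3.0, 4.0), (3.5, 4.5)]
ys = [1.0, 1.0, 5.0, 5.0]
print(best_split(intervals, ys))
```

With a cut inside an interval, that observation carries a fractional weight in both children, which is one simple way a single concept can influence more than one terminal node.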

Citations: 2

PERFORMANCE OF INFORMATION CRITERIA FOR MODEL SELECTION IN A LATENT GROWTH CURVE MIXTURE MODEL

Journal of the Japanese Society of Computational Statistics

Pub Date : 2014-12-20 DOI: 10.5183/JJSCS.1309001_207

S. Usami

Novel simulation studies are performed to investigate the performance of likelihood-based and entropy-based information criteria for estimating the number of classes in latent growth curve mixture models, considering the influences of true model complexity and model misspecification. The simulation results can be summarized as follows. (1) Increased model complexity worsens the performance of all criteria, and this is salient in the Bayesian Information Criterion (BIC) and the Consistent Akaike Information Criterion (CAIC). (2) The classification likelihood information criterion (CLC) and the integrated completed likelihood criterion with BIC approximation (ICL.BIC) frequently underestimate the number of classes. (3) Entropy-based criteria correctly estimate the number of classes more frequently. (4) When a normal mixture is incorrectly fit to non-normal data including outliers, the performance of many criteria worsens seriously, but BIC, CAIC, and ICL.BIC are relatively robust. Additionally, overextracted classes with trivially small mixture proportions can be detected when the sample size is large. (5) When there is an upper bound of measurement, the performance of almost all criteria worsens, but entropy-based criteria are robust. (6) Although no single criterion is always best, ICL.BIC shows better performance on average.
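
For readers unfamiliar with the criteria named above, the following sketch computes them from a fitted mixture model using their commonly cited textbook forms. The exact definitions used in the paper are not given in the abstract, so the formulas here are an assumption; `information_criteria` and its arguments are hypothetical names, and `posteriors` is assumed to hold each observation's posterior class probabilities.

```python
import math


def information_criteria(log_lik, n_params, n_obs, posteriors):
    """Likelihood- and entropy-based criteria in their common textbook forms.

    posteriors: one row per observation, each row holding the posterior
    probabilities of class membership (rows sum to 1).
    """
    # Entropy of the soft classification; zero-probability terms contribute 0.
    entropy = -sum(p * math.log(p) for row in posteriors for p in row if p > 0.0)
    bic = -2.0 * log_lik + n_params * math.log(n_obs)
    caic = -2.0 * log_lik + n_params * (math.log(n_obs) + 1.0)
    clc = -2.0 * log_lik + 2.0 * entropy      # penalizes only poor class separation
    icl_bic = bic + 2.0 * entropy             # BIC plus the entropy penalty
    return {"BIC": bic, "CAIC": caic, "CLC": clc, "ICL.BIC": icl_bic, "entropy": entropy}


# Invented numbers for illustration: two observations, perfectly separated classes.
crisp = information_criteria(-100.0, 5, 2, [[1.0, 0.0], [0.0, 1.0]])
print(crisp)
```

In all four criteria smaller values are preferred. When classes are perfectly separated the entropy term vanishes, so ICL.BIC coincides with BIC; the more the classes overlap, the larger the extra penalty on CLC and ICL.BIC.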

Citations: 10

BIAS REDUCTION IN ESTIMATING A CONCORDANCE FOR CENSORED TIME-TO-EVENT RESPONSES

Journal of the Japanese Society of Computational Statistics

Pub Date : 2014-12-20 DOI: 10.5183/JJSCS.1312001_209

Kenichi Hayashi

Citations: 0

LIFESPAN DISTRIBUTION OF SIMD GROUPS ON A GPU ENGAGED IN A CLASS OF PROBABILISTIC COMPUTATION

Journal of the Japanese Society of Computational Statistics

Pub Date : 2014-12-20 DOI: 10.5183/JJSCS.1308001_206

Masanari Iida, N. Niki

In each SIMD (Single Instruction, Multiple Data) group of a GPU (Graphics Processing Unit), called a 'warp', a fixed number of threads execute the same instruction concurrently in each unit period of time. We consider a class of probabilistic algorithms designed for use on GPUs, including a wide variety of Monte Carlo methods, in which each thread contains a loop iterated a stochastically variable number of times and the life cycle of a warp ends when the slowest thread completes its requested task. A run-time model is proposed to explain the distributions of execution time observed in SIMD parallel computations using algorithms of this class. Asymptotic properties of those distributions are also presented.
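
The statement that a warp lives until its slowest thread finishes can be sketched as a small simulation. This is not the run-time model proposed in the paper. It assumes, for illustration only, that threads are independent and that each loop exits with a fixed probability `p` per iteration (a geometric iteration count), and it takes a warp size of 32 threads as a parameter. Under those assumptions the lifespan is the maximum of the per-thread counts, with distribution function (1 - (1 - p)^t)^W.

```python
import random

WARP_SIZE = 32  # assumed number of threads per warp; kept as a parameter


def thread_iterations(p, rng):
    """Iterations until a per-iteration exit event of probability p occurs."""
    count = 1
    while rng.random() >= p:
        count += 1
    return count


def warp_lifespan(p, rng, warp_size=WARP_SIZE):
    """A warp finishes only when its slowest thread has finished."""
    return max(thread_iterations(p, rng) for _ in range(warp_size))


def lifespan_cdf(t, p, warp_size=WARP_SIZE):
    """P(lifespan <= t) for independent geometric threads."""
    return (1.0 - (1.0 - p) ** t) ** warp_size


rng = random.Random(0)
samples = [warp_lifespan(0.1, rng) for _ in range(10_000)]
empirical = sum(s <= 40 for s in samples) / len(samples)
print(empirical, lifespan_cdf(40, 0.1))
```

Because the lifespan is a maximum over many threads, its distribution sits well to the right of a single thread's distribution, which is why the slowest thread dominates the execution time of the whole warp.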

Citations: 2

MODIFIED NON-OVERLAPPING TEMPLATE MATCHING TEST AND PROPOSAL ON SETTING TEMPLATE

Journal of the Japanese Society of Computational Statistics

Pub Date : 2014-12-20 DOI: 10.5183/JJSCS.1311001_208

Y. Takeda, Mituaki Huzii, N. Watanabe, T. Kamakura

Rukhin et al. (2010) proposed the non-overlapping template matching test as one of the methods for statistical testing of randomness in cryptographic applications. This test is very interesting, but its statistical properties and methods for setting the template have not been shown. Our new contribution in this paper is to propose a modified version of this test, including the setting of the template, and to show through simulation studies how the modified test works effectively.
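
As background, the original (unmodified) test can be sketched as follows; the modification proposed in this paper is not described in the abstract and is not reproduced here. The sketch follows the usual description of the test: split the bit sequence into blocks, count non-overlapping occurrences of the template in each block, and compare the counts with their theoretical mean and variance through a chi-square statistic. To stay within the standard library, the p-value uses the closed form of the regularized upper incomplete gamma function for integer shape, so this sketch assumes an even number of blocks. All function names are hypothetical.

```python
import math


def count_non_overlapping(block, template):
    """Count template hits, jumping past each matched window."""
    m = len(template)
    count, i = 0, 0
    while i <= len(block) - m:
        if block[i:i + m] == template:
            count += 1
            i += m  # after a hit, resume right after the matched bits
        else:
            i += 1
    return count


def upper_gamma_q_integer(a, x):
    """Regularized upper incomplete gamma Q(a, x) for a positive integer a."""
    term, total = 1.0, 1.0
    for k in range(1, a):
        term *= x / k
        total += term
    return math.exp(-x) * total


def non_overlapping_template_test(bits, template, n_blocks):
    """Return (per-block counts, chi-square statistic, p-value)."""
    if n_blocks % 2:
        raise ValueError("this sketch needs an even number of blocks")
    m = len(template)
    block_len = len(bits) // n_blocks
    counts = [
        count_non_overlapping(bits[j * block_len:(j + 1) * block_len], template)
        for j in range(n_blocks)
    ]
    mean = (block_len - m + 1) / 2 ** m
    var = block_len * (1 / 2 ** m - (2 * m - 1) / 2 ** (2 * m))
    chi2 = sum((w - mean) ** 2 for w in counts) / var
    p_value = upper_gamma_q_integer(n_blocks // 2, chi2 / 2)
    return counts, chi2, p_value


counts, chi2, p_value = non_overlapping_template_test("10100100101110010110", "001", 2)
print(counts, round(chi2, 6), round(p_value, 6))
```

For this 20-bit toy sequence with template 001 and two blocks of ten bits, the counts are 2 and 1, the statistic is about 2.1333, and the p-value is about 0.3442. A sequence this short is only useful for checking the arithmetic, not for an actual randomness assessment.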

Citations: 4

MODEL SELECTION BASED ON QUASI-LIKELIHOOD WITH APPLICATION TO OVERDISPERSED DATA

Journal of the Japanese Society of Computational Statistics

Pub Date : 2013-12-20 DOI: 10.5183/JJSCS.1212002_204

Yiping Tang

In analyzing complicated data, we are often unwilling, or not confident enough, to impose a parametric model on the data-generating structure. One important example is the analysis of proportional or count data with overdispersion. The obvious advantage of assuming full parametric models is that one can resort to likelihood analyses, for instance, using AIC or BIC to choose the most appropriate regression model. For overdispersed proportional data, possible parametric models include the beta-binomial models, the double exponential models, etc. In this paper, we extend the generalized linear models by replacing the full parametric models with a finite number of moment restrictions on both the data and the structural parameters. For such semiparametric statistical models, we propose a method for selecting the best possible regression model in the semiparametric model class. We apply the proposed model selection technique to overdispersed data and demonstrate the use of the proposed semiparametric information criterion using the well-known data on germination of Orobanche.

Citations: 0

ITEM RESPONSE THEORY USING A FINITE MIXTURE OF LOGISTIC MODELS WITH ITEM-SPECIFIC MIXING WEIGHTS

Journal of the Japanese Society of Computational Statistics

Pub Date : 2013-12-20 DOI: 10.5183/JJSCS.1206001_199

Joji Mori, Y. Kano

Since a latent trait θ cannot be directly observed in item response theory models, it is difficult to specify an item response function (IRF). Many mathematical models have been proposed, among which the two-parameter logistic model (2PLM) is often used. In this article, we propose a new parametric model, namely, a finite mixture of logistic models (MLM). The MLM has different mixing weights per item and can model a plateau in the learning curve, which is a well-known phenomenon in education and psychology. Finite mixtures are also known to cause problems in estimating item parameters. Therefore, we develop a new estimation algorithm for item parameters and present simulation studies which show that this estimation algorithm works well. When the MLM was applied to real data, we also found that it makes it possible to distinguish whether or not a plateau appears in an IRF, whereas the 2PLM does not have this capability.
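
To see how a mixture of logistic curves can produce a plateau that a single 2PLM curve cannot, consider the sketch below. The 2PLM response function is standard; the mixture form, a weighted sum of 2PLM curves with item-specific weights, is an assumption about the parameterization, since the abstract does not spell it out. The parameter values are invented for illustration.

```python
import math


def irf_2pl(theta, a, b):
    """Two-parameter logistic IRF: discrimination a, difficulty b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def irf_mixture(theta, weights, params):
    """Weighted sum of 2PL curves; weights are the item's mixing weights."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    return sum(w * irf_2pl(theta, a, b) for w, (a, b) in zip(weights, params))


# Invented item: two steep components centred at -2 and +2, equal weights.
weights = [0.5, 0.5]
params = [(4.0, -2.0), (4.0, 2.0)]
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(theta, round(irf_mixture(theta, weights, params), 3))
```

Between the two component difficulties the curve stays close to 0.5 before rising again, which is the plateau shape. A single logistic curve is strictly S-shaped and has no such flat middle section.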

Citations: 1

A RESTRAINED CONDITION NUMBER LEAST SQUARES TECHNIQUE WITH ITS APPLICATIONS TO AVOIDING RANK DEFICIENCY

Journal of the Japanese Society of Computational Statistics

Pub Date : 2013-12-20 DOI: 10.5183/JJSCS.1208002_201

K. Adachi

An algorithm for the constrained least squares problem is proposed in which the upper bound of the condition number of a parameter matrix is predetermined. In the algorithm, the parameter matrix to be obtained is reparameterized using its singular value decomposition, and the loss function is minimized alternately over the singular vector matrices and the singular values with condition number constraint. It was demonstrated that the algorithm recovered full rank matrices in simulated reverse component analysis, in which the matrices were estimated from their reduced rank counterparts. The proposed algorithm is useful for avoiding degenerate solutions in which parameter matrices become rank deficient, which is illustrated in its application to generalized oblique Procrustes rotation and three-mode Parafac component analysis.
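
The singular-value part of such a constraint can be sketched in isolation. This is not the paper's algorithm, which alternates over the singular vector matrices as well and whose update formulas are not given in the abstract. The sketch only solves a simplified subproblem: given target singular values `s`, find the closest values in the least-squares sense whose largest-to-smallest ratio does not exceed `kappa`. The solution clips `s` to a band `[u, kappa * u]`; here the lower edge `u` is found by a plain ternary search, a numerical stand-in chosen for brevity. All names are hypothetical.

```python
def clip_singular_values(s, kappa, iters=200):
    """Closest values to s (least squares) with max/min ratio at most kappa."""
    if kappa < 1.0:
        raise ValueError("a condition number bound must be at least 1")

    def clipped(u):
        # Force every value into the band [u, kappa * u].
        return [min(max(x, u), kappa * u) for x in s]

    def loss(u):
        return sum((x - d) ** 2 for x, d in zip(s, clipped(u)))

    # The loss is convex in u, so a ternary search on [0, max(s)] suffices.
    lo, hi = 0.0, max(s)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if loss(m1) <= loss(m2):
            hi = m2
        else:
            lo = m1
    return clipped((lo + hi) / 2.0)


# Invented example: a nearly rank-deficient spectrum, bounded at kappa = 10.
bounded = clip_singular_values([10.0, 1.0, 0.001], kappa=10.0)
print(bounded)
```

The near-zero singular value is lifted and the largest one is slightly lowered, so the resulting matrix stays safely full rank. Values that already satisfy the bound are left essentially unchanged.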

Citations: 3

MULTIPLE COMPARISON PROCEDURES FOR HIGH-DIMENSIONAL DATA AND THEIR ROBUSTNESS UNDER NON-NORMALITY

Journal of the Japanese Society of Computational Statistics

Pub Date : 2013-12-20 DOI: 10.5183/JJSCS.1211001_202

Sho Takahashi, Masashi Hyodo, T. Nishiyama, T. Pavlenko

This paper analyzes whether the procedures for multiple comparison derived in Hyodo et al. (2012) work for an unbalanced case and under non-normality. We focus on pairwise multiple comparisons and comparisons with a control among mean vectors, and show that the asymptotic properties of these procedures remain valid in unbalanced high-dimensional settings. We also show numerically that the derived procedures are robust under non-normality, i.e., the coverage probability of these procedures can be controlled with or without the assumption of normality of the data.

Citations: 4

AN EVALUATION OF TREATMENT-COVARIATE INTERACTION IN META-ANALYSIS WITH MARGINALIZING OF MISSING INDIVIDUAL PATIENT DATA

Journal of the Japanese Society of Computational Statistics

Pub Date : 2013-12-20 DOI: 10.5183/JJSCS.1212001_203

Y. Yamaguchi, Wataru Sakamoto, S. Shirahata, M. Goto

Meta-analysis methods based on individual patient data (IPD) have attracted attention for estimating a treatment-covariate interaction effect. An existing meta-regression approach, based on aggregate data (AD) such as a treatment effect estimate and its standard error, can be used only for inference on the between-trial interaction, which indicates a relationship between the treatment effect estimates and mean covariate values. In contrast, the use of IPD allows estimation of not only the between-trial interaction but also the within-trial interaction, which indicates a relationship between individual outcomes and individual covariate values. However, IPD meta-analyses are often difficult to implement because practitioners cannot always collect the IPD from all trials involved. We propose a new meta-analysis method for estimating both the between-trial and the within-trial interaction, in which we assume an IPD meta-analysis model for the missing IPD and then marginalize its density with respect to the missing IPD. The proposed method allows one to estimate the within-trial interaction even when only AD are available, and has potential benefits for another meta-analytic situation in which some trials provide IPD and the others provide only AD. Through simulation studies, we demonstrate how close the estimates of the between-trial and the within-trial interaction from the proposed method are to those from the IPD meta-analysis.

Citations: 0
