Let X and Y be two independent random variables and define θ = P(X < Y). Inference for θ arises in various fields. This paper not only compares several methods for constructing a confidence interval for θ in small samples but also proposes some new methods. The intervals derived by these new methods perform well in small samples, with actual coverage probability close to the nominal level. In addition, one of the biggest advantages of our methods is that they do not require complicated calculations.
{"title":"On confidence intervals for P (X < Y )","authors":"Youhei Kawasaki, E. Miyaoka","doi":"10.5183/JJSCS.23.1_1","DOIUrl":"https://doi.org/10.5183/JJSCS.23.1_1","url":null,"abstract":"We assume X and Y be two independent random variables and define θ=P (X < Y ). The inference for θ can be found in various fields. This paper not only compares several methods for constructing the confidence interval for θ in a small sample but also proposes some new methods. The intervals derived by these new methods show good performance in a small sample, and their actual coverage probability is close to the nominal level. In addition, one of the biggest advantages of our methods is that it does not require complicated calculations.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"273 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122718891","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
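The quantity θ = P(X < Y) in the abstract above can be estimated nonparametrically via the Mann-Whitney statistic. As a minimal sketch (not the paper's proposed small-sample methods, which are not reproduced here), the following shows the point estimate and a textbook Wald-type interval; the variance term `t*(1-t)/min(m, n)` is a simplified placeholder, not the paper's construction:

```python
import math

def theta_hat(x, y):
    """Point estimate of theta = P(X < Y): the Mann-Whitney count of
    pairs with x_i < y_j, divided by the number of pairs."""
    m, n = len(x), len(y)
    wins = sum(1.0 for xi in x for yj in y if xi < yj)
    return wins / (m * n)

def wald_interval(x, y, z=1.96):
    """Textbook Wald-type interval for theta (illustration only).
    The variance approximation t*(1-t)/min(m, n) is a hypothetical
    placeholder standing in for a proper variance estimate."""
    m, n = len(x), len(y)
    t = theta_hat(x, y)
    se = math.sqrt(t * (1.0 - t) / min(m, n))
    return max(0.0, t - z * se), min(1.0, t + z * se)

x = [1.2, 0.7, 2.1, 1.5]
y = [2.0, 2.6, 1.9, 3.1]
print(theta_hat(x, y))  # 14 of 16 pairs have x_i < y_j, so 0.875
```

As the abstract notes for the proposed methods, nothing here requires more than counting and elementary arithmetic.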
This paper proposes a dimension reduction technique in the framework of symbolic data analysis (SDA). Recent advances in technology have increased the complexity of datasets, and today their size is much larger than it was a decade ago. Most statistical methods do not have sufficient power to analyze these datasets. SDA, proposed by Diday at the end of the 1980s, is a new approach for analyzing huge and complex data. SDA examines "symbolic data", which consist of concepts. A concept consists not only of values but also of "higher-level units" such as intervals and distributions, and combinations of these can themselves be represented as concepts. This implies that complex data can be handled formally in the framework of SDA. However, there are very few studies based on this simple idea, so practical methods should be developed to apply it to real-world problems. In this study, we focus on the case in which a concept contains some subsets (the concept acts as a typical complex dataset) and develop a new method to analyze such a dataset directly using SDA. In this paper, we propose a dimension reduction technique in the framework of SDA, especially for a group structure, and introduce a numerical example.
{"title":"Common principal components model for symbolic data","authors":"K. Katayama, Hiroyuki Minami, M. Mizuta","doi":"10.5183/JJSCS.23.1_41","DOIUrl":"https://doi.org/10.5183/JJSCS.23.1_41","url":null,"abstract":"This paper proposes a dimension reduction technique in the framework of symbolic data analysis (SDA). Recent advances in technology have increased the complexity of datasets, and today, their size is much larger than it was in the past decade. Most statistical methods do not have sufficient power to analyze these datasets. SDA was proposed by Diday at the end of the 1980s and is a new approach for analyzing huge and complex data.SDA examines “symbolic data”, which consist of concepts. A concept consists of not only values but also “higher-level units” such as an interval and a distribution. Their combination can also be represented as a kind of a concept. This implies that complex data can be formally handled in the framework of SDA. However, there are very few studies based on this simple idea. Therefore, practical methods should be developed to apply this idea to solve problems in the real world. In this study, we focus on the case in which a concept contains some subsets (the concept acts as a typical complex dataset) and develop a new method to analyze this dataset directly using SDA. In this paper, we propose a dimension reduction technique in the framework of SDA, especially for a group structure, and introduce a numerical example.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127938127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A rank statistic for a location-scale parameter is introduced for the change-point problem: a combination of the Wilcoxon and Mood statistics is extended to the change-point context. In this paper, the proposed rank statistic is used to detect a change-point in a setting involving at most one change. The limiting distribution of the suggested statistic is derived under the null hypothesis (no change), and its finite-sample critical values are estimated by simulation studies. In addition, the accuracy of change-point detection is investigated by simulation studies. The method is illustrated by the analysis of various data.
{"title":"A RANK STATISTIC FOR THE CHANGE-POINT PROBLEM AND ITS APPLICATION","authors":"H. Murakami","doi":"10.5183/JJSCS.23.1_27","DOIUrl":"https://doi.org/10.5183/JJSCS.23.1_27","url":null,"abstract":"The rank statistic for a location-scale parameter is introduced to a change-point problem. A combination of the Wilcoxon and Mood statistics is extended to the change-point context. The proposed rank statistic is used to detect a change-point in a setting involving at most one change in this paper. The limiting distribution of the suggested statistic is derived under the null hypothesis (no change). The finite sample critical value of the suggested statistic is estimated by simulation studies. In addition, the accuracy of detecting a change-point is investigated by simulation studies. The method is illustrated by the analysis of various data.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116068095","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
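A Lepage-type combination of the Wilcoxon (location) and Mood (scale) statistics, scanned over candidate split points, is one standard way to realize the idea in the abstract above. The sketch below is a generic illustration using the usual null means and variances of the two rank statistics; it is not the paper's exact statistic, and its critical values would have to come from simulation as the abstract describes:

```python
def ranks(values):
    # Midranks of the pooled sample (ties get the average rank).
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mid = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            r[order[t]] = mid
        i = j + 1
    return r

def lepage(series, k):
    """Lepage-type statistic at split k: squared standardized Wilcoxon
    rank sum plus squared standardized Mood statistic."""
    N = len(series)
    m, n = k, N - k
    r = ranks(series)
    w = sum(r[:k])                                          # Wilcoxon rank sum
    mood = sum((r[i] - (N + 1) / 2.0) ** 2 for i in range(k))  # Mood statistic
    ew, vw = m * (N + 1) / 2.0, m * n * (N + 1) / 12.0
    em = m * (N * N - 1) / 12.0
    vm = m * n * (N + 1) * (N * N - 4) / 180.0
    return (w - ew) ** 2 / vw + (mood - em) ** 2 / vm

def scan(series):
    # At-most-one-change setting: take the maximizing split point.
    stats = {k: lepage(series, k) for k in range(2, len(series) - 1)}
    khat = max(stats, key=stats.get)
    return khat, stats[khat]

khat, stat = scan([0, 1, 0, 1, 0, 10, 11, 10, 11, 10])
print(khat)  # the statistic peaks at the split after the 5th observation
```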
We describe a Box and Cox power transformation that simultaneously provides additivity and homoscedasticity in regression. The two methods developed here are extensions of the power-additive transformation (PAT) discussed by Goto (1992, 1995) and Hamasaki and Goto (2005). The PAT aims to improve the additivity or linearity of a simple model represented by linear predictors. We then consider combinations of the PAT with the weighting and transform-both-sides methods, discuss procedures for finding the maximum likelihood estimates of the parameters, and consider the relationship between the methods. We also compare the performance of the methods through a simulation study.
{"title":"Box and Cox power-transformation to additivity and homoscedasticity in regression","authors":"T. Hamasaki, Tomoyuki Sugimoto, M. Goto","doi":"10.5183/JJSCS.23.1_13","DOIUrl":"https://doi.org/10.5183/JJSCS.23.1_13","url":null,"abstract":"We describe a Box and Cox power-transformation to simultaneously provide additivity and homoscedasticity in regression. The two methods developed here are extensions of the power-additive transformation (PAT) discussed by Goto (1992, 1995) and Hamasaki and Goto (2005). The PAT aims to improve the additivity or linearity of some simple model represented by linear predicators. We then consider combinations of the PAT with the weighting and transform-both-sides methods. We discuss the procedures to find the maximum likelihood estimates of parameters and then consider the relationship between the methods. Also, we compare the performances of the methods through a simulation study.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133843976","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
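Selecting the Box-Cox exponent λ by maximum likelihood can be sketched as follows. This is the classical single-response transformation with a constant-mean normal model, shown only to fix ideas; it is not the paper's power-additive extension, weighting scheme, or transform-both-sides variant:

```python
import math

def boxcox(y, lam):
    # Box-Cox power transformation; all y must be positive.
    if abs(lam) < 1e-12:
        return [math.log(v) for v in y]
    return [(v ** lam - 1.0) / lam for v in y]

def profile_loglik(y, lam):
    """Profile log-likelihood of lambda under a constant-mean normal
    model, including the Jacobian term (lam - 1) * sum(log y)."""
    z = boxcox(y, lam)
    n = len(z)
    mu = sum(z) / n
    s2 = sum((v - mu) ** 2 for v in z) / n
    return -0.5 * n * math.log(s2) + (lam - 1.0) * sum(math.log(v) for v in y)

def best_lambda(y, grid):
    # Grid search; in practice one would optimize continuously.
    return max(grid, key=lambda lam: profile_loglik(y, lam))

# Log-normal-style toy data, so the log transform (lambda = 0) should win.
y = [math.exp(v) for v in [-1.2, -0.3, 0.4, 1.1, 0.2, -0.6, 0.8, -0.1]]
print(best_lambda(y, [-1.0, -0.5, 0.0, 0.5, 1.0]))  # 0.0
```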
Bayesian estimation of the end point of a distribution is proposed and examined. For this problem, it is well known that the maximum likelihood method does not work well. By modifying the prior density of Hall and Wang (2005) and applying marginal inference, we derive estimators superior to existing ones. The proposed estimators are closely related to estimating functions, which are known to outperform maximum likelihood equations. Another advantage of the proposed method is that it resolves the convergence problem. Our simulation results strongly support the superiority of the proposed estimators over existing ones in terms of mean squared error. Illustrative examples are also given.
{"title":"IMPROVING BAYESIAN ESTIMATION OF THE END POINT OF A DISTRIBUTION","authors":"Yuta Minoda, T. Kamakura, T. Yanagimoto","doi":"10.5183/JJSCS.22.1_79","DOIUrl":"https://doi.org/10.5183/JJSCS.22.1_79","url":null,"abstract":"Bayesian estimation of the end point of a distribution is proposed and examined. For this problem, it is well known that the maximum likelihood method does not work well. By modifying the prior density in Hall and Wang (2005) and applying marginal inference, we derive estimators superior to existing ones. The proposed estimators are closely related to the estimating functions which are known to outperform maximum likelihood equations. Another advantage of the proposed method is to resolve the convergence problem. Our simulation results strongly support the superiority of the proposed estimators over the existing ones under the mean squared error. Illustrative examples are also given.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124514537","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A concept of matchability of survey data is introduced based on decompositions of joint probability density functions. This definition of matchability naturally leads to restrictions on the joint distributions in the form of various conditional independence relations. Partial matchability is defined as global matchability with respect to a subset of the underlying variables. Global matchability does not imply partial matchability, nor conversely, which constitutes part of Simpson's paradox. A numerical experiment is carried out to show possible merits of algorithms based on partial matchability. We also show numerically that when the ideal assumption of matchability holds only approximately, estimation accuracy is still guaranteed to some extent. Extension to the problem of matching three files is also briefly discussed.
{"title":"Statistical matching based on probabilistic conditional independence","authors":"Jinfang Wang, Ping Jing","doi":"10.5183/JJSCS.22.1_43","DOIUrl":"https://doi.org/10.5183/JJSCS.22.1_43","url":null,"abstract":"A concept of matchability of survey data is introduced based on decompositions of the joint probability density functions. This definition of matchability naturally leads to restrictions on the joint distributions in the form of various conditional independence relations. The concept of partial matchability is defined as the global matchability with respect to a subset of the underlying variables. The global matchability does not imply partial matchability and vice versa, which constitutes part of Simpson's paradox. A numerical experiment is carried out to show possible merits of algorithms based on partial matchability. We also show numerically that when the ideal assumption of matchability holds only approximately, estimation accuracy is still guaranteed to some extent. Extension to the problem of matching three files is also briefly discussed.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114927757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
There are several methods for estimating regression functions and their derivatives. Among them, B-spline procedures and kernel procedures are known to be useful; however, it has not been established which procedure is best. In this paper, we investigate the performance of these procedures by computer simulation. Two B-spline procedures are considered. The first estimates derivatives using a different roughness penalty for each degree d of the derivative, so the smoothing parameters and B-spline coefficients differ for each d. The second estimates the d-th derivative simply by differentiating the estimated regression function d times, so the regression function and its derivatives share a common B-spline coefficient vector. Two kernel procedures are also considered: the first is constructed from the Gasser-Muller estimator with a global plug-in bandwidth selector, and the second is local polynomial fitting with a refined bandwidth selector. Our simulations show that B-spline procedures can give better estimates than kernel procedures when estimating regression functions. For derivatives, we find that in B-spline methods it is necessary to choose a different smoothing parameter (or coefficient vector) for each degree of derivative, and that between the two kernel methods, the Gasser-Muller procedure gives better results than local polynomial fitting in most cases. Furthermore, the first B-spline method still works better than the Gasser-Muller procedure in the central area of the domain of the functions, but in the boundary areas the Gasser-Muller procedure gives more stable derivative estimates than all the other methods.
{"title":"Comparisons of B-spline procedures with kernel procedures in estimating regression functions and their derivatives","authors":"Xiaoling Dou, S. Shirahata","doi":"10.5183/JJSCS.22.1_57","DOIUrl":"https://doi.org/10.5183/JJSCS.22.1_57","url":null,"abstract":"There are several methods to estimate regression functions and their derivatives. Among them, B-spline procedures and kernel procedures are known to be useful. However, at present, it is not determined which procedure is better than the others. In this paper, we investigate the performance of the procedures by computer simulations. Two B-spline procedures are considered. The first one is to estimate derivatives using a different roughness penalty for each degree of the derivative d. In this procedure, the smoothing parameters and the coefficients of the B-spline functions are different for each d. The second procedure is to estimate the dth derivative just by differentiating the estimated regression function d-times. In this case, the regression function and its derivatives have a common coefficient vector of B-spline functions. Two kernel procedures are also considered. The first kernel procedure used in our simulations is constructed with the Gasser-Muller estimator and a global plug-in bandwidth selector. The second one is a local polynomial fitting with a refined bandwidth selector. As a result of our simulations, we find that B-spline procedures can give better estimates than the kernel ones in estimating regression functions. For derivatives, we also find that in B-spline methods, it is necessary to choose a different smoothing parameter (or coefficient vector) for each degree of derivative; between the two kernel methods, the Gasser-Muller procedure gives better results than the local polynomial fitting in most cases. Furthermore, the first B-spline method can still work better than the Gasser-Muller procedure in the central area of the domain of the functions. But in the boundary areas, the Gasser-Muller procedure gives more stable derivative estimates than all the other methods.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128608876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
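The second B-spline procedure in the abstract above (fit one spline, then differentiate the fitted spline d times, keeping a common coefficient vector) can be illustrated with SciPy's B-spline routines, assuming SciPy is available; `splev`'s `der` argument evaluates the d-th derivative of the fitted spline:

```python
import numpy as np
from scipy.interpolate import splrep, splev

# Noise-free toy data from f(x) = x**2, so f'(x) = 2x.
x = np.linspace(0.0, 1.0, 21)
y = x ** 2

# Fit a single cubic B-spline (s=0 interpolates; with noisy data one
# would choose a positive smoothing parameter s instead).
tck = splrep(x, y, k=3, s=0.0)
f_hat = splev(0.5, tck)          # estimate of f(0.5)  -> 0.25
d1_hat = splev(0.5, tck, der=1)  # estimate of f'(0.5) -> 1.0
print(float(f_hat), float(d1_hat))
```

The abstract's finding is that this shared-coefficient shortcut is inferior for derivatives to refitting with a penalty matched to each derivative degree d.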
In this paper, we derive the exact distribution of a new test statistic for the equality of two mean vectors in the intraclass correlation model when monotone missing observations occur. Simultaneous confidence intervals for all contrasts of the two mean vectors are given using the idea of Seo and Srivastava (2000). Finally, in order to evaluate the proposed procedure, we investigate the power function of the new test statistic and the widths of the simultaneous confidence intervals. Numerical results for a real example and a simulation study are presented.
{"title":"TESTING EQUALITY OF TWO MEAN VECTORS AND SIMULTANEOUS CONFIDENCE INTERVALS IN REPEATED MEASURES WITH MISSING DATA","authors":"Kazuyuki Koizumi, T. Seo","doi":"10.5183/JJSCS.22.1_33","DOIUrl":"https://doi.org/10.5183/JJSCS.22.1_33","url":null,"abstract":"In this paper, we derive the exact distribution of a new test statistic for the equality of two mean vectors in the intraclass correlation model when monotone missing observations occur. Simultaneous confidence intervals for all contrasts of two mean vectors are given by using the idea in Seo and Srivastava (2000). Finally, in order to evaluate the procedure proposed in this paper, we investigate the power function of a new test statistic and the widths of simultaneous confidence intervals. The numerical results of a real example and simulation study are presented.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132628864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A test statistic for the equality of the j-th largest eigenvalues of the covariance matrices across several populations is proposed. The asymptotic distribution of the statistic is derived under normal populations with equal sample sizes. By simulation, we investigate the power of the test based on the suggested statistic for normal, contaminated normal, and skew-normal populations, and compare it with two nonparametric tests.
{"title":"A STATISTIC FOR TESTING THE EQUALITY OF EIGENVALUE OF COVARIANCE MATRIX ON MULTIPOPULATION","authors":"H. Murakami, S. Tsukada, Y. Takeda","doi":"10.5183/JJSCS1988.21.21","DOIUrl":"https://doi.org/10.5183/JJSCS1988.21.21","url":null,"abstract":"A test statistic for the equality of the j-th largest eigenvalues of the covariance matrix in a multipopulation is proposed. Asymptotic distribution of the statistic is derived under the normal population when the sample sizes are equal. By simulation studies, we investigate the power of a test using the suggested statistic for normal, contaminated normal and skew normal populations, and compare it with two nonparametric tests.","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132807308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
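The basic ingredient of the test above, the j-th largest eigenvalue of each population's sample covariance matrix, can be computed as below. The comparison statistic and its asymptotic distribution from the paper are not reproduced here; this only shows the per-group quantities being compared:

```python
import numpy as np

def jth_eigenvalues(samples, j):
    """j-th largest eigenvalue of each group's sample covariance matrix
    (j = 1 is the largest); `samples` is a list of (n_i, p) arrays."""
    vals = []
    for xmat in samples:
        cov = np.cov(xmat, rowvar=False)
        eig = np.sort(np.linalg.eigvalsh(cov))[::-1]  # descending order
        vals.append(eig[j - 1])
    return np.array(vals)

rng = np.random.default_rng(0)
# Two groups drawn from the same (identity) covariance, so their j-th
# eigenvalues should be close for moderate sample sizes.
g1 = rng.normal(size=(200, 3))
g2 = rng.normal(size=(200, 3))
lam = jth_eigenvalues([g1, g2], j=1)
print(lam)
```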
{"title":"EXACT MOMENTS OF FEASIBLE GENERALIZED RIDGE REGRESSION ESTIMATOR AND NUMERICAL EVALUATIONS","authors":"Masayuki Jimichi","doi":"10.5183/JJSCS1988.21.1","DOIUrl":"https://doi.org/10.5183/JJSCS1988.21.1","url":null,"abstract":"","PeriodicalId":338719,"journal":{"name":"Journal of the Japanese Society of Computational Statistics","volume":"2 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131850469","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}