It has become imperative to develop appropriate statistical methods for high-dimensional, small-sample data analysis, because data formats in the biological and medical fields have changed dramatically. In particular, it will soon be common to analyze clinical data together with genomic data. In this review paper, we introduce several current approaches to the analysis of genomic and proteomic data and describe some of their limitations and problems in statistical performance. In the first part of the paper, we explain the p ≫ n problem, which is the fundamental challenge of data analysis in bioinformatics. In particular, we consider a typical p ≫ n problem: prediction of treatment effects using microarray data as feature vectors. We then introduce some new boosting methods based on the area under the ROC curve (AUC). After showing some applications of these boosting methods, we summarize the open problems and discuss the outlook for the future.
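The abstract above motivates boosting methods that optimize the area under the ROC curve rather than classification error when p ≫ n. As a rough illustration of that general idea only (not the authors' algorithm), the following Python sketch greedily builds a linear marker score by adding, at each round, the single feature whose signed step most improves a sigmoid-smoothed empirical AUC; the simulated data, smoothing bandwidth, step size, and number of rounds are all invented for the example.

```python
# Illustrative sketch of AUC-oriented boosting on p >> n data (not the
# paper's exact algorithm): at each round, add to a linear score the single
# marker (with sign) that most improves a smoothed empirical AUC.
import numpy as np

rng = np.random.default_rng(0)
n, p = 40, 500                      # small sample, many features (p >> n)
y = np.repeat([0, 1], n // 2)       # hypothetical responder / non-responder labels
X = rng.normal(size=(n, p))
X[y == 1, :5] += 1.0                # assume the first 5 markers carry signal

def smooth_auc(score, y, h=0.1):
    """Sigmoid-smoothed empirical AUC of a score vector."""
    diff = score[y == 1][:, None] - score[y == 0][None, :]
    return np.mean(1.0 / (1.0 + np.exp(-diff / h)))

coef = np.zeros(p)
step = 0.2
for _ in range(30):                 # number of boosting rounds (tuning choice)
    base = X @ coef
    current = smooth_auc(base, y)
    best_gain, best_j, best_s = 0.0, None, step
    for j in range(p):
        for s in (+step, -step):
            gain = smooth_auc(base + s * X[:, j], y) - current
            if gain > best_gain:
                best_gain, best_j, best_s = gain, j, s
    if best_j is None:              # no marker improves the smoothed AUC
        break
    coef[best_j] += best_s

print("selected markers:", np.flatnonzero(coef))
print("training AUC (smoothed):", round(smooth_auc(X @ coef, y), 3))
```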
{"title":"ゲノム・プロテオミクスデータを用いた予測解析:機械学習による新しい統計的手法","authors":"理 小森, 真透 江口","doi":"10.5691/JJB.32.49","DOIUrl":"https://doi.org/10.5691/JJB.32.49","url":null,"abstract":"At the present day, it becomes imperative to develop appropriate statistical methods for high-dimensional and small sample data analysis because data formats in the biological or medical fields have been dramatically changed. Especially, it will be common in the near future to analyze clinical data together with genomic data. In this review paper, we introduce several current approaches to the analysis relating to genomic and proteomic data, and describe some limitations or problems in the statistical performance.In the former part of this paper, we explain a problem of p»n, which is the fundamental challenge in data analysis in bioinformatics. In particular, we consider a typical problem of p»n in prediction of treatment effects using microarray data as feature vectors. Then, we introduce some new boosting methods based on the area under the ROC curve. After showing some applications of the boosting methods, we summarize the present problems and refer to outlook for the future.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"345 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2011-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124262207","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
M. Taguri, Y. Matsuyama, Y. Ohashi, H. Sone, Y. Yoshimura, N. Yamada
To examine the effect of food intake on the occurrence of a specific disease, it is necessary to account for the substantial measurement error in dietary assessment instruments such as the 24-hour recall or the food frequency questionnaire. The regression calibration (RC) method has been widely used to correct for measurement error. However, the resulting corrected estimator is generally more variable than the naive biased one. With Bayesian hierarchical regression models, one can obtain more precise estimates than with ordinary regression models by incorporating additional information into a second-stage regression. In this paper, we propose a hierarchical Poisson regression model in which multivariate measurement errors are adjusted by the RC method. Simulation studies were conducted to investigate the performance of the proposed method; they showed that the proposed estimators were nearly unbiased and more precise than the usual RC estimators, even when the number of exposures was small. We also applied the proposed method to the analysis of a large prospective study, the Japan Diabetes Complications Study (JDCS), to examine the effect of food-group intakes on the occurrence of cardiovascular disease (CVD) among type 2 diabetic patients.
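Regression calibration, the correction method referred to above, replaces an error-prone exposure with an estimate of its expected value given the mismeasured version before fitting the outcome model. The sketch below shows the two-stage mechanics for a single exposure measured with two replicates and an ordinary (non-hierarchical) Poisson regression; the simulated intakes, error variances, and effect size are assumptions for illustration, not the paper's multivariate hierarchical model.

```python
# Minimal regression-calibration sketch for one error-prone exposure in a
# Poisson model. Replicate measurements w1, w2 of the true intake x are used
# to estimate the calibration slope, and the calibrated exposure is plugged
# into an ordinary Poisson regression.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x = rng.normal(0.0, 1.0, n)                 # true (unobservable) intake
w1 = x + rng.normal(0.0, 1.0, n)            # replicate 1 with measurement error
w2 = x + rng.normal(0.0, 1.0, n)            # replicate 2
y = rng.poisson(np.exp(-2.0 + 0.5 * x))     # disease counts, true log-rate ratio = 0.5

wbar = (w1 + w2) / 2
err_var = np.mean((w1 - w2) ** 2) / 2       # estimates the error variance of one replicate
sigma2_x = np.var(wbar, ddof=1) - err_var / 2
lam = sigma2_x / (sigma2_x + err_var / 2)   # calibration (attenuation) factor
x_hat = wbar.mean() + lam * (wbar - wbar.mean())   # E[x | wbar] under the linear model

naive = sm.GLM(y, sm.add_constant(wbar), family=sm.families.Poisson()).fit()
calib = sm.GLM(y, sm.add_constant(x_hat), family=sm.families.Poisson()).fit()
print("naive slope     :", round(naive.params[1], 3))   # attenuated toward 0
print("calibrated slope:", round(calib.params[1], 3))   # close to the true 0.5
```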
{"title":"A Hierarchical Regression Model for Dietary Data Adjusting for Covariates Measurement Error by Regression Calibration: An Application to a Large Prospective Study for Diabetic Complications","authors":"M. Taguri, Y. Matsuyama, Y. Ohashi, H. Sone, Y. Yoshimura, N. Yamada","doi":"10.5691/JJB.31.49","DOIUrl":"https://doi.org/10.5691/JJB.31.49","url":null,"abstract":"To examine the effect of food intakes on the occurrence of a specific disease, it is necessary to take account of numerous measurement errors in dietary assessment instruments, such as the 24-hour recall or the food frequency questionnaire. The regression calibration (RC) method has been widely used for correcting the measurement error. However, the resulting corrected estimator is generally more variable than the naive biased one. Using the Bayesian hierarchical regression models, one can obtain more precise estimates than using ordinary regression models by incorporating additional information into a second-stage regression. In this paper, we propose a hierarchical Poisson regression model, in which multivariate measurement errors are adjusted by RC method. Simulation studies were conducted to investigate the performances of the proposed method, which showed that the proposed estimators were nearly unbiased, and were more precise than the usual RC ones even in the case of a few number of exposure. We also applied the proposed method to the analysis of a large prospective study, JDCS (Japan Diabetes Complications Study), to examine the effect of food group intakes on the occurrence of the cardiovascular disease (CVD) among type2 diabetic patients.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122677559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Differences in some traits between males and females, called sexual dimorphism, are observed among wild and livestock animals. For traits whose variances may be heterogeneous between the sexes, evaluating the relevant genetic parameters, including the genetic correlation between the sexes, is an important topic that requires estimation of the (co)variance components. This study developed a Bayesian approach via the Gibbs sampler to estimate the (co)variance components and genetic parameters of sexual dimorphism. As prior distributions, uniform, multivariate normal, two-dimensional scaled inverted Wishart, and independent scaled inverted chi-square distributions were used for the macro-environmental effects, breeding values, additive genetic (co)variances, and residual variances, respectively. This approach was applied to beef carcass trait data, and the estimates of the (co)variance components and genetic parameters (especially the modes of the marginal posterior densities) were generally in agreement with those obtained using the restricted maximum likelihood procedure.
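In a Gibbs sampler of the kind described above, the additive genetic (co)variance matrix is drawn from its full conditional, which remains an inverted Wishart when an inverted-Wishart prior is combined with the current breeding values. The fragment below shows only that single conditional draw for a two-trait (male/female) setting, assuming unrelated animals (relationship matrix A = I) and invented hyperparameters; it is not the full sampler used in the paper.

```python
# One conditional update of the 2x2 additive genetic (co)variance matrix G in
# a Gibbs sampler for a two-trait (male/female expression) model.
# Illustrative fragment only: unrelated animals (A = I) and arbitrary priors.
import numpy as np
from scipy.stats import invwishart

rng = np.random.default_rng(2)
q = 200                                   # number of animals with breeding values
u = rng.multivariate_normal([0.0, 0.0],
                            [[4.0, 2.4], [2.4, 3.0]], size=q)   # current breeding values

nu0 = 4                                   # prior degrees of freedom (assumed)
S0 = np.eye(2)                            # prior scale matrix (assumed)
A_inv = np.eye(q)                         # inverse relationship matrix (A = I here)

# Full conditional: G | u ~ InvWishart(nu0 + q, S0 + u' A^{-1} u)
scale = S0 + u.T @ A_inv @ u
G_draw = invwishart.rvs(df=nu0 + q, scale=scale, random_state=1)

genetic_corr = G_draw[0, 1] / np.sqrt(G_draw[0, 0] * G_draw[1, 1])
print("sampled G:\n", np.round(G_draw, 2))
print("sampled genetic correlation between sexes:", round(genetic_corr, 3))
```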
{"title":"A Bayesian Inference of Genetic Parameters for Sexual Dimorphism Using Carcass Trait Data","authors":"A. Arakawa, H. Iwaisaki","doi":"10.5691/JJB.31.77","DOIUrl":"https://doi.org/10.5691/JJB.31.77","url":null,"abstract":"Differences in some traits between males and females, called sexual dimorphism, are observed among wild and livestock animals. For traits in which variances may be heterogeneous between sexes in some cases, evaluating the relevant genetic parameters, including genetic correlation between sexes, is an important topic requiring estimation of the components of (co)variances. This study developed a Bayesian approach via the Gibbs sampler to estimate the (co)variance components and genetic parameters of sexual dimorphism. As prior distributions, uniform, multivariate normal, two dimensional scaled inverted Wishart and independent scaled inverted chi-square distributions were used for the macro-environmental effects, breeding values, additive genetic (co)variances and residual variances, respectively. This approach was applied to beef carcass trait data, and the estimates of the (co)variance components and genetic parameters (especially the modes of the marginal posterior densities) were generally in agreement with those obtained using the restricted maximum likelihood procedure.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"186 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116905505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Epidemiologic findings obtained by conventional statistical methods reflect uncertainty due to random error but omit uncertainty due to biases such as unmeasured confounding, selection bias, and misclassification error. One approach for addressing this problem is to perform sensitivity analyses. We used Monte Carlo sensitivity analysis (MCSA) to analyze data from a large population-based cohort study, the Japan Arteriosclerosis Longitudinal Study-Existing Cohorts Combine. The effects of blood pressure on arteriosclerotic disease were examined among 21,949 subjects, accounting for both misclassification of the exposure and unmeasured confounding. We used a Poisson regression model to estimate the gender-specific incidence rate ratio (IRR) of each blood pressure category, adjusted for several measured risk factors. Prior information on the misclassified blood pressure and the unmeasured history of diabetes mellitus was obtained from sub-cohort members. Sequential correction of the two biases by MCSA led to a large decrease in the IRR among pre-hypertensive men (IRR = 1.79 [95% limits 0.22−3.78]) and women (1.15 [0.28−2.25]) and a large increase in the IRR among stage 2 hypertensive men (7.24 [3.50−11.2]) and women (4.12 [2.14−6.89]). Our expanded MCSA provides a valuable approach to bias analysis that makes explicit and quantifies sources of uncertainty.
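A Monte Carlo sensitivity analysis of the kind used above draws bias parameters from prior distributions, corrects the observed data under each draw, and re-estimates the effect, so the spread of corrected estimates reflects uncertainty from the biases as well as from random error. The sketch below does this for a single 2x2 table, correcting nondifferential exposure misclassification and one unmeasured binary confounder; the counts, priors, and risk-ratio summary are all invented and far simpler than the Poisson-regression analysis in the paper.

```python
# Sketch of a Monte Carlo sensitivity analysis for one 2x2 table: exposure
# misclassification and an unmeasured binary confounder are corrected under
# bias parameters drawn from priors, and the distribution of corrected risk
# ratios summarizes the added uncertainty. All counts and priors are invented.
import numpy as np

rng = np.random.default_rng(3)

# observed (misclassified) data: cases and totals by classified exposure
a, n1 = 120, 8000     # classified exposed
b, n0 = 150, 14000    # classified unexposed
M = 20000
rr_corrected = []

for _ in range(M):
    se = rng.beta(80, 20)          # prior for sensitivity of exposure classification
    sp = rng.beta(95, 5)           # prior for specificity
    denom = se + sp - 1
    # back-correct the counts (matrix method, nondifferential misclassification)
    a_c = (a - (1 - sp) * (a + b)) / denom
    n1_c = (n1 - (1 - sp) * (n1 + n0)) / denom
    b_c, n0_c = (a + b) - a_c, (n1 + n0) - n1_c
    if min(a_c, b_c, n1_c - a_c, n0_c - b_c) <= 0:
        continue                   # skip draws giving an impossible corrected table
    rr_mis = (a_c / n1_c) / (b_c / n0_c)

    # external adjustment for an unmeasured binary confounder (e.g. diabetes history)
    p1 = rng.uniform(0.10, 0.30)   # prior: confounder prevalence among exposed
    p0 = rng.uniform(0.05, 0.15)   # prior: confounder prevalence among unexposed
    rr_c = rng.uniform(1.5, 3.0)   # prior: confounder-outcome risk ratio
    bias = (p1 * (rr_c - 1) + 1) / (p0 * (rr_c - 1) + 1)
    rr_corrected.append(rr_mis / bias)

lo, med, hi = np.percentile(rr_corrected, [2.5, 50, 97.5])
print(f"bias-corrected RR: {med:.2f} (95% simulation limits {lo:.2f}-{hi:.2f})")
```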
{"title":"Monte Carlo Sensitivity Analysis for Adjusting Multiple-bias in the Longitudinal Cardiovascular Study","authors":"A. Takeuchi, Y. Matsuyama, Y. Ohashi, H. Ueshima","doi":"10.5691/JJB.31.63","DOIUrl":"https://doi.org/10.5691/JJB.31.63","url":null,"abstract":"Epidemiologic findings by conventional statistical methods reflect uncertainty due to random error but omit uncertainty due to biases, such as unmeasured confounding, selection bias, and misclassification error. One approach for addressing this problem is to perform sensitivity analyses. We used MCSA (Monte Carlo sensitivity analysis) to analyze data from a large population-based cohort study, Japan Arteriosclerosis Longitudinal Study-Existing Cohorts Combine. The effects of the blood pressure on arteriosclerotic disease were examined among 21,949 subjects accounting for both misclassification of exposure and unmeasured confounding. We used a Poisson regression model to estimate the gender-specific incidence rate ratio (IRR) of each blood pressure category adjusted for several measured risk factors. The prior information on the misclassified blood pressure and the unmeasured diabetes mellitus history was obtained from sub-cohort members. Sequential correction of two biases by the MCSA led to large decrease of IRR among pre-hypertensive men (IRR = 1.79 [95% limits = 0.22−3.78]) and women (1.15 [0.28−2.25]), and large increase of IRR among stage 2 hypertensive men (7.24 [3.50−11.2]) and women (4.12 [2.14−6.89]). Our expanded MCSA provides valuable approach for bias analysis, which makes explicit and quantifies sources of uncertainty.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"424 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115578792","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Noncompliance is an important problem in randomized trials. The estimation and bounds of average causal effects (ACEs) have been discussed as a way to address this issue. Previous studies have considered ACEs under the instrumental variable (IV) assumption, which postulates that potential outcomes are constant across subject sub-populations assigned to separate treatment regimens. However, the IV assumption may not be valid in unmasked trials. In the present analyses, the IV assumption is relaxed to the monotone IV (MIV) assumption, which replaces equality in the IV assumption with inequality. We propose bounds on ACEs under the MIV assumption in addition to the other existing assumptions. The results demonstrate that the intention-to-treat effect is an upper or lower bound under one assumption and the per-protocol effect is an upper or lower bound under the other assumption, even using the MIV assumption in place of the IV assumption. These proposed bounds are illustrated using a classic randomized trial.
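The bounds discussed above are anchored at two familiar trial summaries: the intention-to-treat (ITT) effect and the per-protocol effect, each of which becomes an upper or a lower bound on the average causal effect under the corresponding assumption. The sketch below merely computes those two quantities from a hypothetical two-arm trial with noncompliance in the treatment arm; the counts are invented and the code does not check the MIV or other assumptions under which they bound the ACE.

```python
# Compute the intention-to-treat (ITT) and per-protocol effects from a
# hypothetical randomized trial with noncompliance. The abstract states that,
# under the respective assumptions, one of these is an upper bound and the
# other a lower bound on the average causal effect; this code only computes
# the two estimates from invented counts.

# counts[(assigned_arm, treatment_received)] = (events, subjects)
counts = {
    ("treatment", "treated"):   (30, 400),
    ("treatment", "untreated"): (20, 100),   # noncompliers in the treatment arm
    ("control",   "untreated"): (60, 500),   # control arm, assumed fully compliant
}

def risk(events, n):
    return events / n

# ITT effect: compare by randomized assignment, ignoring compliance
ev_t = counts[("treatment", "treated")][0] + counts[("treatment", "untreated")][0]
n_t = counts[("treatment", "treated")][1] + counts[("treatment", "untreated")][1]
itt_effect = risk(ev_t, n_t) - risk(*counts[("control", "untreated")])

# Per-protocol effect: compare only subjects who received their assigned treatment
pp_effect = risk(*counts[("treatment", "treated")]) - risk(*counts[("control", "untreated")])

print(f"ITT effect         : {itt_effect:+.3f}")
print(f"Per-protocol effect: {pp_effect:+.3f}")
```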
{"title":"The Monotone Instrumental Variable in Randomized Trials with Noncompliance","authors":"Y. Chiba","doi":"10.5691/JJB.31.93","DOIUrl":"https://doi.org/10.5691/JJB.31.93","url":null,"abstract":"Noncompliance is an important problem in randomized trials. The estimation and bounds of average causal effects (ACEs) have been discussed as a way to address this issue. Previous studies have considered ACEs under the instrumental variable (IV) assumption, which postulates that potential outcomes are constant across subject sub-populations assigned to separate treatment regimens. However, the IV assumption may not be valid in unmasked trials. In the present analyses, the IV assumption is relaxed to the monotone IV (MIV) assumption, which replaces equality in the IV assumption with inequality. We propose bounds on ACEs under the MIV assumption in addition to the other existing assumptions. The results demonstrate that the intention-to-treat effect is an upper or lower bound under one assumption and the per-protocol effect is an upper or lower bound under the other assumption, even using the MIV assumption in place of the IV assumption. These proposed bounds are illustrated using a classic randomized trial.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132324844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shiro Tanaka, Koji Oba, K. Yoshimura, S. Teramukai
Surrogate endpoints, which represent a compromise in the conflict between the measurability and the clinical relevance of endpoints, have a considerable advantage over true endpoints for rapid drug approval in confirmatory clinical trials of life-threatening diseases such as cancer or AIDS. However, past experience has shown the risk of relying too heavily on surrogate endpoints. In this paper, we review statistical criteria for evaluating surrogate endpoints and past examples in which surrogacy was properly evaluated, taking into consideration the relevant clinical and statistical issues.
{"title":"Statistical Criteria for Surrogate Endpoint and Applications : A Review","authors":"Shiro Tanaka, Koji Oba, K. Yoshimura, S. Teramukai","doi":"10.5691/JJB.31.23","DOIUrl":"https://doi.org/10.5691/JJB.31.23","url":null,"abstract":"Surrogate endpoints, which represent a compromise in the conflict between measurability and clinical relevance of endpoints, have considerable advantage in rapid drug approvals compared to true endpoints in confirmatory clinical trials dealing with life-threatening diseases, such as cancer or AIDS. However, past experiences have shown the risk of relying too heavily on surrogate endpoints. In this paper, we review statistical criteria for evaluating surrogate endpoints and the past examples properly evaluated the surrogacy, taking into consideration relevant clinical and statistical issues.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128920237","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
This paper presents an asymptotic method of power calculation for likelihood ratio tests in nested case-control designs with a randomly sampled control per case. It is an extension of the approach described in Self et al. (1992) for proportional hazards models. Our approach focuses on a simple scenario with a 1:1 case-control ratio, a simple random sampling design, and two independent dichotomous covariates with no interaction effects. An approximation to the noncentrality parameter of the noncentral chi-square distribution for the likelihood ratio statistic is provided. Simulation studies are conducted to examine its accuracy for several parameter values and data configurations. Overall, the results suggest that power estimates obtained with the proposed method are consistent with the actual power obtained from Monte Carlo simulation. Therefore, the proposed approach can be practically useful for assessing statistical power in the simple nested case-control design.
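Once the noncentrality parameter of the likelihood ratio statistic has been approximated, the power calculation described above reduces to a tail probability of a noncentral chi-square distribution. The sketch below shows only that final step for a one-degree-of-freedom test; the per-pair noncentrality value is an invented placeholder, not the approximation derived in the paper.

```python
# Final step of an asymptotic power calculation: given an approximate
# noncentrality parameter for the likelihood ratio statistic, power is a tail
# probability of a noncentral chi-square distribution. The per-pair
# noncentrality used here is an invented placeholder.
from scipy.stats import chi2, ncx2

alpha = 0.05
df = 1                       # testing a single log-hazard-ratio parameter
per_pair_nc = 0.035          # assumed noncentrality contributed by one 1:1 case-control pair
crit = chi2.ppf(1 - alpha, df)

for n_cases in (100, 200, 300, 400):
    nc = n_cases * per_pair_nc           # total noncentrality grows with the number of cases
    power = ncx2.sf(crit, df, nc)        # P(noncentral chi-square exceeds the critical value)
    print(f"{n_cases:4d} cases: approximate power = {power:.3f}")
```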
{"title":"Power Calculation for Likelihood Ratio Tests in the Nested Case-control Designs with a Randomly Sampled Control Per Case","authors":"S. Izumi, Y. Fujii","doi":"10.5691/JJB.31.1","DOIUrl":"https://doi.org/10.5691/JJB.31.1","url":null,"abstract":"This paper presents an asymptotic method of power calculations for likelihood ratio tests in the nested case-control designs with a randomly sampled control per case. It is an extension of the approach described in Self et al. (1992) for proportional hazards models. Our approach here focuses on a simple scenario with 1 : 1 case-control ratio, simple random sampling design, and two independent dichotomous covariates with no interaction effects. The approximation of the noncentrality of the noncentral chi-square distribution for the likelihood ratio statistic is provided. Simulation studies are conducted to examine the accuracy for several parameter values and data configurations. Overall the results suggest that estimates of power using our proposed method are consistent with those of actual power from Monte Carlo simulation. Therefore, the proposed approach can be practically useful in assessing the statistical power for the simple nested case-control design.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114194250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ranking significant genes based on the P-value from multiple testing is a simple and common practice in microarray data analysis, and its theoretical optimality is of particular interest. McLachlan et al. (Bioinformatics 2006; 22: 1608-1615) presented a method for calculating the local FDR under normal mixture models and established a theoretical optimality of the local FDR as a ranking statistic. In this article, we show that the optimal gene ranking based on the local FDR calculated by McLachlan et al.'s method accords perfectly with the ranking based on the P-value under certain conditions. We argue that these conditions are generally satisfied for significant genes with small P-values. We demonstrate this using several real examples.
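The local FDR referred to above is, under a two-component normal mixture for gene-level statistics, the posterior probability that a gene is null given its observed value. The sketch below fits such a mixture to simulated z-statistics with the null component fixed at N(0, 1) (a simplification; McLachlan et al. also model the null), computes the local FDR, and compares the top-ranked genes with those ranked by two-sided P-value; all data and starting values are invented.

```python
# Local FDR under a two-component normal mixture for gene-level z-statistics,
# compared with P-value ranking. Simplified sketch: the null component is
# fixed at N(0, 1) rather than estimated, and the data are simulated.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize

rng = np.random.default_rng(4)
z = np.concatenate([rng.normal(0.0, 1.0, 9000),      # null genes
                    rng.normal(2.5, 1.2, 1000)])     # differentially expressed genes

def neg_loglik(theta):
    pi0, mu1, log_s1 = theta
    f = pi0 * norm.pdf(z, 0, 1) + (1 - pi0) * norm.pdf(z, mu1, np.exp(log_s1))
    return -np.sum(np.log(f))

fit = minimize(neg_loglik, x0=[0.8, 2.0, 0.0],
               bounds=[(0.01, 0.99), (None, None), (None, None)])
pi0, mu1, s1 = fit.x[0], fit.x[1], np.exp(fit.x[2])

f = pi0 * norm.pdf(z, 0, 1) + (1 - pi0) * norm.pdf(z, mu1, s1)
local_fdr = pi0 * norm.pdf(z, 0, 1) / f              # P(null | z) under the fitted mixture
pval = 2 * norm.sf(np.abs(z))                        # two-sided P-values against N(0, 1)

top = 200
rank_fdr = set(np.argsort(local_fdr)[:top])
rank_p = set(np.argsort(pval)[:top])
print("estimated pi0:", round(pi0, 3))
print(f"overlap of top-{top} genes (local FDR vs P-value): {len(rank_fdr & rank_p)}")
```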
{"title":"Optimality of Gene Ranking Based on Univariate P-values for Detecting Differentially Expressed Genes","authors":"H. Noma, S. Matsui","doi":"10.5691/JJB.31.13","DOIUrl":"https://doi.org/10.5691/JJB.31.13","url":null,"abstract":"Ranking significant genes based on the P-value in multiple testing is a simple and common practice in microarray data analysis, and its theoretical optimality is of particular interest. McLachlan et al. (Bioinformatics 2006; 22: 1608-1615) presented a method for calculating the local FDR under normal mixture models and provided a theoretical optimality of the local FDR as a ranking statistic. In this article, we show that the optimal gene ranking based on the local FDR calculated by the McLachlan et al.’s method perfectly accords with that based on P-value under certain conditions. We argue that these conditions are generally satisfied for significant genes with small P-values. We demonstrate it using several real examples.","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129457632","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kenta Murotani [1], Yoshiko Aoyama [2, 4], Shuji Nagata [3], and Takashi Yanagawa [2]. [1] Department of Biostatistics, Graduate School of Medicine, Kurume University, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan; [2] The Biostatistics Center, Kurume University, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan; [3] Department of Radiology, Kurume University Faculty of Medicine, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan; [4] Bell System 24, Inc., 2-16-8 Minami Ikebukuro, Toshima-ku, Tokyo 171-0022, Japan. E-mail: a205gm024m@std.kurume-u.ac.jp
{"title":"Exact Method for Comparing Two Diagnostic Tests with Multiple Readers Based on Categorical Measurements","authors":"K. Murotani, Y. Aoyama, S. Nagata, T. Yanagawa","doi":"10.5691/JJB.30.69","DOIUrl":"https://doi.org/10.5691/JJB.30.69","url":null,"abstract":"Kenta Murotani∗1, Yoshiko Aoyama∗2,∗4, Shuji Nagata∗3 and Takashi Yanagawa∗2 ∗1Department of Biostatistics, Graduate School of Medicine, Kurume University, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan ∗2The Biostatistics Center, Kurume University, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan ∗3The Department of Radiology, Kurume University Faculty of Medicine, Asahi-machi 67, Kurume, Fukuoka 830-0011, Japan ∗4Bell System 24, Inc., 2-16-8 Minami Ikebukuro, Toshima-ku, Tokyo 171-0022, Japan e-mail:a205gm024m@std.kurume-u.ac.jp","PeriodicalId":365545,"journal":{"name":"Japanese journal of biometrics","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2009-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128178168","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}