The Q-Q Plot of p-values for Predicting Outcomes with the Gene Expression Data

Y. Ito, Y. Fujiwara, Y. Ohashi
Japanese Journal of Biometrics. Published 2007-07-01. DOI: 10.5691/JJB.28.37 (https://doi.org/10.5691/JJB.28.37)

Abstract

Michiels et al. (2005) showed that a list of genes identified as predictors of prognosis via a non-repeated training-validation approach is unstable, and they advocated validation by repeated random sampling. They regarded genes that were selected among the top 50 genes in more than half of their jackknife samples as stable for prediction. However, there is no rationale for the choice of the gene-list length or the stability threshold. Since a stability assessment essentially requires evaluating the accumulation of low p-values across the repeated random samples, it is better to compare the observed distribution of a gene's p-values directly with the distribution of p-values under the null hypothesis. In this study, we propose the quantile-quantile plot (Q-Q plot) of p-values with a null reference for this purpose. We applied the proposed method to clinical data on primary breast cancer. The Q-Q plot approach reveals that genes with similar p-values in the ordinary analysis can have different p-value distributions under repeated random sampling, and that a gene with an accumulation of low p-values in the repeated random sampling can be evaluated against the reference lines in the Q-Q plot.
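The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which test, resampling fraction, or null reference they used. The sketch assumes a binary outcome, a two-sample normal-approximation test, random half-samples, and Uniform(0,1) quantiles as the null reference; all function names and the simulated data are hypothetical. Note that p-values from overlapping resamples are dependent, so the uniform reference is only a rough guide here.

```python
import math
import random
import statistics


def two_sample_p(x, y):
    """Two-sided p-value from a Welch-type z statistic (normal approximation)."""
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    if se == 0:
        return 1.0
    z = (statistics.fmean(x) - statistics.fmean(y)) / se
    return math.erfc(abs(z) / math.sqrt(2))


def resampled_p_values(expr, outcome, n_rep=200, frac=0.5, seed=0):
    """p-values of one gene over repeated random subsets of the patients."""
    rng = random.Random(seed)
    idx = list(range(len(expr)))
    k = int(frac * len(idx))
    p_values = []
    for _ in range(n_rep):
        sub = rng.sample(idx, k)  # one random subset, drawn without replacement
        good = [expr[i] for i in sub if outcome[i] == 0]
        poor = [expr[i] for i in sub if outcome[i] == 1]
        if len(good) < 2 or len(poor) < 2:
            continue  # variance is undefined for fewer than two values
        p_values.append(two_sample_p(good, poor))
    return p_values


def qq_points(p_values):
    """(expected, observed) pairs on the -log10 scale; expected uses Uniform(0,1)."""
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]
    return [
        (-math.log10(e), -math.log10(max(o, 1e-300)))
        for e, o in zip(expected, observed)
    ]


# Simulated data (assumption): 100 patients, one gene with a true shift, one without.
rng = random.Random(1)
outcome = [0] * 50 + [1] * 50
signal_gene = [rng.gauss(1.5 * y, 1.0) for y in outcome]
null_gene = [rng.gauss(0.0, 1.0) for _ in outcome]

signal_qq = qq_points(resampled_p_values(signal_gene, outcome))
null_qq = qq_points(resampled_p_values(null_gene, outcome))
```

Plotting the observed coordinate against the expected coordinate for each gene gives the Q-Q plot. Points that lie well above the diagonal across most of the range indicate an accumulation of low p-values over the resamples, which is the pattern the abstract proposes to judge against reference lines.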