When is an answer not an answer
D. Streiner, G. Norman
Abstract
When your beloved authors were studying research and statistics, around the time that Methuselah was celebrating his first birthday, we thought we knew the difference between hypothesis testing and hypothesis generating. With the former, you begin with a question, design a study to answer it, carry it out, and then do some statistical mumbo-jumbo on the data to determine if you have reasonable evidence to answer the question. With the latter, usually done after you’ve answered the main questions, you don’t have any preconceived idea of what’s going on, so you analyze anything that moves. We know that’s not really kosher, because the probability of finding something just by chance (a Type I error) increases astronomically as you do more tests. So, in the hypothesis-generating phase, you don’t come to any conclusions; you just say, “That’s an interesting finding. Now we’ll have to do a real study to see if our observation holds up.”

Well, we thought we knew the difference, but something must have changed over the past few centuries when we weren’t paying too much attention. The reason for our puzzlement is an article by Hurvitz et al about the relative effectiveness of trastuzumab emtansine (T-DM1) compared with trastuzumab plus docetaxel (HT) in patients with metastatic breast cancer.

First, a bit about the study itself. This was a phase 2, multicenter, open-label, randomized controlled trial. “Phase 2” means it’s not yet ready for prime time, and “open label” means that nobody was blinded regarding who got what. (“Multicenter” means a great opportunity for the investigators to rack up frequent flier points.) There were 137 women with HER2-positive metastatic breast cancer or recurrent locally advanced breast cancer, randomly divided between the 2 groups. The primary endpoints were progression-free survival (PFS) and safety, both assessed by the investigators. Key secondary endpoints were overall survival (OS), objective response rate (ORR), quality of life (QOL), and a handful of others. What they found was that the median PFS was 9.2 months with HT, compared with 14.2 months for T-DM1, and that the ORR was 58.0% in the HT group and 64.4% in the T-DM1 group; both were statistically significant. However, “preliminary OS results were similar between treatment arms.”

So, let’s begin looking at the study. It may help if you jot down all of the abbreviations that were used; we listed 15 before we ran out of lead for our pencils, and we never even got to the ones from statistics we were familiar with. We can’t fault the authors for this; it appears to be an editorial policy to abbreviate everything and not provide a table of them to help readers. Ink must be a very precious commodity.

But back to the study. The paper states that “This study had a hypothesis-generating statistical design.” If you go back over all of the papers we have written for this journal, looking for a definition of “hypothesis-generating statistical design,” you will look in vain. If you think that we have been remiss in not discussing all research designs (actually, we have been, and haven’t mentioned many of them) and check textbooks of research design, your search will again prove to be fruitless. In fact, we had to resort to that salvation of all serious academic researchers, Google. What we found, among all the hundreds of millions of Web pages, was only one mention of the term: in the article we are reviewing!

So what does the term mean? Given our vast knowledge of statistics and research design, we feel safe in saying, “We don’t have the foggiest idea.” There are indeed many research designs, and we should know; we’ve written books about them (OK, so maybe only one book). Different designs depend on how the subjects were located, how (and if) they were followed up, whether or not the researchers had any control over who got what, and a host of other factors, but not on whether the study was meant to test hypotheses or to generate them; that depends solely on whether the analyses were specified beforehand or not.

Commun Oncol 2013;10:189-190 © 2013 Frontline Medical Communications DOI: 10.12788/j.cmonc.0037 Features Practical Biostatistics
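To put a rough number on the remark that the chance of a Type I error climbs as more tests are run: if we assume the tests are independent and that every null hypothesis is true (simplifying assumptions of ours, not something stated in the column), the probability of at least one false positive across k tests, each run at level alpha, is 1 - (1 - alpha)^k. The sketch below is only an illustration of that arithmetic; the function names are hypothetical and nothing here reproduces the analysis in the trial under discussion.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among `num_tests` tests.

    Assumes the tests are independent and every null hypothesis is true
    (an illustrative simplification, not a claim about any real study).
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # P(no false positive in one test) = 1 - alpha; multiply across tests.
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the familywise rate at or below `alpha`.

    This is the Bonferroni correction, one common (and conservative) remedy.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


if __name__ == "__main__":
    # Show how quickly the familywise rate grows with the number of tests.
    for k in (1, 5, 10, 20, 50):
        rate = familywise_error_rate(k)
        print(f"{k:>3} tests: P(at least one Type I error) = {rate:.3f}")
```

Under these assumptions, a single test at the 0.05 level carries a 5% risk of a false positive, while 20 such tests carry a risk of roughly 64%, which is why findings from an analyze-anything-that-moves phase are treated as leads to be confirmed rather than as conclusions.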