When is an answer not an answer
D. Streiner, G. Norman
Published 2013-06-01
Citations: 0
When your beloved authors were studying research and statistics, around the time that Methuselah was celebrating his first birthday, we thought we knew the difference between hypothesis testing and hypothesis generating. With the former, you begin with a question, design a study to answer it, carry it out, and then do some statistical mumbo-jumbo on the data to determine if you have reasonable evidence to answer the question. With the latter, usually done after you’ve answered the main questions, you don’t have any preconceived idea of what’s going on, so you analyze anything that moves. We know that’s not really kosher, because the probability of finding something just by chance (a Type I error) increases astronomically as you do more tests. So, in the hypothesis-generating phase, you don’t come to any conclusions; you just say, “That’s an interesting finding. Now we’ll have to do a real study to see if our observation holds up.”

Well, we thought we knew the difference, but something must have changed over the past few centuries when we weren’t paying too much attention. The reason for our puzzlement is an article by Hurvitz et al about the relative effectiveness of trastuzumab emtansine (T-DM1) compared with trastuzumab plus docetaxel (HT) in patients with metastatic breast cancer.

First, a bit about the study itself. This was a phase 2, multicenter, open-label, randomized controlled trial. “Phase 2” means it’s not yet ready for prime time, and “open label” means that nobody was blinded regarding who got what. (“Multicenter” means a great opportunity for the investigators to rack up frequent flier points.) There were 137 women with HER2-positive metastatic breast cancer or recurrent locally advanced breast cancer, randomly divided between the 2 groups. The primary endpoints were progression-free survival (PFS) and safety, both assessed by the investigators. Key secondary endpoints were overall survival (OS), objective response rate (ORR), quality of life (QOL), and a handful of others. What they found was a median PFS of 9.2 months with HT, compared with 14.2 months for T-DM1, and an ORR of 58.0% in the HT group and 64.4% in the T-DM1 group; both were statistically significant. However, “preliminary OS results were similar between treatment arms.”

So, let’s begin looking at the study. It may help if you jot down all of the abbreviations that were used; we listed 15 before we ran out of lead for our pencils, and we never even got to the ones from statistics we were familiar with. We can’t fault the authors for this; it appears to be an editorial policy to abbreviate everything and not provide a table of them to help readers. Ink must be a very precious commodity.

But back to the study. The paper states that “This study had a hypothesis-generating statistical design.” If you go back over all of the papers we have written for this journal, looking for a definition of “hypothesis-generating statistical design,” you will look in vain. If you think that we have been remiss in not discussing all research designs (actually, we have been, and haven’t mentioned many of them) and check textbooks of research design, your search will again prove to be fruitless. In fact, we had to resort to that salvation of all serious academic researchers, Google. What we found, among all the hundreds of millions of Web pages, was only one mention of the term – in the article we are reviewing!

So what does the term mean? Given our vast knowledge of statistics and research design, we feel safe in saying, “We don’t have the foggiest idea.” There are indeed many research designs, and we should know; we’ve written books about them (OK, so maybe only one book). Different designs depend on how the subjects were located, how (and if) they were followed up, whether or not the researchers had any control over who got what, and a host of other factors, but not whether the study was meant to test hypotheses or to generate them – that depends solely on whether the analyses were specified beforehand or not.

Commun Oncol 2013;10:189-190. © 2013 Frontline Medical Communications. DOI: 10.12788/j.cmonc.0037. Features: Practical Biostatistics
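
As a rough illustration of the earlier remark that the chance of a Type I error climbs as you run more tests, the sketch below computes the probability of at least one false positive across a batch of tests. It is not taken from the article: it assumes the tests are independent and each is run at the conventional significance level of 0.05, and the function name `familywise_error_rate` is a hypothetical one chosen for this example.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one Type I error across `num_tests` tests.

    Assumes (for illustration only) that the tests are independent and that
    every null hypothesis is true, with each test run at level `alpha`.
    """
    if num_tests < 0:
        raise ValueError("num_tests must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    # P(no false positive in any test) = (1 - alpha) ** num_tests,
    # so the chance of at least one is the complement.
    return 1.0 - (1.0 - alpha) ** num_tests


if __name__ == "__main__":
    # Show how quickly the risk grows when you "analyze anything that moves".
    for k in (1, 5, 10, 20, 50):
        print(f"{k:>3} tests: {familywise_error_rate(k):.3f}")
```

Under these assumptions, a single test carries a 5% risk, while 10 tests already carry roughly a 40% risk of at least one spurious "finding". Real analyses are rarely independent, so the exact figures will differ, but the direction of the effect is the same.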