Defensible inferences from a nested sequence of logistic regressions: a guide for the perplexed

Large-Scale Assessments in Education (IF 2.6, Q1, Education & Educational Research). Pub Date: 2021-07-21. DOI: 10.1186/s40536-021-00111-7

Gulsah Gurkan, Yoav Benjamini, Henry Braun


Citations: 0



Defensible inferences from a nested sequence of logistic regressions: a guide for the perplexed

Employing nested sequences of models is a common practice when exploring the extent to which one set of variables mediates the impact of another set. Such an analysis in the context of logistic regression models confronts two challenges: (i) direct comparisons of coefficients across models are generally biased due to the changes in scale that accompany the changes in the set of explanatory variables, (ii) conducting a large number of tests induces a problem of multiplicity that can lead to spurious findings of significance if not heeded. This article aims to illustrate a practical strategy for conducting analyses in the face of these challenges. The challenges—and how to address them—are illustrated using a subset of the findings reported by Braun (Large-scale Assess Educ 6(4):1–52, 2018. 10.1186/s40536-018-0058-x), drawn from the Programme for the International Assessment of Adult Competencies (PIAAC), an international, large-scale assessment of adults. For each country in the dataset, a nested pair of logistic regression models was fit in order to investigate the role of Educational Attainment and Cognitive Skills in mediating the impact of family background and demographic characteristics on the location of an individual’s annual income in the national income distribution. A modified version of the Karlson–Holm–Breen (KHB) method was employed to obtain an unbiased estimate of the true differences in the coefficients between nested logistic models. In order to address the issue of multiplicity, a recent generalization of the Benjamini–Hochberg (BH) False Discovery Rate (FDR)-controlling procedure to hierarchically structured hypotheses was employed and compared to two conventional methods. The differences between the changes in coefficients calculated conventionally and with the KHB adjustment varied from negligible to very substantial. 
When combined with the actual magnitudes of the coefficients, we concluded that the more proximal factors indeed act as strong mediators for the background factors, but less so for Age, and hardly at all for Gender. With respect to multiplicity, applying the FDR-controlling procedure yielded results very similar to those obtained by applying a standard per-comparison procedure, but quite a few more discoveries in comparison to the Bonferroni procedure. The KHB methodology illustrated here can be applied wherever there is interest in comparing nested logistic regressions. Modifications to account for probability sampling are practicable. The categorization of variables and the order of entry should be determined by substantive considerations. On the other hand, the BH procedure is perfectly general and can be implemented to address multiplicity issues in a broad range of settings.
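The multiplicity comparison described in the abstract can be illustrated with a minimal sketch. It implements the standard (non-hierarchical) Benjamini-Hochberg step-up rule next to the Bonferroni and per-comparison rules. This is an assumption-laden illustration, not the authors' procedure: the article uses a generalization of BH to hierarchically structured hypotheses, which is not reproduced here. The function names and the p-values are hypothetical and were made up for this example; they are not results from the article.

```python
def per_comparison(pvalues, alpha=0.05):
    """Unadjusted rule: reject every hypothesis with p <= alpha."""
    return [p <= alpha for p in pvalues]


def bonferroni(pvalues, alpha=0.05):
    """Bonferroni rule: reject when p <= alpha / m, m = number of tests."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Standard BH step-up rule (plain version, not the hierarchical one).

    Sort the p-values, find the largest rank k with p_(k) <= k * alpha / m,
    and reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank  # keep the largest rank that passes its threshold
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, invented purely for illustration.
pvalues = [0.001, 0.008, 0.012, 0.021, 0.030, 0.045, 0.20, 0.60]

print("Bonferroni discoveries:    ", sum(bonferroni(pvalues)))          # 1
print("BH (FDR) discoveries:      ", sum(benjamini_hochberg(pvalues)))  # 5
print("Per-comparison discoveries:", sum(per_comparison(pvalues)))      # 6
```

On these made-up inputs the three rules are ordered as the abstract describes in general terms: Bonferroni is the most conservative, and BH falls between it and the unadjusted per-comparison rule. How close BH comes to the per-comparison count depends entirely on the p-values supplied.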


Source journal