The statistical and pragmatic tension between explanation and prediction is well recognized in psychology. Yarkoni and Westfall (2017) suggested focusing more on prediction, which will ultimately produce better calibrated interpretations. Variable selection methods, such as regularization, are strongly recommended because they help construct interpretable models while optimizing prediction accuracy. However, when the data contain a nonignorable proportion of missingness, variable selection and model building via penalized regression methods are not straightforward. The analysis protocol is further complicated when model performance is evaluated on both prediction accuracy and fairness; the latter is receiving increasing attention when the predicted outcome has societal implications. This study explored two methods for variable selection with incomplete data: the bootstrap imputation-stability selection (BI-SS) method and the stacked elastic net (SENET) method. Both methods work with multiply imputed data sets but in different ways. BI-SS implements variable selection separately on each imputed bootstrap data set and aggregates the results via stability selection, while SENET stacks all imputed data sets and fits a single pooled model. We thoroughly evaluated their performance using a suite of metrics (including area under the curve, F1 score, and fairness criteria) via three increasingly complex simulation studies. Results reveal that while BI-SS and SENET perform almost equally well in settings with generalized linear models, only BI-SS fares well with nested data designs because of the high computational demand of fitting regularized generalized linear mixed effects models. Finally, we demonstrated both methods with an example using rich electronic health data. (PsycInfo Database Record (c) 2025 APA, all rights reserved).
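The difference between the two aggregation strategies described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' implementation: the function names `stability_select` and `stack_imputations`, the 0.6 selection-frequency threshold, and the 1/M row weights are assumptions that do not come from the abstract, and the penalized regression fits themselves are deliberately left out.

```python
from collections import Counter


def stability_select(selected_sets, threshold=0.6):
    """Aggregate per-data-set selections (BI-SS-style aggregation step).

    `selected_sets` holds one collection of selected variable names per
    imputed bootstrap data set. A variable is kept when its selection
    frequency reaches `threshold` (an assumed cutoff).
    """
    n_sets = len(selected_sets)
    counts = Counter(name for s in selected_sets for name in set(s))
    return sorted(name for name, c in counts.items() if c / n_sets >= threshold)


def stack_imputations(imputed_sets):
    """Stack M imputed data sets into one long data set (SENET-style step).

    Each row receives weight 1/M (an assumed weighting) so that the total
    weight equals the original sample size when all sets have the same rows.
    """
    m = len(imputed_sets)
    rows, weights = [], []
    for data in imputed_sets:
        rows.extend(data)
        weights.extend([1.0 / m] * len(data))
    return rows, weights


# Hypothetical selections from five imputed bootstrap data sets.
selections = [{"age", "bmi"}, {"age"}, {"age", "sbp"}, {"age", "bmi"}, {"age", "bmi"}]
stable = stability_select(selections, threshold=0.6)

# Hypothetical imputed copies of a two-row data set.
stacked_rows, stacked_weights = stack_imputations([[(1, 0), (0, 1)], [(1, 1), (0, 1)]])
```

In the first strategy a variable survives only if it is chosen often enough across the separately fitted data sets; in the second, a single weighted fit would be run once on the stacked rows.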
{"title":"Constructing a binary prediction model with incomplete data: Variable selection to balance fairness and precision.","authors":"He Ren, Chun Wang, Gongjun Xu, David J Weiss","doi":"10.1037/met0000786","DOIUrl":"10.1037/met0000786","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.8,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12356495/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144856196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In a time when the alarms of research replicability are sounding louder than ever, mapping out studies with statistical and inferential integrity is of paramount importance. Indeed, funding agencies almost always require grant applicants to present compelling a priori power analyses to justify proposed sample sizes, as a critical part of the information considered collectively to ensure a sound investment. Unfortunately, even researchers' most sincere attempts at sample size planning are fraught with the fundamental challenge of setting numerical values not just for the focal parameters for which statistical tests are planned, but for each of the model's other, more peripheral or contextual parameters as well. As we plainly demonstrate, even in very simple models, any slight deviation in well-intentioned numerical guesses for the latter parameters can undermine power for assessing the focal parameters that are of key theoretical interest. Toward remedying this all-too-common but seemingly underestimated problem in power analysis, we adopt a hope-for-the-best-but-plan-for-the-worst mindset and present new methods that attempt to (a) restore appropriate conservatism and robustness, and in turn credibility, to the sample size planning process, and (b) greatly simplify that process. Derivations and suggestions for practice are presented using the framework of measured variable path analysis models, as they subsume many of the types of models (e.g., multiple linear regression, analysis of variance) for which sample size planning is of interest.
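The sensitivity of power to guesses about peripheral parameters can be illustrated with a small calculation. The sketch below is not the nmax method presented in the article; it assumes a two-predictor linear regression with standardized variables and uses a large-sample normal approximation, and the function names and numerical values are hypothetical. The focal parameter is the slope `b1`; the second slope `b2` and the predictor correlation `r` play the role of peripheral parameters.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(n, b1, b2, r, alpha=0.05):
    """Approximate two-sided power for testing b1 = 0.

    Assumes standardized outcome and predictors, so that
    R^2 = b1^2 + b2^2 + 2*b1*b2*r and
    Var(b1_hat) is roughly (1 - R^2) / (n * (1 - r^2)).
    """
    r_squared = b1 ** 2 + b2 ** 2 + 2 * b1 * b2 * r
    if not 0 <= r_squared < 1 or not -1 < r < 1:
        raise ValueError("parameter values imply an inadmissible model")
    se = ((1 - r_squared) / (n * (1 - r ** 2))) ** 0.5
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z = abs(b1) / se
    return _STD_NORMAL.cdf(z - z_crit) + _STD_NORMAL.cdf(-z - z_crit)


def required_n(b1, b2, r, target=0.80, alpha=0.05, n_limit=100_000):
    """Smallest n whose approximate power reaches `target`."""
    for n in range(4, n_limit + 1):
        if approx_power(n, b1, b2, r, alpha) >= target:
            return n
    raise ValueError("target power not reached within n_limit")


# Same focal effect, two different guesses for the predictor correlation.
n_if_r_is_30 = required_n(b1=0.2, b2=0.3, r=0.3)
n_if_r_is_50 = required_n(b1=0.2, b2=0.3, r=0.5)
```

Under these assumptions, planning with the smaller correlation and then encountering the larger one leaves the study short of the sample size needed for the targeted power on `b1`, even though the focal effect itself was guessed correctly.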
{"title":"nmax and the quest to restore caution, integrity, and practicality to the sample size planning process.","authors":"Gregory R Hancock, Yi Feng","doi":"10.1037/met0000776","DOIUrl":"https://doi.org/10.1037/met0000776","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"5 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144820002","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supplemental Material for Constructing a Binary Prediction Model With Incomplete Data: Variable Selection to Balance Fairness and Precision","authors":"","doi":"10.1037/met0000786.supp","DOIUrl":"https://doi.org/10.1037/met0000786.supp","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"16 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144899812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Benjamin Brindle, Thomas Derrick Hull, Matteo Malgaroli, Nicolas Charon
We introduce varying and irregular sampling time-series analysis (VISTA), a clustering approach for multivariate and irregularly sampled time series based on a parametric state-space mixture model. VISTA is specifically designed for the unsupervised identification of groups in data sets originating from healthcare and psychology, where such sampling issues are commonplace. Our approach adapts linear Gaussian state-space models (LGSSMs) to provide a flexible parametric framework for fitting a wide range of time series dynamics. The clustering approach itself is based on the assumption that the population can be represented as a mixture of a fixed number of LGSSMs. VISTA's model formulation allows for an explicit derivation of the log-likelihood function, from which we develop an expectation-maximization scheme for fitting model parameters to the observed data samples. Our algorithmic implementation is designed to handle populations of multivariate time series that can exhibit large changes in sampling rate as well as irregular sampling. We evaluate the versatility and accuracy of our approach on simulated and real-world data sets, including demographic trends, wearable sensor data, epidemiological time series, and ecological momentary assessments. Our results indicate that VISTA outperforms most comparable standard time series clustering methods. We provide an open-source implementation of VISTA in Python.
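The two ingredients named above, an explicit LGSSM log-likelihood and a mixture-model expectation step, can be sketched for a scalar series as follows. This is a minimal illustration and not the VISTA implementation: the abstract does not say how irregular sampling is handled, so the gap-dependent transition `a ** dt` is an assumption, as are the function names `lgssm_loglik` and `responsibilities` and all parameter values.

```python
import math


def lgssm_loglik(times, ys, a, q, r, m0=0.0, p0=1.0):
    """Kalman-filter log-likelihood of a scalar linear Gaussian state-space model.

    Assumed model: x(t + dt) = a**dt * x(t) + process noise, y(t) = x(t) + N(0, r).
    The process noise variance over a gap dt is q * (1 - a**(2*dt)) / (1 - a**2),
    which reduces to q for unit gaps. Requires 0 < a < 1.
    """
    if not 0 < a < 1:
        raise ValueError("this sketch requires 0 < a < 1")
    mean, var = m0, p0
    loglik = 0.0
    prev_t = times[0]
    for t, y in zip(times, ys):
        dt = t - prev_t
        if dt > 0:
            # Predict across an arbitrary (possibly irregular) time gap.
            phi = a ** dt
            mean = phi * mean
            var = phi * phi * var + q * (1 - phi * phi) / (1 - a * a)
        # Predictive density of the observation is N(mean, var + r).
        s = var + r
        loglik += -0.5 * (math.log(2 * math.pi * s) + (y - mean) ** 2 / s)
        # Measurement update.
        gain = var / s
        mean = mean + gain * (y - mean)
        var = (1 - gain) * var
        prev_t = t
    return loglik


def responsibilities(logliks, weights):
    """E-step: posterior cluster probabilities for one series (log-sum-exp stabilized)."""
    scores = [math.log(w) + ll for w, ll in zip(weights, logliks)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One irregularly sampled series scored under two hypothetical clusters.
times = [0.0, 0.4, 1.5, 1.6, 3.0]
ys = [0.9, 1.0, 0.8, 0.85, 0.7]
cluster_params = [dict(a=0.95, q=0.05, r=0.1), dict(a=0.30, q=1.00, r=0.1)]
logliks = [lgssm_loglik(times, ys, **p) for p in cluster_params]
post = responsibilities(logliks, weights=[0.5, 0.5])
```

In a full expectation-maximization scheme these posterior probabilities would weight each series' contribution when the cluster parameters are re-estimated; that maximization step is omitted here.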
{"title":"VISTA-SSM: Varying and irregular sampling time-series analysis via state-space models.","authors":"Benjamin Brindle, Thomas Derrick Hull, Matteo Malgaroli, Nicolas Charon","doi":"10.1037/met0000785","DOIUrl":"10.1037/met0000785","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":" ","pages":""},"PeriodicalIF":7.8,"publicationDate":"2025-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12344451/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144822384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The practice of aggregating lower level predictors in clustered data: A reflection on reflective variables.","authors":"Timothy R. Konold, Elizabeth A. Sanders","doi":"10.1037/met0000792","DOIUrl":"https://doi.org/10.1037/met0000792","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"14 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792590","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring how many categories are needed to model ordinal intensive longitudinal data as continuous with dynamic structural equation models.","authors":"Daniel McNeish, Andrea Savord","doi":"10.1037/met0000784","DOIUrl":"https://doi.org/10.1037/met0000784","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"16 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792490","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supplemental Material for Integration of Latent Space and Confirmatory Factor Analysis to Explain Unexplained Person–Item Interactions","authors":"","doi":"10.1037/met0000791.supp","DOIUrl":"https://doi.org/10.1037/met0000791.supp","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"10 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supplemental Material for VISTA-SSM: Varying and Irregular Sampling Time-Series Analysis via State-Space Models","authors":"","doi":"10.1037/met0000785.supp","DOIUrl":"https://doi.org/10.1037/met0000785.supp","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"2 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792595","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supplemental Material for Exploring How Many Categories Are Needed to Model Ordinal Intensive Longitudinal Data as Continuous With Dynamic Structural Equation Models","authors":"","doi":"10.1037/met0000784.supp","DOIUrl":"https://doi.org/10.1037/met0000784.supp","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"6 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Supplemental Material for The Practice of Aggregating Lower Level Predictors in Clustered Data: A Reflection on Reflective Variables","authors":"","doi":"10.1037/met0000792.supp","DOIUrl":"https://doi.org/10.1037/met0000792.supp","url":null,"abstract":"","PeriodicalId":20782,"journal":{"name":"Psychological methods","volume":"1 1","pages":""},"PeriodicalIF":7.0,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144792488","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"心理学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}