Smooth and shape-constrained quantile distributed lag models
Yisen Jin, Aaron J Molstad, Ander Wilson, Joseph Antonelli
Exposure to environmental pollutants during the gestational period can significantly impact infant health outcomes, such as birth weight and neurological development. Identifying critical windows of susceptibility, which are specific periods during pregnancy when exposure has the most profound effects, is essential for developing targeted interventions. Distributed lag models (DLMs) are widely used in environmental epidemiology to analyze the temporal patterns of exposure and their impact on health outcomes. However, traditional DLMs focus on modeling the conditional mean, which may fail to capture heterogeneity in the relationship between predictors and the outcome. Moreover, when modeling the distribution of health outcomes like gestational birth weight, it is the extreme quantiles that are of most clinical relevance. We introduce 2 new quantile distributed lag model (QDLM) estimators designed to address the limitations of existing methods by leveraging smoothness and shape constraints, such as unimodality and concavity, to enhance interpretability and efficiency. We apply our QDLM estimators to the Colorado birth cohort data, demonstrating their effectiveness in identifying critical windows of susceptibility and informing public health interventions.
{"title":"Smooth and shape-constrained quantile distributed lag models.","authors":"Yisen Jin, Aaron J Molstad, Ander Wilson, Joseph Antonelli","doi":"10.1093/biomtc/ujaf101","DOIUrl":"10.1093/biomtc/ujaf101","url":null,"abstract":"<p><p>Exposure to environmental pollutants during the gestational period can significantly impact infant health outcomes, such as birth weight and neurological development. Identifying critical windows of susceptibility, which are specific periods during pregnancy when exposure has the most profound effects, is essential for developing targeted interventions. Distributed lag models (DLMs) are widely used in environmental epidemiology to analyze the temporal patterns of exposure and their impact on health outcomes. However, traditional DLMs focus on modeling the conditional mean, which may fail to capture heterogeneity in the relationship between predictors and the outcome. Moreover, when modeling the distribution of health outcomes like gestational birth weight, it is the extreme quantiles that are of most clinical relevance. We introduce 2 new quantile distributed lag model (QDLM) estimators designed to address the limitations of existing methods by leveraging smoothness and shape constraints, such as unimodality and concavity, to enhance interpretability and efficiency. We apply our QDLM estimators to the Colorado birth cohort data, demonstrating their effectiveness in identifying critical windows of susceptibility and informing public health interventions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12381565/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Valid and efficient inference for nonparametric variable importance in two-phase studies
Guorong Dai, Raymond J Carroll, Jinbo Chen
We consider a common nonparametric regression setting, where the data consist of a response variable Y, some easily obtainable covariates $\mathbf{X}$, and a set of costly covariates $\mathbf{Z}$. Before establishing predictive models for Y, a natural question arises: Is it worthwhile to include $\mathbf{Z}$ as predictors, given the additional cost of collecting data on $\mathbf{Z}$ for both training the models and predicting Y for future individuals? We therefore aim to conduct preliminary investigations to infer the importance of $\mathbf{Z}$ in predicting Y in the presence of $\mathbf{X}$. To achieve this goal, we propose a nonparametric variable importance measure for $\mathbf{Z}$. It is defined as a parameter that aggregates the maximum potential contributions of $\mathbf{Z}$ in single or multiple predictive models, with contributions quantified by general loss functions. Considering two-phase data that provide a large number of observations for $(Y,\mathbf{X})$ with the expensive $\mathbf{Z}$ measured only in a small subsample, we develop a novel approach to infer the proposed importance measure, accommodating missingness of $\mathbf{Z}$ in the sample by substituting functions of $(Y,\mathbf{X})$ for each individual's contribution to the predictive loss of models involving $\mathbf{Z}$. Our approach attains unified and efficient inference regardless of whether $\mathbf{Z}$ makes zero or positive contribution to predicting Y, a desirable yet surprising property owing to data incompleteness. As intermediate steps of our theoretical development, we establish novel results in two relevant research areas, semi-supervised inference and two-phase nonparametric estimation. Numerical results from both simulated and real data demonstrate the superior performance of our approach.
{"title":"Valid and efficient inference for nonparametric variable importance in two-phase studies.","authors":"Guorong Dai, Raymond J Carroll, Jinbo Chen","doi":"10.1093/biomtc/ujaf095","DOIUrl":"10.1093/biomtc/ujaf095","url":null,"abstract":"<p><p>We consider a common nonparametric regression setting, where the data consist of a response variable Y, some easily obtainable covariates $mathbf {X}$, and a set of costly covariates $mathbf {Z}$. Before establishing predictive models for Y, a natural question arises: Is it worthwhile to include $mathbf {Z}$ as predictors, given the additional cost of collecting data on $mathbf {Z}$ for both training the models and predicting Y for future individuals? Therefore, we aim to conduct preliminary investigations to infer importance of $mathbf {Z}$ in predicting Y in the presence of $mathbf {X}$. To achieve this goal, we propose a nonparametric variable importance measure for $mathbf {Z}$. It is defined as a parameter that aggregates maximum potential contributions of $mathbf {Z}$ in single or multiple predictive models, with contributions quantified by general loss functions. Considering two-phase data that provide a large number of observations for $(Y,mathbf {X})$ with the expensive $mathbf {Z}$ measured only in a small subsample, we develop a novel approach to infer the proposed importance measure, accommodating missingness of $mathbf {Z}$ in the sample by substituting functions of $(Y,mathbf {X})$ for each individual's contribution to the predictive loss of models involving $mathbf {Z}$. Our approach attains unified and efficient inference regardless of whether $mathbf {Z}$ makes zero or positive contribution to predicting Y, a desirable yet surprising property owing to data incompleteness. As intermediate steps of our theoretical development, we establish novel results in two relevant research areas, semi-supervised inference and two-phase nonparametric estimation. Numerical results from both simulated and real data demonstrate superior performance of our approach.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312401/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144752245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tree-based additive noise directed acyclic graphical models for nonlinear causal discovery with interactions
Fangting Zhou, Kejun He, Yang Ni
Directed acyclic graphical models with additive noise are essential in nonlinear causal discovery and have numerous applications in various domains, such as social science and systems biology. Most such models further assume that the structural causal functions are additive to ensure causal identifiability and computational feasibility, which may be too restrictive in the presence of causal interactions. Some methods consider general nonlinear causal functions represented by, for example, Gaussian processes and neural networks, to accommodate interactions; however, they are either computationally intensive or lack interpretability. We propose a highly interpretable and computationally feasible approach that uses trees to incorporate interactions in nonlinear causal discovery, termed tree-based additive noise models. The nature of the tree construction leads to piecewise constant causal functions, making existing causal identifiability results for additive noise models with continuous and smooth causal functions inapplicable. We therefore provide new conditions under which the proposed model is identifiable. We develop a recursive algorithm for source node identification and a score-based ordering search algorithm. Through extensive simulations, we demonstrate the utility of the proposed model and algorithms, benchmarking against existing additive noise models, especially when there are strong causal interactions. Our method is applied to infer a protein-protein interaction network for breast cancer, where proteins may form protein complexes to perform their functions.
{"title":"Tree-based additive noise directed acyclic graphical models for nonlinear causal discovery with interactions.","authors":"Fangting Zhou, Kejun He, Yang Ni","doi":"10.1093/biomtc/ujaf089","DOIUrl":"10.1093/biomtc/ujaf089","url":null,"abstract":"<p><p>Directed acyclic graphical models with additive noises are essential in nonlinear causal discovery and have numerous applications in various domains, such as social science and systems biology. Most such models further assume that structural causal functions are additive to ensure causal identifiability and computational feasibility, which may be too restrictive in the presence of causal interactions. Some methods consider general nonlinear causal functions represented by, for example, Gaussian processes and neural networks, to accommodate interactions. However, they are either computationally intensive or lack interpretability. We propose a highly interpretable and computationally feasible approach using trees to incorporate interactions in nonlinear causal discovery, termed tree-based additive noise models. The nature of the tree construction leads to piecewise constant causal functions, making existing causal identifiability results of additive noise models with continuous and smooth causal functions inapplicable. Therefore, we provide new conditions under which the proposed model is identifiable. We develop a recursive algorithm for source node identification and a score-based ordering search algorithm. Through extensive simulations, we demonstrate the utility of the proposed model and algorithms benchmarking against existing additive noise models, especially when there are strong causal interactions. Our method is applied to infer a protein-protein interaction network for breast cancer, where proteins may form protein complexes to perform their functions.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288665/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706199","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Nonparametric Bayesian approach for dynamic borrowing of historical control data
Tomohiro Ohigashi, Kazushi Maruo, Takashi Sozu, Masahiko Gosho
When incorporating historical control data into the analysis of current randomized controlled trial data, it is critical to account for differences between the datasets. When the cause of the difference is an unmeasured factor and adjustment for observed covariates alone is insufficient, it is desirable to use a dynamic borrowing method that reduces the impact of heterogeneous historical controls. We propose a nonparametric Bayesian approach that addresses between-trial heterogeneity and allows borrowing from historical controls that are homogeneous with the current control. Additionally, to emphasize conflict resolution between the historical controls and the current control, we introduce a method based on the dependent Dirichlet process (DP) mixture. The proposed methods can be implemented using the same procedure regardless of whether the outcome data comprise aggregated study-level data or individual participant data. We also develop a novel index of similarity between the historical and current control data, based on the posterior distribution of the parameter of interest. We conduct a simulation study and analyze clinical trial examples to evaluate the performance of the proposed methods compared to existing methods. The proposed method based on the dependent DP mixture can accurately borrow from homogeneous historical controls while reducing the impact of heterogeneous historical controls compared to the typical DP mixture. The proposed methods outperform existing methods in scenarios with heterogeneous historical controls, in which the meta-analytic approach is ineffective.
{"title":"Nonparametric Bayesian approach for dynamic borrowing of historical control data.","authors":"Tomohiro Ohigashi, Kazushi Maruo, Takashi Sozu, Masahiko Gosho","doi":"10.1093/biomtc/ujaf118","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf118","url":null,"abstract":"<p><p>When incorporating historical control data into the analysis of current randomized controlled trial data, it is critical to account for differences between the datasets. When the cause of difference is an unmeasured factor and adjustment for only observed covariates is insufficient, it is desirable to use a dynamic borrowing method that reduces the impact of heterogeneous historical controls. We propose a nonparametric Bayesian approach that addresses between-trial heterogeneity and allows borrowing historical controls homogeneous with the current control. Additionally, to emphasize conflict resolution between historical controls and the current control, we introduce a method based on the dependent Dirichlet process (DP) mixture. The proposed methods can be implemented using the same procedure, regardless of whether the outcome data comprise aggregated study-level data or individual participant data. We also develop a novel index of similarity between the historical and current control data, based on the posterior distribution of the parameter of interest. We conduct a simulation study and analyze clinical trial examples to evaluate the performance of the proposed methods compared to existing methods. The proposed method, based on the dependent DP mixture, can accurately borrow from homogeneous historical controls while reducing the impact of heterogeneous historical controls compared to the typical DP mixture. The proposed methods outperform existing methods in scenarios with heterogeneous historical controls, in which the meta-analytic approach is ineffective.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Inference on age-specific fertility in ecology and evolution. Learning from other disciplines and improving the state of the art
Fernando Colchero
Despite the importance of age-specific fertility for ecology and evolution, the methods for modeling and inference have proven considerably limited. However, other disciplines have long focused on exploring and developing a vast number of models. Here, I provide an overview of the different models proposed since the 1940s by formal demographers, statisticians, and social scientists, most of which are unknown to the ecological and evolutionary communities. I describe how these fall into 2 main categories, namely polynomials and those based on probability density functions. I discuss their merits in terms of their overall behavior and how well they represent the different stages of fertility. Despite many alternative models, inference on age-specific fertility has usually been limited to simple least squares. Although this might be sufficient for human data, I hope to demonstrate that inference requires more sophisticated approaches for ecological and evolutionary datasets. To illustrate how inference and model choice can be achieved on different types of typical ecological and evolutionary data, I present the new R package Bayesian Fertility Trajectory Analysis, which I apply to published aggregated data for lions and baboons. I then conduct a simulation study to test its performance on individual-level data. I show that appropriate inference and model selection can be achieved even when a small number of parents are followed.
{"title":"Inference on age-specific fertility in ecology and evolution. Learning from other disciplines and improving the state of the art.","authors":"Fernando Colchero","doi":"10.1093/biomtc/ujaf081","DOIUrl":"10.1093/biomtc/ujaf081","url":null,"abstract":"<p><p>Despite the importance of age-specific fertility for ecology and evolution, the methods for modeling and inference have proven considerably limited. However, other disciplines have long focused on exploring and developing a vast number of models. Here, I provide an overview of the different models proposed since the 1940s by formal demographers, statisticians, and social scientists, most of which are unknown to the ecological and evolutionary communities. I describe how these fall into 2 main categories, namely polynomials and those based on probability density functions. I discuss their merits in terms of their overall behavior and how well they represent the different stages of fertility. Despite many alternative models, inference on age-specific fertility has usually been limited to simple least squares. Although this might be sufficient for human data, I hope to demonstrate that inference requires more sophisticated approaches for ecological and evolutionary datasets. To illustrate how inference and model choice can be achieved on different types of typical ecological and evolutionary data, I present the new R package Bayesian Fertility Trajectory Analysis, which I apply to published aggregated data for lions and baboons. I then conduct a simulation study to test its performance on individual-level data. I show that appropriate inference and model selection can be achieved even when a small number of parents are followed.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Precision generalized phase I-II designs
Saijun Zhao, Peter F Thall, Ying Yuan, Juhee Lee, Pavlos Msaouel, Yong Zang
A new family of precision Bayesian dose optimization designs, PGen I-II, based on early efficacy, early toxicity, and long-term time to treatment failure is proposed. A PGen I-II design refines a Gen I-II design by accounting for patient heterogeneity, characterized by subgroups that may be defined by prognostic levels, disease subtypes, or biomarker categories. The design makes subgroup-specific decisions, which may be to drop an unacceptably toxic or inefficacious dose, randomize patients among acceptable doses, or identify a best dose in terms of treatment success, defined by time to failure over long-term follow-up. A piecewise exponential distribution for failure time is assumed, including subgroup-specific effects of dose, response, and toxicity. Latent variables are used to adaptively cluster subgroups found to have similar dose-outcome distributions, with the model simplified to borrow strength between subgroups in the same cluster. Guidelines and user-friendly computer software for implementing the design are provided. A simulation study shows that the PGen I-II design is superior to similarly structured designs that either assume patient homogeneity or conduct separate trials within subgroups.
{"title":"Precision generalized phase I-II designs.","authors":"Saijun Zhao, Peter F Thall, Ying Yuan, Juhee Lee, Pavlos Msaouel, Yong Zang","doi":"10.1093/biomtc/ujaf043","DOIUrl":"10.1093/biomtc/ujaf043","url":null,"abstract":"<p><p>A new family of precision Bayesian dose optimization designs, PGen I-II, based on early efficacy, early toxicity, and long-term time to treatment failure is proposed. A PGen I-II design refines a Gen I-II design by accounting for patient heterogeneity characterized by subgroups that may be defined by prognostic levels, disease subtypes, or biomarker categories. The design makes subgroup-specific decisions, which may be to drop an unacceptably toxic or inefficacious dose, randomize patients among acceptable doses, or identify a best dose in terms of treatment success defined in terms of time to failure over long-term follow-up. A piecewise exponential distribution for failure time is assumed, including subgroup-specific effects of dose, response, and toxicity. Latent variables are used to adaptively cluster subgroups found to have similar dose-outcome distributions, with the model simplified to borrow strength between subgroups in the same cluster. Guidelines and user-friendly computer software for implementing the design are provided. A simulation study is reported that shows the PGen I-II design is superior to similarly structured designs that either assume patient homogeneity or conduct separate trials within subgroups.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288667/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144706184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Doubly robust nonparametric estimators of the predictive value of covariates for survival data
Torben Martinussen, Mark J van der Laan
The predictive value of a covariate is often of interest in studies with a survival endpoint. A common situation is that there are some well-established predictors and a potentially valuable new marker. The challenge is how to judge the potential added predictive value of this new marker. We propose to use the positive predictive value (PPV) curve based on a nonparametric scoring rule. The estimand of interest is viewed as a single transformation of the underlying data-generating probability measure, which allows us to develop a robust nonparametric estimator of the PPV by first calculating the corresponding efficient influence function. We provide asymptotic results and illustrate the approach with numerical studies and with 2 cancer data studies.
{"title":"Doubly robust nonparametric estimators of the predictive value of covariates for survival data.","authors":"Torben Martinussen, Mark J van der Laan","doi":"10.1093/biomtc/ujaf084","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf084","url":null,"abstract":"<p><p>The predictive value of a covariate is often of interest in studies with a survival endpoint. A common situation is that there are some well established predictors and a potential valuable new marker. The challenge is how to judge the potentially added predictive value of this new marker. We propose to use the positive predictive value (PPV) curve based on a nonparametric scoring rule. The estimand of interest is viewed as a single transformation of the underlying data generating probability measure, which allows us to develop a robust nonparametric estimator of the PPV by first calculating the corresponding efficient influence function. We provide asymptotic results and illustrate the approach with numerical studies and with 2 cancer data studies.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144697476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Evaluating the effects of high-throughput structural neuroimaging predictors on whole-brain functional connectome outcomes via network-based matrix-on-vector regression.","authors":"","doi":"10.1093/biomtc/ujaf111","DOIUrl":"10.1093/biomtc/ujaf111","url":null,"abstract":"","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341978/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Regression analysis of interval-censored failure time data with change points and a cured subgroup
Yichen Lou, Mingyue Du, Xinyuan Song
A substantial body of literature discusses regression analysis of interval-censored failure time data, and many methods have been proposed for handling the presence of a cured subgroup. However, only limited research exists on problems that incorporate change points, with or without a cured subgroup; such problems arise in various contexts, such as clinical trials in which disease risk may shift dramatically once certain biological indicators exceed specific thresholds. To fill this gap, we consider a class of partly linear transformation models within the mixture cure model framework and propose a sieve maximum likelihood estimation approach, using Bernstein polynomials and piecewise linear functions, for inference. Additionally, we provide a data-driven adaptive procedure to identify the number and locations of the change points, and we establish the asymptotic properties of the proposed method. Extensive simulation studies demonstrate the effectiveness and practical utility of the proposed methods, which are applied to data from the breast cancer study that motivated this work.
{"title":"Regression analysis of interval-censored failure time data with change points and a cured subgroup.","authors":"Yichen Lou, Mingyue Du, Xinyuan Song","doi":"10.1093/biomtc/ujaf100","DOIUrl":"https://doi.org/10.1093/biomtc/ujaf100","url":null,"abstract":"<p><p>There exists a substantial body of literature that discusses regression analysis of interval-censored failure time data and also many methods have been proposed for handling the presence of a cured subgroup. However, only limited research exists on the problems incorporating change points, with or without a cured subgroup, which can occur in various contexts such as clinical trials where disease risks may shift dramatically when certain biological indicators exceed specific thresholds. To fill this gap, we consider a class of partly linear transformation models within the mixture cure model framework and propose a sieve maximum likelihood estimation approach using Bernstein polynomials and piecewise linear functions for inference. Additionally, we provide a data-driven adaptive procedure to identify the number and locations of change points and establish the asymptotic properties of the proposed method. Extensive simulation studies demonstrate the effectiveness and practical utility of the proposed methods, which are applied to the real data from a breast cancer study that motivated this work.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833845","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Negative binomial mixed effects location-scale models for intensive longitudinal count-type physical activity data provided by wearable devices
Qianheng Ma, Genevieve F Dunton, Donald Hedeker
In recent years, the use of wearable devices, for example, accelerometers, has become increasingly prevalent. Wearable devices enable more accurate real-time tracking of a subject's physical activity (PA) level, such as steps, number of activity bouts, or time in moderate-to-vigorous intensity PA (MVPA), which are important general health markers and can often be represented as counts. These intensive within-subject count data provided by wearable devices, for example, minutes in MVPA summarized per hour across days and even months, make it possible to model not only the mean PA level but also the dispersion level for each subject. Especially in the context of daily PA, subjects' dispersion levels are potentially informative about their exercise patterns: some subjects exhibit consistent PA across time and can be considered "less dispersed" subjects, while others may show a large amount of PA at a particular time point while being sedentary for most of the day, and can be considered "more dispersed" subjects. We therefore propose a negative binomial mixed effects location-scale model for these intensive longitudinal PA counts that accounts for heterogeneity in both the mean and the dispersion level across subjects. Further, to handle the inflated number of zeros in PA data, we also propose a hurdle/zero-inflated version that additionally models the probability of a PA level >0.
{"title":"Negative binomial mixed effects location-scale models for intensive longitudinal count-type physical activity data provided by wearable devices.","authors":"Qianheng Ma, Genevieve F Dunton, Donald Hedeker","doi":"10.1093/biomtc/ujaf099","DOIUrl":"10.1093/biomtc/ujaf099","url":null,"abstract":"<p><p>In recent years, the use of wearable devices, for example, accelerometers, have become increasingly prevalent. Wearable devices enable more accurate real-time tracking of a subject's physical activity (PA) level, such as steps, number of activity bouts, or time in moderate-to-vigorous intensity PA (MVPA), which are important general health markers and can often be represented as counts. These intensive within-subject count data provided by wearable devices, for example, minutes in MVPA summarized per hour across days and even months, allow the possibility for modeling not only the mean PA level, but also the dispersion level for each subject. Especially in the context of daily PA, subjects' dispersion levels are potentially informative in reflecting their exercise patterns: some subjects might exhibit consistent PA across time and can be considered \"less dispersed\" subjects; while others might have a large amount of PA at a particular time point, while being sedentary for most of the day, and can be considered \"more dispersed\" subjects. Thus, we propose a negative binomial mixed effects location-scale model to model these intensive longitudinal PA counts and to account for the heterogeneity in both the mean and dispersion level across subjects. Further, to handle the issue of inflated numbers of zeros in the PA data, we also propose a hurdle/zero-inflated version which additionally includes the modeling of the probability of having $>$0 PA levels.</p>","PeriodicalId":8930,"journal":{"name":"Biometrics","volume":"81 3","pages":""},"PeriodicalIF":1.7,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12312402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144752244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}