Generalized factor models have been widely employed for dimension reduction across various types of multivariate data, including binary choices, counts, and continuous observations. While determining the number of factors in such models has received significant scholarly attention, it remains an open challenge. In this paper, we propose a cross-validation (CV) method based on entrywise splitting (ES), rather than sample splitting, to address this problem. Like traditional cross-validation, this approach primarily prevents underestimation of the number of factors. We then introduce a penalized entrywise splitting cross-validation criterion, which integrates the original CV with information-theoretic criteria by adding a penalty term. Its consistency is established under mild conditions in a high-dimensional setting where both the sample size and the number of features grow to infinity. Furthermore, we extend our methodology to randomly missing data under different missingness-probability scenarios. We evaluate the performance of the proposed method through comprehensive simulations and apply it to a mouse brain single-cell RNA sequencing dataset.
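The entrywise-splitting idea can be illustrated in the Gaussian special case: hold out a random subset of matrix entries, fit each candidate rank to the remaining entries by iterative SVD imputation, and compare the held-out reconstruction error across ranks. This is a minimal numpy sketch of that strategy under simplifying assumptions, not the paper's estimator (which handles general exponential-family responses and adds a penalty term); the function name is ours.

```python
import numpy as np

def es_cv_rank(Y, ranks, holdout_frac=0.2, n_iter=50, seed=0):
    """Choose the number of factors by entrywise-splitting CV
    (Gaussian special case, illustrative only)."""
    rng = np.random.default_rng(seed)
    mask = rng.random(Y.shape) < holdout_frac        # True = held-out entry
    errors = {}
    for r in ranks:
        X = np.where(mask, Y[~mask].mean(), Y)       # initialize held-out cells
        for _ in range(n_iter):                      # iterative SVD imputation
            U, s, Vt = np.linalg.svd(X, full_matrices=False)
            X_hat = (U[:, :r] * s[:r]) @ Vt[:r]      # rank-r reconstruction
            X = np.where(mask, X_hat, Y)             # refill only held-out cells
        errors[r] = np.mean((X_hat[mask] - Y[mask]) ** 2)
    return min(errors, key=errors.get), errors
```

Ranks below the truth leave signal in the held-out entries and inflate the error, which is the underestimation-prevention property the abstract refers to.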
Zhijing Wang. "Entrywise splitting cross-validation in generalized factor models: from sample splitting to entrywise splitting." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf153.
Yuta Yamauchi, Genya Kobayashi, Shonosuke Sugasawa
Count data frequently arise in biomedical applications, such as the length of hospital stay. However, their discrete nature poses significant challenges for appropriately modeling conditional quantiles, which are crucial for understanding heterogeneous effects and variability in outcomes. To address this practical difficulty, we propose a novel general Bayesian framework for quantile regression tailored to count data. We estimate the regression parameter for the conditional quantile by minimizing the expected loss with respect to the distribution of the conditional quantile of the latent continuous variable associated with the observed count response. By modeling the unknown conditional distribution through a Bayesian nonparametric kernel mixture for the joint distribution of the count response and covariates, we obtain the posterior distribution of the regression parameter via a simple optimization. We numerically demonstrate that the proposed method improves the bias and estimation accuracy of existing crude approaches to count quantile regression. Furthermore, we analyze the length of hospital stay for acute myocardial infarction and demonstrate that the proposed method gives more interpretable and flexible results than existing ones.
Yuta Yamauchi, Genya Kobayashi, Shonosuke Sugasawa. "Flexible Bayesian quantile regression for counts via generative modeling." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf152.
Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu
Federated learning enables the training of a global model while keeping data localized; however, current methods face challenges with high-dimensional semiparametric models that involve complex nuisance parameters. This paper proposes a federated double machine learning framework designed to address high-dimensional nuisance parameters of semiparametric models in multicenter studies. Our approach leverages double machine learning (Chernozhukov et al., 2018a) to estimate center-specific parameters, extends the surrogate efficient score method within a Neyman-orthogonal framework, and applies density ratio tilting to create a federated estimator that combines local individual-level data with summary statistics from other centers. This methodology mitigates regularization bias and overfitting in high-dimensional nuisance parameter estimation. We establish the estimator's limiting distribution under minimal assumptions, validate its performance through extensive simulations, and demonstrate its effectiveness in analyzing multiphase data from the Alzheimer's Disease Neuroimaging Initiative study.
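The center-specific double machine learning step can be sketched for the partially linear model: cross-fit nuisance predictions, residualize both the outcome and the treatment on the covariates, and solve the Neyman-orthogonal score. This minimal numpy sketch uses closed-form ridge regression as a stand-in nuisance learner and omits the paper's federated aggregation and density ratio tilting; the function name and learner choice are our assumptions.

```python
import numpy as np

def dml_plr(Y, D, X, n_folds=2, alpha=1.0, seed=0):
    """Cross-fitted DML for Y = theta*D + g(X) + eps on one center
    (sketch: ridge stands in for the nuisance learners)."""
    rng = np.random.default_rng(seed)
    n = len(Y)
    folds = rng.permutation(n) % n_folds

    def ridge(Xtr, ytr, Xte):
        # closed-form ridge fit: (X'X + alpha*I)^{-1} X'y
        p = Xtr.shape[1]
        beta = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(p), Xtr.T @ ytr)
        return Xte @ beta

    res_Y, res_D = np.empty(n), np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        res_Y[te] = Y[te] - ridge(X[tr], Y[tr], X[te])  # partial out E[Y|X]
        res_D[te] = D[te] - ridge(X[tr], D[tr], X[te])  # partial out E[D|X]

    theta = (res_D @ res_Y) / (res_D @ res_D)           # orthogonal score solution
    psi = (res_Y - theta * res_D) * res_D               # influence-function terms
    J = np.mean(res_D ** 2)
    se = np.sqrt(np.mean(psi ** 2) / J ** 2 / n)        # sandwich standard error
    return theta, se
```

Cross-fitting (estimating nuisances on the complement fold) is what removes the regularization and overfitting bias discussed in the abstract.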
Kai Kang, Zhihao Wu, Xinjie Qian, Xinyuan Song, Hongtu Zhu. "Federated double machine learning for high-dimensional semiparametric models." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf150.
Matching is a commonly used causal inference study design in observational studies. Through matching on measured confounders between different treatment groups, valid randomization inferences can be conducted under the no-unmeasured-confounding assumption, and sensitivity analysis can be further performed to assess the robustness of results to potential unmeasured confounding. However, for many common matched designs, there is still a lack of valid downstream randomization inference and sensitivity analysis methods. Specifically, in matched observational studies with treatment doses (e.g., continuous or ordinal treatments), with the exception of some special cases such as pair matching, there is no existing randomization inference or sensitivity analysis method for studying analogs of the sample average treatment effect (i.e., Neyman-type weak nulls), and no existing valid sensitivity analysis approach for testing the sharp null of no treatment effect for any subject (i.e., Fisher's sharp null) when the outcome is nonbinary. To fill these important gaps, we propose new methods for randomization inference and sensitivity analysis that work for general matched designs with treatment doses, apply to general types of outcome variables (e.g., binary, ordinal, or continuous), and cover both Fisher's sharp null and Neyman-type weak nulls. We illustrate our methods via comprehensive simulation studies and a real data application. All the proposed methods have been incorporated into the R package doseSens.
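The sharp-null side of this machinery can be illustrated with a within-set permutation test: under Fisher's sharp null the outcomes are fixed, so doses are exchangeable within each matched set, and permuting them yields the exact null distribution of any dose-outcome statistic. This sketch uses a simple sum-of-products statistic and does not implement the authors' sensitivity analysis or weak-null methods (see the doseSens package for those); the function name is ours.

```python
import numpy as np

def dose_randomization_test(sets, doses, outcomes, n_perm=2000, seed=0):
    """Randomization test of Fisher's sharp null in a matched design
    with treatment doses (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    sets = np.asarray(sets)
    doses = np.asarray(doses, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)

    stat = lambda d: float(d @ outcomes)        # sum of dose * outcome
    observed = stat(doses)
    perms = np.empty(n_perm)
    for b in range(n_perm):
        d = doses.copy()
        for s in np.unique(sets):
            idx = np.where(sets == s)[0]
            d[idx] = rng.permutation(d[idx])    # shuffle doses within each set
        perms[b] = stat(d)
    # one-sided Monte Carlo p-value with the +1 correction
    p = (1 + np.sum(perms >= observed)) / (1 + n_perm)
    return observed, p
```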
Jeffrey Zhang, Siyu Heng. "Bridging the gap between design and analysis: randomization inference and sensitivity analysis for matched observational studies with treatment doses." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf156.
The case-cohort study design is often used in modern epidemiological studies of rare diseases, as it can achieve efficiency similar to that of a much larger cohort study at a fraction of the cost. Previous work has focused on parameter estimation for case-cohort studies under particular statistical models, but few studies have addressed survival prediction under this type of design. In this article, we propose a super learner algorithm for survival prediction in case-cohort studies and further extend it to generalized case-cohort studies. The proposed super learner algorithm is shown to have asymptotic model selection consistency as well as uniform consistency, and we demonstrate that it has satisfactory finite-sample performance. Simulation studies suggest that the proposed super learners trained on data from case-cohort and generalized case-cohort studies have better prediction accuracy than those trained on data from a simple random sampling design with the same sample sizes. Finally, we apply the proposed method to analyze a generalized case-cohort study conducted as part of the Atherosclerosis Risk in Communities Study.
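The model-selection core of a super learner can be shown with a discrete super learner for an uncensored continuous outcome: estimate each candidate's cross-validated risk and keep the winner. The paper's version handles censored survival times under (generalized) case-cohort sampling weights, all of which this numpy sketch omits; the function name and learner interface are our assumptions.

```python
import numpy as np

def discrete_super_learner(X, y, learners, n_folds=5, seed=0):
    """Pick the candidate learner with the smallest cross-validated
    squared-error risk (illustrative sketch).

    `learners` maps a name to a factory: fit(X, y) -> predict(X)."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    risks = {}
    for name, make in learners.items():
        sq_err = 0.0
        for k in range(n_folds):
            tr, te = folds != k, folds == k
            predict = make(X[tr], y[tr])             # fit on training folds
            sq_err += np.sum((predict(X[te]) - y[te]) ** 2)
        risks[name] = sq_err / len(y)                # CV risk estimate
    best = min(risks, key=risks.get)
    return best, learners[best](X, y), risks         # refit winner on all data
```

The asymptotic model-selection consistency claimed in the abstract says this CV-selected winner eventually matches the best candidate in the library.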
Haolin Li, Haibo Zhou, David Couper, Jianwen Cai. "Super learner for survival prediction in case-cohort and generalized case-cohort studies." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf155.
Effective treatment of medical conditions begins with an accurate diagnosis. However, many conditions are underdiagnosed, either being overlooked or diagnosed after significant delays. Electronic health records (EHRs) contain extensive patient health information, offering an opportunity to probabilistically identify underdiagnosed individuals. The rationale is that both diagnosed and underdiagnosed patients may display similar health profiles in EHR data, distinguishing them from condition-free patients. Thus, EHR data can be leveraged to develop models that assess an individual's risk of having a condition. To date, this opportunity has largely remained unexploited, partly due to the lack of suitable statistical methods. The key challenge is the positive-unlabeled EHR data structure, which consists of data for diagnosed ("positive") patients and the remaining ("unlabeled") patients, a group that includes both underdiagnosed and many condition-free patients. Therefore, data for patients who are unambiguously condition-free, essential for developing risk assessment models, are unavailable. To overcome this challenge, we propose ascertaining condition statuses for a small subset of unlabeled patients. We develop a novel statistical method that uses this supplemented EHR data to build accurate models estimating the probability that a patient has the condition of interest. We study the asymptotic properties of our method and assess its finite-sample performance through simulation studies. Finally, we apply our method to develop a preliminary model for identifying potentially underdiagnosed non-alcoholic steatohepatitis patients using data from Penn Medicine EHRs.
Weidong Ma, Jordana B Cohen, Jinbo Chen. "A semiparametric method for addressing underdiagnosis using electronic health record data." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf157.
"Correction to: Covariate-Adjusted Response-Adaptive Randomization for Multi-Arm Clinical Trials Using a Modified Forward Looking Gittins Index Rule." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf139.
Guangyu Yang, Min Zhang. "Rejoinder to Letter to the Editors 'Comments on "Statistical inference on change points in generalized semiparametric segmented models" by Yang et al. (2025)' by Vito M.R. Muggeo." Biometrics, 2025. doi:10.1093/biomtc/ujaf148.
Byran J Smucker, Stephen E Wright, Isaac Williams, Richard C Page, Andor J Kiss, Surendra Bikram Silwal, Maria Weese, David J Edwards
High-throughput screening, in which large numbers of compounds are traditionally studied one at a time in multiwell plates against specific targets, is widely used across many areas of the biological sciences, including drug discovery. To improve the effectiveness of these screens, we propose a new class of supersaturated designs that guide the construction of pools of compounds in each well. Because the size of the pools is typically limited by the particular application, the new designs accommodate this constraint and are part of a larger procedure that we call Constrained Row Screening, or CRowS. We develop an efficient computational procedure to construct the CRowS designs, provide some initial lower bounds on the average squared off-diagonal values of their main-effects information matrix, and study the impact of the constraint on design quality. We also show via simulation that CRowS is statistically superior to the traditional one-compound-one-well approach as well as an existing pooling method, and demonstrate the use of the new methodology on a Verona Integron-encoded Metallo-β-lactamase-2 assay.
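The design problem can be illustrated with a naive random search, which is not the authors' algorithm: generate 0/1 pooling matrices whose rows (wells) each contain exactly `pool_size` compounds, and score each by the average squared off-diagonal of the main-effects information matrix X'X, smaller values being closer to orthogonal. The 0/1 coding, the scoring details, and the search strategy are simplifying assumptions for illustration.

```python
import numpy as np

def crows_random_search(n_rows, n_cols, pool_size, n_tries=200, seed=0):
    """Random search for a row-constrained pooling design
    (illustrative sketch, not the paper's construction)."""
    rng = np.random.default_rng(seed)

    def score(X):
        # average squared off-diagonal of the information matrix X'X
        M = X.T @ X
        off = M[~np.eye(n_cols, dtype=bool)]
        return float(np.mean(off.astype(float) ** 2))

    best_X, best_s = None, np.inf
    for _ in range(n_tries):
        X = np.zeros((n_rows, n_cols), dtype=int)
        for i in range(n_rows):
            # each well gets exactly pool_size compounds (the row constraint)
            X[i, rng.choice(n_cols, pool_size, replace=False)] = 1
        s = score(X)
        if s < best_s:
            best_X, best_s = X, s
    return best_X, best_s
```

With more columns than rows the design is supersaturated, so off-diagonals cannot all be zero; the criterion just pushes them as small as the row constraint allows.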
Byran J Smucker, Stephen E Wright, Isaac Williams, Richard C Page, Andor J Kiss, Surendra Bikram Silwal, Maria Weese, David J Edwards. "Large row-constrained supersaturated designs for high-throughput screening." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf160.
Advances in wearable technologies and smartphone-delivered health interventions have greatly increased the accessibility of mobile health (mHealth) interventions. Micro-randomized trials (MRTs) are designed to assess the effectiveness of such interventions and have motivated a novel class of causal estimands called "causal excursion effects." These estimands enable the evaluation of how intervention effects change over time and are influenced by individual characteristics or context. Existing methods for analyzing causal excursion effects assume known randomization probabilities, complete observations, and a linear nuisance function with prespecified features of the high-dimensional observed history. However, in complex mobile systems, these assumptions often fall short: randomization probabilities can be uncertain, observations may be incomplete, and the granularity of mHealth data makes linear modeling difficult. To address these issues, we propose a flexible and doubly robust inferential procedure, called "DR-WCLS," for estimating causal excursion effects from a meta-learner perspective. We establish the bidirectional asymptotic properties of the proposed estimators and compare them with existing methods both theoretically and through extensive simulations. The results show consistent and more efficient estimation, even with missing observations or uncertain treatment randomization probabilities. Finally, the practical utility of the proposed methods is demonstrated by analyzing data from a multi-institution cohort of first-year medical residents in the United States.
Jieru Shi, Walter Dempsey. "A meta-learning method for estimation of causal excursion effects to assess time-varying moderation." Biometrics 81(4), 2025. doi:10.1093/biomtc/ujaf129.