We study estimation and testing in the Poisson regression model with noisy high-dimensional covariates, which has wide applications in analyzing noisy big data. Correcting for the estimation bias due to the covariate noise leads to a non-convex target function to minimize. Handling the high dimensionality further leads us to augment the target function with an amenable penalty term. We propose to estimate the regression parameter by minimizing the penalized target function. We derive the L1 and L2 convergence rates of the estimator and prove its variable selection consistency. We further establish the asymptotic normality of any subset of the parameters, where the subset can have infinitely many components as long as its cardinality grows sufficiently slowly. We develop Wald and score tests based on the asymptotic normality of the estimator, which permits testing of linear functions of the members of the subset. We examine the finite-sample performance of the proposed tests through extensive simulations. Finally, the proposed method is successfully applied to the Alzheimer's Disease Neuroimaging Initiative study, which motivated this work initially.
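As a concrete illustration of the kind of estimator this abstract describes, the sketch below assumes additive Gaussian covariate noise W = X + U with U ~ N(0, Sigma) and Sigma known, for which exp(W'b - b'Sigma b / 2) is an unbiased surrogate for exp(X'b); it pairs the resulting non-convex corrected loss with a plain L1 penalty (the simplest member of the amenable family) and proximal gradient descent. The paper's penalty and optimization scheme may differ, and all names here are illustrative.

import numpy as np

def corrected_loss(beta, W, y, Sigma):
    # E[exp(W'b - b'Sigma b / 2) | X] = exp(X'b) under W = X + N(0, Sigma),
    # so this is an unbiased (and generally non-convex) surrogate loss.
    eta = W @ beta - 0.5 * beta @ Sigma @ beta
    return np.mean(np.exp(eta) - y * (W @ beta))

def corrected_grad(beta, W, y, Sigma):
    eta = W @ beta - 0.5 * beta @ Sigma @ beta
    mu = np.exp(eta)                               # corrected mean per observation
    return (mu @ (W - Sigma @ beta) - y @ W) / len(y)

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_fit(W, y, Sigma, lam, step=1e-2, n_iter=5000):
    # Proximal gradient: gradient step on the corrected loss, then
    # soft-thresholding for the L1 penalty.
    beta = np.zeros(W.shape[1])
    for _ in range(n_iter):
        beta = soft_threshold(beta - step * corrected_grad(beta, W, y, Sigma),
                              step * lam)
    return beta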
We consider the batch (off-line) policy learning problem in an infinite-horizon Markov Decision Process. Motivated by mobile health applications, we focus on learning a policy that maximizes the long-term average reward. We propose a doubly robust estimator for the average reward and show that it achieves semiparametric efficiency. Further, we develop an optimization algorithm to compute the optimal policy in a parameterized stochastic policy class. The performance of the estimated policy is measured by the difference between the optimal average reward in the policy class and the average reward of the estimated policy, and we establish a finite-sample regret guarantee for this difference. The performance of the method is illustrated by simulation studies and an analysis of a mobile health study promoting physical activity.
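A minimal sketch of a doubly robust average-reward estimate of the kind described, assuming the two nuisances, a stationary density ratio omega and a relative (differential) value function Q, have already been estimated from the batch data; omega, Q, pi, and actions are hypothetical names, and the paper's exact estimator and efficiency-attaining construction may differ.

import numpy as np

def dr_average_reward(S, A, R, S_next, actions, pi, omega, Q):
    # omega(s, a): estimated ratio of the stationary state-action distribution
    #              under pi to the data-generating distribution.
    # Q(s, a):     estimated relative (differential) action-value function.
    # The estimate is consistent if either nuisance is correct (double robustness).
    w = np.array([omega(s, a) for s, a in zip(S, A)])
    q_sa = np.array([Q(s, a) for s, a in zip(S, A)])
    v_next = np.array([sum(pi(a2, s2) * Q(s2, a2) for a2 in actions)
                       for s2 in S_next])         # E_{a'~pi}[Q(s', a')]
    td = np.asarray(R) + v_next - q_sa            # average-reward TD residual
    return np.sum(w * td) / np.sum(w)             # solves the weighted
                                                  # estimating equation for theta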
Multiple biomarkers are often combined to improve disease diagnosis. The uniformly optimal combination, i.e., with respect to all reasonable performance metrics, unfortunately requires excessive distributional modeling, to which the estimation can be sensitive. An alternative strategy is rather to pursue local optimality with respect to a specific performance metric. Nevertheless, existing methods may not target clinical utility of the intended medical test, which usually needs to operate above a certain sensitivity or specificity level, or do not have their statistical properties well studied and understood. In this article, we develop and investigate a linear combination method to maximize the clinical utility empirically for such a constrained classification. The combination coefficient is shown to have cube root asymptotics. The convergence rate and limiting distribution of the predictive performance are subsequently established, exhibiting robustness of the method in comparison with others. An algorithm with sound statistical justification is devised for efficient and high-quality computation. Simulations corroborate the theoretical results, and demonstrate good statistical and computational performance. Illustration with a clinical study on aggressive prostate cancer detection is provided.
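To make the constrained-classification objective concrete, the sketch below maximizes empirical specificity over unit-norm linear coefficients subject to an empirical sensitivity floor; the random search is a naive stand-in for the paper's algorithm, and sens_floor, X_case, and X_control are illustrative names.

import numpy as np

def constrained_utility(beta, X_case, X_control, sens_floor=0.9):
    # Specificity attained when the cutoff is set so that empirical
    # sensitivity stays at or above the required floor.
    s_case = X_case @ beta
    s_ctrl = X_control @ beta
    cutoff = np.quantile(s_case, 1 - sens_floor)  # keeps >= sens_floor of cases
    return np.mean(s_ctrl < cutoff)

def best_direction(X_case, X_control, n_draws=20000, seed=0):
    # The objective is a step function of beta (indicators of score orderings),
    # which is why gradient methods fail and cube-root asymptotics arise;
    # naive random search over unit directions is used here as a stand-in.
    rng = np.random.default_rng(seed)
    best, best_val = None, -np.inf
    for _ in range(n_draws):
        b = rng.standard_normal(X_case.shape[1])
        b /= np.linalg.norm(b)                    # scale does not affect ranking
        val = constrained_utility(b, X_case, X_control)
        if val > best_val:
            best, best_val = b, val
    return best, best_val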
Purpose: To prospectively evaluate surgical and quality of life (QoL) outcomes of robotic retromuscular ventral hernia repair (rRMVHR) using a new hybrid mesh in high-risk patients.
Methods: Data were prospectively collected for patients classified as high-risk based on the modified Ventral Hernia Working Group (VHWG) grading system who underwent rRMVHR using Synecor™ Pre hybrid mesh in a single center between 2019 and 2020. Pre-, intra-, and postoperative variables, including hernia recurrence, surgical site events (SSEs), hernia-specific QoL, and financial costs, were analyzed. QoL assessments were obtained at preoperative and postoperative patient visits. Kaplan-Meier survival analysis was performed to estimate recurrence-free time.
Results: Fifty-two high-risk patients, with a mean (±SD) age of 58.6 ± 13.7 years and BMI of 36.9 ± 6.6 kg/m², were followed for a mean (±SD) period of 22.4 ± 7.1 months. A total of 11 (21.2%) patients experienced postoperative complications, of which eight were SSEs, including 7 (13.5%) seromas, 1 (1.9%) hematoma, and no infections. Procedural interventions were required for 2 (3.8%) surgical site occurrences. Recurrence was seen in 1 (1.9%) patient. The estimated mean (95% confidence interval) recurrence-free time was 33 (32.3-34.5) months. Postoperative QoL assessments demonstrated significant improvements compared with preoperative QoL, with a minimum ∆mean (±SD) of -15.5 ± 2.2 at one month (p < 0.001). The mean (±SD) procedure cost was $13,924.18 ± $7,856.95, which includes the average mesh cost ($5,390.12 ± $3,817.03).
Conclusion: Our study showed favorable early and mid-term outcomes, in addition to significant improvements in QoL, after rRMVHR using Synecor™ hybrid mesh in high-risk patients.
Inferring causal relationships or related associations from observational data can be invalidated by the existence of hidden confounding. We focus on a high-dimensional linear regression setting where the measured covariates are affected by hidden confounding, and we propose the Doubly Debiased Lasso estimator for individual components of the regression coefficient vector. Our advocated method simultaneously corrects both the bias due to the estimation of high-dimensional parameters and the bias caused by the hidden confounding. We establish its asymptotic normality and also prove that it is efficient in the Gauss-Markov sense. The validity of our methodology relies on a dense confounding assumption, i.e., that every confounding variable affects many covariates. The finite-sample performance is illustrated with an extensive simulation study and a genomic application.
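The following is a rough sketch of the two ingredients suggested by the abstract: a spectral "trimming" transform that weakens the leading directions of the design (which, under dense confounding, absorb most of the hidden-confounder signal), followed by a standard one-step debiasing of a lasso fit on the transformed data. The actual Doubly Debiased Lasso may use a different, coordinate-specific construction; the trimming level and tuning choices here are assumptions.

import numpy as np
from sklearn.linear_model import Lasso

def trim_transform(X, tau=None):
    # Shrinks the top singular values of X down to tau.
    U, d, _ = np.linalg.svd(X, full_matrices=False)
    if tau is None:
        tau = np.median(d)                        # an assumed trimming level
    return U @ np.diag(np.minimum(d, tau) / d) @ U.T

def debiased_coefficient(X, y, j, lam_beta, lam_gamma):
    F = trim_transform(X)                         # deconfounding transform
    Xt, yt = F @ X, F @ y
    beta = Lasso(alpha=lam_beta, fit_intercept=False).fit(Xt, yt).coef_
    Xt_j = Xt[:, j]
    Xt_mj = np.delete(Xt, j, axis=1)
    gamma = Lasso(alpha=lam_gamma, fit_intercept=False).fit(Xt_mj, Xt_j).coef_
    z = Xt_j - Xt_mj @ gamma                      # projection direction for j
    # One-step correction of the regularization bias in beta[j].
    return beta[j] + z @ (yt - Xt @ beta) / (z @ Xt_j)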
Large-scale multiple testing is a fundamental problem in high-dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are available. Exploiting such auxiliary information can boost statistical power. To this end, we propose a framework based on a two-group mixture model with varying prior probabilities of being null across hypotheses, where a shape-constrained relationship is imposed between the auxiliary information and the prior probabilities of being null. An optimal rejection rule is designed to maximize the expected number of true positives when the average false discovery rate is controlled. Focusing on the ordered structure, we develop a robust EM algorithm to simultaneously estimate the prior probabilities of being null and the distribution of p-values under the alternative hypothesis. We show, both theoretically and empirically, that the proposed method controls the false discovery rate while achieving higher power than state-of-the-art competitors. Extensive simulations demonstrate the advantage of the proposed method. Datasets from genome-wide association studies are used to illustrate the new methodology.
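A minimal EM sketch under the ordered structure, assuming the hypotheses are pre-sorted by the auxiliary covariate and substituting a parametric Beta(alpha, 1) alternative density for the paper's estimate; the isotonic M-step enforces the shape constraint on the prior null probabilities.

import numpy as np
from sklearn.isotonic import IsotonicRegression

def ordered_em(p, n_iter=200):
    # Two-group model: p_i ~ pi0_i * U(0,1) + (1 - pi0_i) * f1, with pi0
    # constrained to be non-decreasing in the auxiliary ordering (the
    # p-values in p are assumed already sorted by that ordering).
    n = len(p)
    pi0 = np.full(n, 0.9)                         # initial prior null probabilities
    alpha = 0.5                                   # Beta(alpha, 1) shape parameter
    iso = IsotonicRegression(y_min=0.0, y_max=1.0, increasing=True)
    idx = np.arange(n)
    for _ in range(n_iter):
        f1 = alpha * p ** (alpha - 1.0)           # alternative density at p_i
        q = pi0 / (pi0 + (1.0 - pi0) * f1)        # E-step: P(null | p_i)
        pi0 = iso.fit_transform(idx, q)           # M-step under the shape constraint
        w = 1.0 - q
        alpha = -w.sum() / (w @ np.log(p))        # weighted MLE for Beta(alpha, 1)
    lfdr = pi0 / (pi0 + (1.0 - pi0) * alpha * p ** (alpha - 1.0))
    return pi0, lfdr

Given the fitted local false discovery rates, hypotheses can then be rejected in increasing lfdr order for as long as the running average of the rejected lfdr values stays below the target FDR level.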