{"title":"利用高维生存数据优化治疗方案的无模型变量筛选法","authors":"Cheng-Han Yang, Yu-Jen Cheng","doi":"10.1093/biomet/asae022","DOIUrl":null,"url":null,"abstract":"Summary We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves reformulating the weighted classification problem into a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first level of robustness is achieved because the proposed method does not require any model assumptions for survival outcome on treatment and covariates, whereas the other is attained as the form of treatment regimes is allowed to be unspecified even without requiring convex surrogate loss, such as logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecifications, and nonparametric learning methods such as random forests and boosting can be applied to those selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.","PeriodicalId":9001,"journal":{"name":"Biometrika","volume":"46 1","pages":""},"PeriodicalIF":2.4000,"publicationDate":"2024-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A model-free variable screening method for optimal treatment regimes with high-dimensional survival data\",\"authors\":\"Cheng-Han Yang, Yu-Jen Cheng\",\"doi\":\"10.1093/biomet/asae022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Summary We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves reformulating the weighted classification problem into a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first level of robustness is achieved because the proposed method does not require any model assumptions for survival outcome on treatment and covariates, whereas the other is attained as the form of treatment regimes is allowed to be unspecified even without requiring convex surrogate loss, such as logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecifications, and nonparametric learning methods such as random forests and boosting can be applied to those selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.\",\"PeriodicalId\":9001,\"journal\":{\"name\":\"Biometrika\",\"volume\":\"46 1\",\"pages\":\"\"},\"PeriodicalIF\":2.4000,\"publicationDate\":\"2024-05-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Biometrika\",\"FirstCategoryId\":\"100\",\"ListUrlMain\":\"https://doi.org/10.1093/biomet/asae022\",\"RegionNum\":2,\"RegionCategory\":\"数学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Biometrika","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1093/biomet/asae022","RegionNum":2,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOLOGY","Score":null,"Total":0}
A model-free variable screening method for optimal treatment regimes with high-dimensional survival data
Summary We propose a model-free variable screening method for the optimal treatment regime with high-dimensional survival data. The proposed screening method provides a unified framework to select the active variables in a prespecified target population, including the treated group as a special case. Based on this framework, the optimal treatment regime is exactly the optimal classifier that minimizes a weighted misclassification error rate, with weights associated with survival outcome variables, the censoring distribution, and a prespecified target population. Our main contribution involves reformulating the weighted classification problem into a classification problem within a hypothetical population, where the observed data can be viewed as a sample obtained from outcome-dependent sampling, with the selection probability inversely proportional to the weights. Consequently, we introduce the weighted Kolmogorov–Smirnov approach for selecting active variables in the optimal treatment regime, extending the conventional Kolmogorov–Smirnov method for binary classification. Additionally, the proposed screening method exhibits two levels of robustness. The first level of robustness is achieved because the proposed method does not require any model assumptions for survival outcome on treatment and covariates, whereas the other is attained as the form of treatment regimes is allowed to be unspecified even without requiring convex surrogate loss, such as logit loss or hinge loss. As a result, the proposed screening method is robust to model misspecifications, and nonparametric learning methods such as random forests and boosting can be applied to those selected variables for further analysis. The theoretical properties of the proposed method are established. The performance of the proposed method is examined through simulation studies and illustrated by a real dataset.
期刊介绍:
Biometrika is primarily a journal of statistics in which emphasis is placed on papers containing original theoretical contributions of direct or potential value in applications. From time to time, papers in bordering fields are also published.