Privacy enhanced collaborative inference in the Cox proportional hazards model for distributed data
Mengtong Hu, Xu Shi, Peter X.-K. Song
arXiv - STAT - Statistics Theory, arXiv:2409.04716, published 2024-09-07
Abstract
Data sharing barriers are a paramount challenge in multicenter
clinical studies, where multiple data sources are stored in a distributed
fashion at different local study sites. Particularly in the case of
time-to-event analysis when global risk sets are needed for the Cox
proportional hazards model, access to a centralized database is typically
necessary. Merging such data sources into a common data store for a
centralized statistical analysis requires a data use agreement, which is often
time-consuming to obtain. Furthermore, the construction and distribution of risk sets to
participating clinical centers for subsequent calculations may pose a risk of
revealing individual-level information. We propose a new collaborative Cox
model that eliminates the need to access a centralized database or
construct global risk sets; it requires only the sharing of summary statistics
of substantially lower dimension than the risk sets. Thus, the proposed
collaborative inference enjoys maximal protection of data privacy. We show
theoretically and numerically that the new distributed proportional hazards
approach loses little statistical power compared with the
centralized method, which requires merging all of the data. We present a
renewable sieve method to establish large-sample properties for the proposed
method. We illustrate its performance through simulation experiments and a
real-world data example from the Organ Procurement and Transplantation
Network (OPTN), examining the factors associated with 5-year
death-censored graft failure (DCGF) among patients who
underwent kidney transplants in the US.
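
To make concrete why the abstract says global risk sets normally require pooled data, here is a minimal sketch of the standard Cox partial log-likelihood (not the collaborative method proposed in the paper). The function name, the toy data, and the single scalar covariate are illustrative assumptions. Each event's denominator sums over everyone still at risk at that event time, across all sites, so adding up site-level partial likelihoods does not reproduce the pooled one.

```python
import math

def cox_partial_loglik(times, events, x, beta):
    """Standard Cox partial log-likelihood for one scalar covariate.

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    x      : covariate values
    beta   : regression coefficient at which to evaluate
    """
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects contribute only through risk sets
        # Risk set at t_i: every subject still under observation at that time.
        denom = sum(
            math.exp(beta * x[j]) for j in range(len(times)) if times[j] >= t_i
        )
        ll += beta * x[i] - math.log(denom)
    return ll

# Hypothetical toy data held at two sites: (times, event indicators, covariate).
site_a = ([2.0, 5.0, 7.0], [1, 0, 1], [0.5, -1.0, 1.2])
site_b = ([3.0, 4.0, 8.0], [1, 1, 0], [0.0, 0.8, -0.3])

beta = 0.4

# Centralized analysis: concatenate the sites, so risk sets span both.
pooled = tuple(a + b for a, b in zip(site_a, site_b))
ll_pooled = cox_partial_loglik(*pooled, beta)

# Naive distributed analysis: each site only sees its own risk sets.
ll_sitewise = cox_partial_loglik(*site_a, beta) + cox_partial_loglik(*site_b, beta)

print(f"pooled partial log-likelihood:    {ll_pooled:.4f}")
print(f"sum of site-level log-likelihoods: {ll_sitewise:.4f}")
```

The two values differ because every pooled risk set contains the corresponding site-level risk set plus the at-risk subjects from the other site. Closing that gap without shipping individual-level risk sets between sites is the problem the abstract describes.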