High security and privacy protection model for STI/HIV risk prediction
Zhaohui Tang, Thi Phuoc Van Nguyen, Wencheng Yang, Xiaoyu Xia, Huaming Chen, Amy B Mullens, Judith A Dean, Sonya R Osborne, Yan Li
DIGITAL HEALTH, volume 10, 20552076241298425. Published 2024-11-21. DOI: https://doi.org/10.1177/20552076241298425
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11580078/pdf/
Abstract
Introduction: Applying artificial intelligence in healthcare has emerged as a fundamental pursuit for advancing health. Data-driven models rooted in deep learning have become powerful tools in healthcare informatics. Nevertheless, healthcare data are highly sensitive and must be safeguarded, particularly information related to sexually transmissible infections (STIs) and human immunodeficiency virus (HIV).
Methods: We combined federated learning (FL) with homomorphic encryption (HE) to train deep learning models for STI/HIV prediction on decentralized data while upholding rigorous privacy. The dataset comprised 168,459 entries collected from eight countries between 2013 and 2018. Each country's data were split into two groups, with 70% allocated for training and 30% for testing. Our strategy relied on a two-step aggregation that leveraged area under the curve (AUC) and accuracy metrics to enhance model performance, with a secondary aggregation performed at the local level before each client used the global model. We also introduced a dropout approach as an effective client-side solution to reduce computational costs.
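The federated workflow described in the Methods can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`split_70_30`, `fed_avg`, `local_blend`, `select_clients`) and the mixing weight `alpha` are hypothetical. The abstract does not specify how the secondary local aggregation works or what the dropout approach drops, so the sketch assumes that each client blends the global weights with its own, and that dropout means randomly skipping clients in a round.

```python
import random

def split_70_30(records, rng):
    """Shuffle one client's records and split them 70% train / 30% test."""
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

def fed_avg(updates):
    """Global step: average client weight vectors, weighted by sample count.

    `updates` is a list of (weights, n_samples) pairs.
    """
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

def local_blend(global_w, local_w, alpha):
    """Assumed second step: each client mixes global and local weights.

    `alpha` is a hypothetical mixing weight; it could, for example, be
    derived from local AUC or accuracy, but the abstract does not say how.
    """
    return [alpha * g + (1 - alpha) * l for g, l in zip(global_w, local_w)]

def select_clients(client_ids, drop_rate, rng):
    """Assumed dropout: skip each client with probability `drop_rate`,
    always keeping at least one so the round can proceed."""
    kept = [c for c in client_ids if rng.random() >= drop_rate]
    return kept or [rng.choice(client_ids)]

# Two hypothetical clients with already-trained local weights.
rng = random.Random(0)
local_models = {"A": ([1.0, 0.0], 100), "B": ([0.0, 1.0], 300)}
active = select_clients(list(local_models), drop_rate=0.0, rng=rng)
global_w = fed_avg([local_models[c] for c in active])          # [0.25, 0.75]
client_a_w = local_blend(global_w, local_models["A"][0], 0.5)  # [0.625, 0.375]
```

In this sketch the larger client B pulls the global weights towards its own, and client A then recovers part of its local behaviour through the blend. Whether the published model personalises clients in this way is not stated in the abstract.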
Results: Model performance was progressively enhanced from an AUC of 0.78 and an accuracy of 74.4% using the local model to an AUC of 0.94 and an accuracy of 90.7% using the more advanced model.
Conclusion: The performance of our proposed model for STI/HIV risk prediction surpasses that of local models and of models constructed from centralized data sources, highlighting the potential of our approach to improve healthcare outcomes while safeguarding sensitive patient information.
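To make the privacy mechanism concrete, the sketch below shows how an additively homomorphic scheme lets a server sum client updates without seeing any individual update. It uses a toy Paillier cryptosystem written from scratch. This is an assumption for illustration only: the abstract does not name the HE scheme used, the primes here are far too small to be secure, and the fixed-point `SCALE` and all function names are hypothetical.

```python
import math
import random

# Toy Paillier parameters (insecure: the primes only keep the demo fast).
P, Q = 104729, 1299709
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)   # modular inverse; valid because the generator is n + 1
SCALE = 10_000         # assumed fixed-point precision for model weights

def encode(x):
    """Map a float weight to an integer so it can be encrypted."""
    return round(x * SCALE)

def decode(m):
    """Map a decrypted integer back to a float."""
    return m / SCALE

def encrypt(m, rng):
    """Encrypt integer m (reduced mod N) with fresh randomness r."""
    r = rng.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = rng.randrange(1, N)
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2

def decrypt(c):
    """Recover the plaintext; residues above N/2 are read as negatives."""
    m = ((pow(c, LAM, N2) - 1) // N) * MU % N
    return m - N if m > N // 2 else m

def add_encrypted(c1, c2):
    """Multiplying two ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % N2

# Three hypothetical clients each encrypt a two-element weight update.
rng = random.Random(0)
client_updates = [[0.25, -0.5], [0.5, 0.25], [-0.25, 0.75]]
encrypted = [[encrypt(encode(x), rng) for x in u] for u in client_updates]

# The server combines ciphertexts element-wise without decrypting them.
aggregate = encrypted[0]
for update in encrypted[1:]:
    aggregate = [add_encrypted(a, b) for a, b in zip(aggregate, update)]

# Only the key holder can decrypt, and it sees just the sum.
summed = [decode(decrypt(c)) for c in aggregate]       # [0.5, 0.5]
mean = [s / len(client_updates) for s in summed]
```

A production system would rely on a vetted HE library with appropriately sized keys rather than hand-rolled arithmetic like this.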