Ensuring privacy in distributed machine learning while computing the Area Under the Curve (AUC) is a significant challenge because pooling sensitive test data is often not allowed. Although cryptographic methods can address some of these concerns, they may compromise either scalability or accuracy. In this paper, we present two privacy-preserving solutions for secure AUC computation across multiple institutions: (1) an exact global AUC method that handles ties in prediction scores and scales linearly with the number of samples, and (2) an approximation method that substantially reduces runtime while maintaining acceptable accuracy. Our protocols leverage a combination of homomorphic encryption (modified Paillier), symmetric and asymmetric cryptography, and randomized encoding to preserve the confidentiality of true labels and model predictions. We integrate these methods into the Personal Health Train (PHT)-meDIC platform, a distributed machine learning environment designed for healthcare, to demonstrate their correctness and feasibility. Results on both real-world and synthetic datasets confirm the accuracy of our approach: the exact method computes the true AUC without revealing private inputs, and the approximation provides a balanced trade-off between computational efficiency and precision. All relevant code and data are publicly available at https://github.com/PHT-meDIC/PP-AUC, facilitating straightforward adoption and further development within broader distributed learning ecosystems.
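To make the two technical ingredients concrete, the sketch below (not taken from the paper or the PP-AUC repository) illustrates, in plaintext, (a) the tie-aware AUC quantity the exact protocol targets, computed via the Mann-Whitney U statistic with midranks, and (b) the additive homomorphic property that allows per-site values to be aggregated under encryption. The helper name `auc_with_ties` is hypothetical, and the standard `phe` library is used only as a stand-in for the modified Paillier scheme described in the paper; in the actual protocol these values are never exposed in the clear.

```python
# Illustrative sketch only: plaintext view of quantities that the
# privacy-preserving protocol handles under encryption/encoding.
import numpy as np
from scipy.stats import rankdata

def auc_with_ties(labels, scores):
    """Exact AUC via the Mann-Whitney U statistic; tied prediction
    scores contribute half a concordant pair (midrank method)."""
    labels = np.asarray(labels)
    ranks = rankdata(np.asarray(scores))        # average ranks for ties, O(n log n)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)

print(auc_with_ties([0, 0, 1, 1, 1], [0.2, 0.6, 0.6, 0.8, 0.9]))  # ~0.9167

# Additive homomorphism (standard Paillier via the `phe` library):
# encrypted per-site contributions can be summed without any site
# revealing its plaintext value.
from phe import paillier
pub, priv = paillier.generate_paillier_keypair(n_length=2048)
site_counts = [12, 7, 23]                        # hypothetical per-site tallies
total_enc = sum((pub.encrypt(c) for c in site_counts), pub.encrypt(0))
assert priv.decrypt(total_enc) == sum(site_counts)
```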
