The increasing reliance on High-Performance Computing (HPC) systems to execute complex scientific and industrial workloads raises significant security concerns about the misuse of HPC resources for unauthorized or malicious activities. Rogue job executions can threaten the integrity, confidentiality, and availability of HPC infrastructures. Given the scale and heterogeneity of HPC job submissions, manual or ad hoc monitoring cannot reliably detect such misuse, so automated solutions that systematically analyze job submissions are essential. To address this challenge, we present RoWD (Rogue Workload Detector), the first framework for automated and systematic security screening of the HPC job-submission pipeline. RoWD is composed of modular plug-ins that classify different types of workloads and detect rogue jobs by analyzing job scripts and their associated metadata. We deploy RoWD on the Supercomputer Fugaku to classify AI workloads and release SCRIPT-AI, the first dataset of job scripts annotated with workload characteristics. We evaluate RoWD on approximately 50K previously unseen jobs executed on Fugaku between 2021 and 2025. Our results show that RoWD accurately classifies AI jobs (achieving an F1 score of 95%), is robust against adversarial behavior, and incurs low runtime overhead, making it suitable for real-time deployment in production systems and for strengthening the security of HPC environments.
