Pub Date: 2024-04-12 DOI: 10.1109/TPDS.2024.3387720
Xin Du;Minglong Wang;Zhihui Lu;Qiang Duan;Yuhao Liu;Jianfeng Feng;Huarui Wang
Brain simulation is one of the most important approaches to understanding how information is represented and processed in the brain, and it usually has to run on supercomputers with a large number of interconnected graphics processing units (GPUs). For whole human brain simulation, tens of thousands of GPUs are used to simulate tens of billions of neurons and tens of trillions of synapses of the living brain in order to reveal functional connectivity patterns. However, as an instance of the irregular sparse communication problem on a large-scale system, the sparse and imbalanced communication patterns of the human brain make it particularly challenging to design a communication system that supports large-scale brain simulations. To address this challenge, this paper proposes a hierarchical regularized communication mechanism, HRCM. HRCM maintains a hierarchical virtual communication topology (HVCT) with a merge-forward algorithm that exploits the sparsity of neuron interactions to regularize inter-process communications in brain simulations. HRCM also provides a neuron-level partition scheme that assigns neurons to simulation processes so as to balance the communication load while improving resource utilization. In HRCM, neuron partition is formulated as a k-way graph partition problem and solved efficiently by the proposed hybrid multi-constraint greedy (HMCG) algorithm. HRCM has been implemented in human brain simulations at a scale of up to 86 billion neurons running on 10,000 GPUs. Results from extensive simulation experiments verify the effectiveness of HRCM in significantly reducing communication delay, increasing resource usage, and shortening simulation time for large-scale human brain models.
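To make the k-way partition formulation concrete, the following is a minimal sketch of a generic greedy partitioner with a single balance constraint. It is not the HMCG algorithm, whose details the abstract does not give. The function names, the `imbalance` parameter, and the toy graph are all assumptions made for illustration: vertices stand for neurons (or neuron groups), vertex weights for compute cost, and edge weights for communication volume.

```python
from collections import defaultdict


def greedy_kway_partition(weights, edges, k, imbalance=1.05):
    """Hypothetical greedy k-way partitioner (illustrative, not HMCG).

    weights: dict vertex -> compute weight
    edges:   dict (u, v) -> communication weight, treated as undirected
    Returns a dict vertex -> part index in range(k).
    """
    # Build a symmetric adjacency map from the edge list.
    adj = defaultdict(dict)
    for (u, v), w in edges.items():
        adj[u][v] = adj[u].get(v, 0) + w
        adj[v][u] = adj[v].get(u, 0) + w

    # Balance constraint: no part may exceed this load if avoidable.
    cap = imbalance * sum(weights.values()) / k
    load = [0.0] * k
    part = {}

    # Place heavy vertices first; ties are broken by vertex id.
    for v in sorted(weights, key=lambda x: (-weights[x], x)):
        # Traffic that would stay local if v joined each part.
        gain = [0.0] * k
        for u, w in adj[v].items():
            if u in part:
                gain[part[u]] += w

        feasible = [p for p in range(k) if load[p] + weights[v] <= cap]
        if not feasible:
            # Every part would overflow: consider all of them.
            feasible = list(range(k))

        # Prefer the highest local traffic, then the lightest part.
        best = max(feasible, key=lambda p: (gain[p], -load[p], -p))
        part[v] = best
        load[best] += weights[v]
    return part


def cut_weight(part, edges):
    """Total weight of edges whose endpoints lie in different parts."""
    return sum(w for (u, v), w in edges.items() if part[u] != part[v])


if __name__ == "__main__":
    # Toy graph: two dense triangles joined by one light edge.
    toy_weights = {v: 1 for v in range(6)}
    toy_edges = {
        (0, 1): 10, (0, 2): 10, (1, 2): 10,
        (3, 4): 10, (3, 5): 10, (4, 5): 10,
        (2, 3): 1,
    }
    toy_part = greedy_kway_partition(toy_weights, toy_edges, k=2, imbalance=1.0)
    print(toy_part, cut_weight(toy_part, toy_edges))
```

A one-pass greedy heuristic like this trades partition quality for speed. A multi-constraint scheme would track several load dimensions per part (for example compute and communication) rather than the single `load` list used here.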
Title: HRCM: A Hierarchical Regularizing Mechanism for Sparse and Imbalanced Communication in Whole Human Brain Simulations
Journal: IEEE Transactions on Parallel and Distributed Systems, 35(6), pp. 901-918. Journal Article. Impact factor: 5.3.
Hyper-parameter tuning (HPT) for deep learning (DL) models is prohibitively expensive. Sequential model-based optimization (SMBO) emerges as the state-of-the-art (SOTA) approach to automatically optimize HPT performance due to its heuristic advantages. Unfortunately, focusing on algorithm optimization rather than a large-scale parallel HPT system, existing SMBO-based approaches still cannot effectively remove their strong sequential nature, posing two performance problems: (1) extremely low tuning speed