The streaming multiprocessor (SM) count in GPUs continues to increase to provide high computing power. To construct a scalable crossbar network that connects the SMs to the LLC slices and memory controllers, a cluster structure is exploited in GPUs where a group of SMs shares a network port. Unfortunately, current GPU spatial multitasking is unaware of this underlying network-on-chip infrastructure which poses the challenges and also the opportunities for the performance. In this paper, we observe that compared to the cluster-unaware multitasking, considering the cluster structure, the SM partition within a cluster and also the injecting policy of sharing the network port can bring significant performance improvement. Next, we propose a low-cost online profiling and scheduling policy that consists of two steps. The cluster-aware scheduling first determines the best SM partition within a cluster and then finds the proper injecting policy between the two co-executing applications. Both steps are achieved in online profiling which only incurs limited runtime overhead. The evaluation results show that for all workloads, our cluster-aware multitasking increases the system throughput by 12.9% on average (and up to 76.5%).
{"title":"Cluster-aware scheduling in multitasking GPUs","authors":"Xia Zhao, Huiquan Wang, Anwen Huang, Dongsheng Wang, Guangda Zhang","doi":"10.1007/s11241-023-09409-x","DOIUrl":"https://doi.org/10.1007/s11241-023-09409-x","url":null,"abstract":"<p>The streaming multiprocessor (SM) count in GPUs continues to increase to provide high computing power. To construct a scalable crossbar network that connects the SMs to the LLC slices and memory controllers, a cluster structure is exploited in GPUs where a group of SMs shares a network port. Unfortunately, current GPU spatial multitasking is unaware of this underlying network-on-chip infrastructure which poses the challenges and also the opportunities for the performance. In this paper, we observe that compared to the cluster-unaware multitasking, considering the cluster structure, the SM partition within a cluster and also the injecting policy of sharing the network port can bring significant performance improvement. Next, we propose a low-cost online profiling and scheduling policy that consists of two steps. The cluster-aware scheduling first determines the best SM partition within a cluster and then finds the proper injecting policy between the two co-executing applications. Both steps are achieved in online profiling which only incurs limited runtime overhead. The evaluation results show that for all workloads, our cluster-aware multitasking increases the system throughput by 12.9% on average (and up to 76.5%).\u0000</p>","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"205 1","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138496266","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-14DOI: 10.1007/s11241-023-09411-3
Felipe Lisboa Malaquias, Mihail Asavoae, Florian Brandner
Abstract In order to prove conformance to memory standards and bound memory access latency, recently proposed real-time DRAM controllers rely on paper and pencil proofs, which can be troubling: they are difficult to read and review, they are often shown only partially and/or rely on abstractions for the sake of conciseness, and they can easily diverge from the controller implementation, as no formal link is established between both. We propose a new framework written in Coq, in which we model a DRAM controller and its expected behaviour as a formal specification. The trustworthiness in our solution is two-fold: (1) proofs that are typically done on paper and pencil are now done in Coq and thus certified by its kernel, and (2) the reviewer’s job develops into making sure that the formal specification matches the standards—instead of performing a thorough check of the mathematical formalism. Our framework provides a generic DRAM model capturing a set of controller properties as proof obligations, which all implementations must comply with. We focus on properties related to the assertiveness that timing constraints are respected, every incoming request is handled in bounded time, and the DRAM command protocol is respected. We refine our specification with two implementations based on widely-known arbitration policies— First-in First-Out (FIFO) and Time-Division Multiplexing (TDM). We extract proved code from our model and use it as a “trusted core” on a cycle-accurate DRAM simulator.
{"title":"A formal framework to design and prove trustworthy memory controllers","authors":"Felipe Lisboa Malaquias, Mihail Asavoae, Florian Brandner","doi":"10.1007/s11241-023-09411-3","DOIUrl":"https://doi.org/10.1007/s11241-023-09411-3","url":null,"abstract":"Abstract In order to prove conformance to memory standards and bound memory access latency, recently proposed real-time DRAM controllers rely on paper and pencil proofs, which can be troubling: they are difficult to read and review, they are often shown only partially and/or rely on abstractions for the sake of conciseness, and they can easily diverge from the controller implementation, as no formal link is established between both. We propose a new framework written in Coq, in which we model a DRAM controller and its expected behaviour as a formal specification. The trustworthiness in our solution is two-fold: (1) proofs that are typically done on paper and pencil are now done in Coq and thus certified by its kernel, and (2) the reviewer’s job develops into making sure that the formal specification matches the standards—instead of performing a thorough check of the mathematical formalism. Our framework provides a generic DRAM model capturing a set of controller properties as proof obligations, which all implementations must comply with. We focus on properties related to the assertiveness that timing constraints are respected, every incoming request is handled in bounded time, and the DRAM command protocol is respected. We refine our specification with two implementations based on widely-known arbitration policies— First-in First-Out (FIFO) and Time-Division Multiplexing (TDM). We extract proved code from our model and use it as a “trusted core” on a cycle-accurate DRAM simulator.","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"116 6","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134957472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-13DOI: 10.1007/s11241-023-09416-y
Giorgio Buttazzo, Daniela De Venuto, Eugenio Di Sciascio, Toni Mancini
{"title":"Special issue on embedded real-time applications","authors":"Giorgio Buttazzo, Daniela De Venuto, Eugenio Di Sciascio, Toni Mancini","doi":"10.1007/s11241-023-09416-y","DOIUrl":"https://doi.org/10.1007/s11241-023-09416-y","url":null,"abstract":"","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"64 43","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"136282479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-10DOI: 10.1007/s11241-023-09415-z
Geoffrey Nelissen, Laurent Pautet
{"title":"Special issue on reliable data transmission in real-time systems","authors":"Geoffrey Nelissen, Laurent Pautet","doi":"10.1007/s11241-023-09415-z","DOIUrl":"https://doi.org/10.1007/s11241-023-09415-z","url":null,"abstract":"","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"122 17","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135136611","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-11-01DOI: 10.1007/s11241-023-09410-4
Marcello Cinque, Luigi De Simone, Nicola Mazzocca, Daniele Ottaviano, Francesco Vitale
Abstract Technological advances in embedded systems and the advent of fog computing led to improved quality of service of applications of cyber-physical systems. In fact, the deployment of such applications on powerful and heterogeneous embedded systems, such as multiprocessors system-on-chips (MPSoCs), allows them to meet latency requirements and real-time operation. Highly relevant to the industry and our reference case-study, the challenging field of nuclear fusion deploys the aforementioned applications, involving high-frequency control with hard real-time and safety constraints. The use of fog computing and MPSoCs is promising to achieve safety, low latency, and timeliness of such control. Indeed, on one hand, applications designed according to fog computing distribute computation across hierarchically organized and geographically distributed edge devices, enabling timely anomaly detection during high-frequency sampling of time series, and, on the other hand, MPSoCs allow leveraging fog computing and integrating monitoring by deploying tasks on a flexible platform suited for mixed-criticality software, leading to so-called mixed criticality systems (MCSs). However, the integration of such software on the same MPSoC opens challenges related to predictability and reliability guarantees, as tasks interfering with each other when accessing the same shared MPSoC resources may introduce non-deterministic latency, possibly leading to failures on account of deadline overruns. Addressing the design, deployment, and evaluation of MCSs on MPSoCs, we propose a model-based system development process that facilitates the integration of real-time and monitoring software on the same platform by means of a formal notation for modeling the design and deployment of MPSoCs. The proposed notation allows developers to leverage embedded hypervisors for monitoring real-time applications and guaranteeing predictability by isolation of hardware resources. Providing evidence of the feasibility of our system development process and evaluating the industry-relevant class of nuclear fusion applications, we experiment with a safety-critical case-study in the context of the ITER nuclear fusion reactor. Our experimentation involves the design and evaluation of several prototypes deployed as MCSs on a virtualized MPSoC, showing that deployment choices linked to the monitor placement and virtualization configurations (e.g., resource allocation, partitioning, and scheduling policies) can significantly impact the predictability of MCSs in terms of Worst-Case Execution Times and other related metrics.
{"title":"Evaluating virtualization for fog monitoring of real-time applications in mixed-criticality systems","authors":"Marcello Cinque, Luigi De Simone, Nicola Mazzocca, Daniele Ottaviano, Francesco Vitale","doi":"10.1007/s11241-023-09410-4","DOIUrl":"https://doi.org/10.1007/s11241-023-09410-4","url":null,"abstract":"Abstract Technological advances in embedded systems and the advent of fog computing led to improved quality of service of applications of cyber-physical systems. In fact, the deployment of such applications on powerful and heterogeneous embedded systems, such as multiprocessors system-on-chips (MPSoCs), allows them to meet latency requirements and real-time operation. Highly relevant to the industry and our reference case-study, the challenging field of nuclear fusion deploys the aforementioned applications, involving high-frequency control with hard real-time and safety constraints. The use of fog computing and MPSoCs is promising to achieve safety, low latency, and timeliness of such control. Indeed, on one hand, applications designed according to fog computing distribute computation across hierarchically organized and geographically distributed edge devices, enabling timely anomaly detection during high-frequency sampling of time series, and, on the other hand, MPSoCs allow leveraging fog computing and integrating monitoring by deploying tasks on a flexible platform suited for mixed-criticality software, leading to so-called mixed criticality systems (MCSs). However, the integration of such software on the same MPSoC opens challenges related to predictability and reliability guarantees, as tasks interfering with each other when accessing the same shared MPSoC resources may introduce non-deterministic latency, possibly leading to failures on account of deadline overruns. Addressing the design, deployment, and evaluation of MCSs on MPSoCs, we propose a model-based system development process that facilitates the integration of real-time and monitoring software on the same platform by means of a formal notation for modeling the design and deployment of MPSoCs. The proposed notation allows developers to leverage embedded hypervisors for monitoring real-time applications and guaranteeing predictability by isolation of hardware resources. Providing evidence of the feasibility of our system development process and evaluating the industry-relevant class of nuclear fusion applications, we experiment with a safety-critical case-study in the context of the ITER nuclear fusion reactor. Our experimentation involves the design and evaluation of several prototypes deployed as MCSs on a virtualized MPSoC, showing that deployment choices linked to the monitor placement and virtualization configurations (e.g., resource allocation, partitioning, and scheduling policies) can significantly impact the predictability of MCSs in terms of Worst-Case Execution Times and other related metrics.","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"55 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135325841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-29DOI: 10.1007/s11241-023-09405-1
Miguel Alcon, Axel Brando, E. Mezzetti, J. Abella, F. Cazorla
{"title":"Main sources of variability and non-determinism in AD software: taxonomy and prospects to handle them","authors":"Miguel Alcon, Axel Brando, E. Mezzetti, J. Abella, F. Cazorla","doi":"10.1007/s11241-023-09405-1","DOIUrl":"https://doi.org/10.1007/s11241-023-09405-1","url":null,"abstract":"","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"59 1","pages":"438 - 478"},"PeriodicalIF":1.3,"publicationDate":"2023-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42219760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-28DOI: 10.1007/s11241-023-09407-z
Iryna De Albuquerque Silva, Thomas Carle, A. Gauffriau, C. Pagetti
{"title":"Extending a predictable machine learning framework with efficient gemm-based convolution routines","authors":"Iryna De Albuquerque Silva, Thomas Carle, A. Gauffriau, C. Pagetti","doi":"10.1007/s11241-023-09407-z","DOIUrl":"https://doi.org/10.1007/s11241-023-09407-z","url":null,"abstract":"","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":"59 1","pages":"408 - 437"},"PeriodicalIF":1.3,"publicationDate":"2023-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47668338","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2023-08-21DOI: 10.1007/s11241-023-09404-2
D. Ferraro, Lucas Palazzi, Federico Gavioli, Michele Guzzinati, A. Bernardi, Benjamin Rouxel, P. Burgio, M. Solieri
{"title":"Time-sensitive autonomous architectures","authors":"D. Ferraro, Lucas Palazzi, Federico Gavioli, Michele Guzzinati, A. Bernardi, Benjamin Rouxel, P. Burgio, M. Solieri","doi":"10.1007/s11241-023-09404-2","DOIUrl":"https://doi.org/10.1007/s11241-023-09404-2","url":null,"abstract":"","PeriodicalId":54507,"journal":{"name":"Real-Time Systems","volume":" ","pages":""},"PeriodicalIF":1.3,"publicationDate":"2023-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45002820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}