
International Conference on Hardware/Software Codesign and System Synthesis: Latest Papers

Furion: alleviating overheads for deep learning framework on single machine (work-in-progress)
Pub Date : 2018-09-30 DOI: 10.5555/3283568.3283582
L. Jin, Chao Wang, Lei Gong, Chongchong Xu, Yahui Hu, Luchao Tan, Xuehai Zhou
Deep learning has been successful at solving many kinds of tasks. Hardware accelerators with high performance and parallelism have become the mainstream platform for implementing deep neural networks. To increase hardware utilization, multiple applications share the same compute resource. However, different applications may use different deep learning frameworks and occupy different amounts of resources. Without a scheduling platform that is compatible with different frameworks, resource competition results in longer response times, out-of-memory failures, and other errors. When the system's resources cannot satisfy all applications at the same time, application switching overhead becomes excessive without a reasonable resource management strategy. In this paper, we propose Furion, a middleware that alleviates overheads for deep learning frameworks on a single machine. Furion schedules tasks, overlaps execution on different computing resources, and batches unknown inputs to increase hardware accelerator utilization. It dynamically manages memory usage for each application to alleviate the overhead of application switching and to enable complex models to run on a low-end GPU. Our experiments show that Furion achieves a 2.2x-2.7x speedup on the GTX1060.
Citations: 0
A chip-level security framework for assessing sensor data integrity: work-in-progress
Pub Date : 2018-09-30 DOI: 10.5555/3283568.3283588
Taimour Wehbe, V. Mooney, D. Keezer
The continuously increasing inter-connectivity of the sensor nodes that form the basis of the Internet-of-Things opens new avenues of attack that adversaries can exploit to maliciously modify data captured by these nodes. In this work, we present a framework for detecting malicious hardware alterations that attempt to attack the state-of-the-art microchips that form these sensor nodes. Specifically, we focus on extremely small Hardware Trojans (HTs) that attempt to modify sensor data as soon as the data is received on a state-of-the-art chip fabricated in an untrusted facility. We present a dual-chip approach composed of an untrusted state-of-the-art prover chip and a trusted verifier chip, where the verifier continuously challenges the prover at run-time to ensure correct operation and assess the integrity of the captured data. Our preliminary analysis of the proposed mechanism shows that HT attacks anywhere on the untrusted state-of-the-art chip are detected and flagged, preventing maliciously altered data from being transmitted out of the sensor node.
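The abstract above does not specify the challenge protocol, so the following is only a minimal sketch of a generic run-time challenge-response check. It assumes that the trusted verifier sees the same raw sensor sample as the prover and can recompute a reference response itself. All function names are hypothetical, and SHA-256 merely stands in for whatever computation the real chips would agree on.

```python
import hashlib
import os


def expected_response(sample: bytes, challenge: bytes) -> bytes:
    """Reference result the trusted verifier computes for itself."""
    return hashlib.sha256(challenge + sample).digest()


def honest_prover(sample: bytes, challenge: bytes) -> bytes:
    """Untampered prover: processes the sample exactly as received."""
    return hashlib.sha256(challenge + sample).digest()


def trojaned_prover(sample: bytes, challenge: bytes) -> bytes:
    """Hypothetical Trojan: flips one bit of the sample before processing."""
    altered = bytes([sample[0] ^ 0x01]) + sample[1:]
    return hashlib.sha256(challenge + altered).digest()


def verify(prover, sample: bytes) -> bool:
    """Issue a fresh random challenge and compare the prover's answer."""
    challenge = os.urandom(16)
    return prover(sample, challenge) == expected_response(sample, challenge)


sample = b"\x10\x20\x30"
print(verify(honest_prover, sample))    # True
print(verify(trojaned_prover, sample))  # False
```

Because the challenge is random on every call, a prover cannot replay an earlier answer; a mismatch is the signal on which the verifier would flag the data before it leaves the node.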
Citations: 0
Dynamic data management for automotive ECUs with hybrid RAM-NVM memory: work-in-progress
Pub Date : 2018-09-30 DOI: 10.5555/3283568.3283573
Jinyu Zhan, Junhuan Yang, Wei Jiang, Yixin Li
Non-Volatile Memory (NVM) can be used to improve the performance of automotive electronic systems, but frequent writes to NVM shorten its lifetime. In this paper, we propose a Vehicle Dynamic Data Management (VDDM) scheme, which distinguishes the hot and cold data generated by vehicle Electronic Control Units (ECUs) and sensors, and manages the data to reduce write operations on the NVM of the hybrid main memory. Experimental results show that VDDM can significantly reduce write operations and prolong the lifetime of NVM compared with other approaches.
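The abstract does not describe how VDDM classifies data, so the sketch below only illustrates the general hot/cold idea under a simple assumption: data items are ranked by write count, the most frequently written ones are kept in RAM, and the rest go to NVM. The function name, the trace, and the item names are hypothetical.

```python
from collections import Counter


def place_items(write_trace, ram_slots):
    """Keep the most frequently written items in RAM; the rest go to NVM.

    Returns the set of hot items and the number of writes that still hit NVM.
    """
    writes = Counter(write_trace)
    # Rank by write count (descending), then by name for a deterministic order.
    ranked = sorted(writes, key=lambda item: (-writes[item], item))
    hot = set(ranked[:ram_slots])
    nvm_writes = sum(count for item, count in writes.items() if item not in hot)
    return hot, nvm_writes


trace = ["speed", "rpm", "speed", "vin", "speed", "rpm", "odometer"]
hot, nvm_writes = place_items(trace, ram_slots=2)
print(sorted(hot), nvm_writes)  # ['rpm', 'speed'] 2
```

In this toy trace, keeping the two hottest items in RAM leaves only 2 of the 7 writes on NVM. A real scheme would have to classify data online rather than from a complete trace.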
Citations: 0
Dynamically utilizing computation accelerators for extensible processors in a software approach
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629443
Yashuai Lü, Li Shen, Zhiying Wang, Nong Xiao
In recent years, it has become increasingly common to use application-specific instruction set processors (ASIPs) in embedded system designs. ASIPs make it possible to customize hardware computation accelerators for an application domain. Together with instruction set extensions (ISEs), the customized accelerators can significantly improve the performance of embedded processors, as previous research work and industrial products have already shown. However, the accelerators in ASIPs can only accelerate applications that are compiled with ISEs. Applications compiled without ISEs cannot benefit from the hardware accelerators at all. In this paper, we propose using software dynamic binary translation to overcome this problem, i.e., to utilize the accelerators dynamically. Unlike a static approach, dynamically utilizing accelerators poses many new problems. This paper comprehensively explores the techniques and design choices for solving these problems and demonstrates their effectiveness with experimental results.
Citations: 5
Native MPSoC co-simulation environment for software performance estimation
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629490
P. Gerin, M. M. Hamayun, F. Pétrot
Performance estimation of Multi-Processor System-On-Chip (MPSoC) designs at a high abstraction level is required in order to perform early architecture exploration and accurate design validation. Although abstract executable models provide interesting functional validation capabilities, they quickly become unsuitable when timing becomes an issue. Native software simulation, a good candidate from the speed point of view, suffers from this issue. In this paper, we present a transaction-level simulation environment that allows reliable performance estimation, with a specific focus on software timing estimation on multiprocessor architectures. The embedded software is compiled natively on the host running the simulation, instrumented to reflect its execution on a specific target processor, and then executed on a simulation model of the underlying hardware. The key contribution of this work is the use of both static and dynamic analysis, which allows realistic timing measurements in native software simulation. Experimental results show that the proposed method accurately estimates software performance in co-simulation environments.
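As a rough illustration of how instrumentation can combine static and dynamic information, the sketch below assumes that a static analysis has assigned a target-cycle estimate to each basic block and that the natively executed code calls a hook whenever it enters a block. The cycle values, block names, and the `CycleCounter` class are hypothetical and are not taken from the paper.

```python
# Hypothetical static estimates: target-processor cycles per basic block.
BLOCK_CYCLES = {"entry": 4, "loop_body": 7, "exit": 2}


class CycleCounter:
    """Accumulates estimated target cycles while the code runs on the host."""

    def __init__(self, block_cycles):
        self.block_cycles = block_cycles
        self.total = 0

    def annotate(self, block):
        # Dynamic part: called each time execution enters a basic block.
        self.total += self.block_cycles[block]


def sum_to(n, counter):
    """Instrumented function: every basic block reports itself to the counter."""
    counter.annotate("entry")
    acc = 0
    for i in range(n):
        counter.annotate("loop_body")
        acc += i
    counter.annotate("exit")
    return acc


counter = CycleCounter(BLOCK_CYCLES)
print(sum_to(3, counter), counter.total)  # 3 27
```

The host computes the functional result at native speed, while the counter yields a timing estimate (4 + 3 × 7 + 2 = 27 cycles here) that depends on the path actually executed.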
Citations: 30
Automated technique for design of NoC with minimal communication latency
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629499
G. Leary, Karam S. Chatha
Many embedded SoC architectures require minimal on-chip communication latency and jitter. Further, each communication transaction is expected to display some jitter in its start time due to dynamic events. The paper presents a novel synthesis technique that generates an optimized NoC architecture composed of best-effort traffic class routers. The technique minimizes both the average packet latency and jitter in the presence of transaction initiation jitter. In comparison to an existing approach, the designs generated by our technique demonstrated a 41% reduction in average latency, a 39% reduction in the standard deviation of latency, identical power consumption, and a 24% increase in router resources.
Citations: 12
Exploiting data-redundancy in reliability-aware networked embedded system design
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629468
M. Lukasiewycz, M. Glaß, J. Teich
This paper presents a system-level design methodology for networked embedded systems that exploits existing data-redundancy to increase their reliability. The presented approach not only supports a reliability-aware embedded system design from scratch, but also enables the redesign of existing systems to increase the reliability with a minimal communication overhead. The proposed approach contributes (a) algorithms to automatically identify inherent data-redundancy and (b) an automatic design space exploration that is capable of exploiting the revealed data-redundancy. A symbolic analysis is presented that quantifies the reliability of a system, enabling the use of reliability as one of multiple conflicting optimization objectives. The proposed approach is applied to a real-world case study from the automotive area, showing a significantly increased reliability with a negligible communication overhead.
Citations: 8
A monitoring and adaptive routing mechanism for QoS traffic on mesh NoC architectures
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629451
Leonel Tedesco, F. Clermidy, F. Moraes
The development of MPSoCs targeting embedded systems with a dynamic workload of applications constitutes an important challenge. The growing number of applications running on these systems leads to considerable resource utilization, implying a high demand for computation and communication in the different parts of the MPSoC. The heterogeneity of processing elements (PEs) makes the application traffic dynamic and unpredictable, due to the variability of data injection rates. NoCs are the communication infrastructure to be used in such systems, due to their performance, reliability and scalability. Different strategies may be employed to deal with traffic congestion, such as adaptive routing, buffer sizing, and even task migration. The goal of this work is to investigate the use of adaptive routing algorithms, where the path between source and target PEs may be modified due to congestion events. Most state-of-the-art proposals have a limited view of the NoC, since each NoC router takes decisions based on the congestion status of only a few neighbors. Such local decisions may lead packets to other congested regions and are therefore inefficient. This paper presents a new method, where congestion analysis considers information from all routers in the source-target path. This method relies on a protocol for QoS session establishment, followed by distributed monitoring and re-routing to non-congested regions. The experiments present results on performance and on the time packets spend in routers when the proposed method is applied.
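To make the contrast between local and path-wide decisions concrete, here is a small sketch under simplifying assumptions: every router on a 2D mesh has a known scalar congestion value, only minimal paths are considered, and the chosen path is the one with the lowest total congestion over all routers it crosses. The function names and congestion values are hypothetical; the paper's monitoring protocol and QoS session establishment are not modelled.

```python
def minimal_paths(src, dst):
    """Yield every minimal path between two routers on a 2D mesh."""
    if src == dst:
        yield [src]
        return
    x, y = src
    dx = (dst[0] > x) - (dst[0] < x)  # step direction along x: -1, 0 or 1
    dy = (dst[1] > y) - (dst[1] < y)  # step direction along y: -1, 0 or 1
    if dx:
        for rest in minimal_paths((x + dx, y), dst):
            yield [src] + rest
    if dy:
        for rest in minimal_paths((x, y + dy), dst):
            yield [src] + rest


def path_congestion(path, congestion):
    """Total congestion over all routers on the path (unlisted routers count 0)."""
    return sum(congestion.get(router, 0) for router in path)


def least_congested_path(src, dst, congestion):
    """Pick the minimal path with the lowest total congestion."""
    return min(minimal_paths(src, dst),
               key=lambda path: path_congestion(path, congestion))


# Router (1, 0) looks best to a neighbor-only decision at (0, 0),
# but every path through it runs into a congested router afterwards.
congestion = {(1, 0): 1, (0, 1): 3, (2, 0): 8, (1, 1): 8}
best = least_congested_path((0, 0), (2, 2), congestion)
print(best)  # [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
print(path_congestion(best, congestion))  # 3
```

A decision based only on the two neighbors of (0, 0) would forward to (1, 0), after which the packet must cross a router with congestion 8; evaluating whole paths avoids that region.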
Citations: 13
A compositional modelling framework for exploring MPSoC systems
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629437
Anders Sejer Tranberg-Hansen, J. Madsen
This paper presents a novel compositional framework for system-level performance estimation and exploration of Multi-Processor System On Chip (MPSoC) based systems. The main contribution is the definition of a compositional model which allows quantitative performance estimation to be carried out throughout all design phases, ranging from early functional specification to actual cycle-accurate and bit-true descriptions of the system. This is possible because a seamless refinement of models is supported by allowing models described at multiple levels of abstraction to co-exist and communicate. In order to illustrate the use of the framework, a mobile digital audio processing platform, supplied by the company Bang & Olufsen ICEpower a/s, is considered.
Citations: 4
Improving application launch times with hybrid disks
Pub Date : 2009-10-11 DOI: 10.1145/1629435.1629486
Yongsoo Joo, Youngjin Cho, Kyungsoo Lee, N. Chang
Application launch times, which are important to users, are primarily bounded by disk seek times. A solid-state disk has a negligible seek time, but large solid-state disks are not cost-effective. A hybrid disk, consisting of a large disk drive and a flash memory of smaller capacity, can provide a reasonable compromise. However, there is no systematic approach to allocating portions of launch sequences to solid-state memory so as to achieve the shortest application launch time. We show how to reduce application launch times with a hybrid disk by pinning only a small portion of an application launch sequence into flash memory. We model the latency of a hybrid disk, analyze the behavior of application launch sequences, and formulate the choice of the optimal pinned set as an integer linear programming (ILP) problem. Experiments show that this approach reduces application launch times by 15% and 24% on average, while pinning between 5% and 10% of the application launch sequences into flash memory.
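The abstract does not give the ILP itself. As a simplified stand-in, the sketch below treats the choice of the pinned set as a 0/1 knapsack: each candidate block has a size and an assumed, independent latency saving if pinned, and the flash capacity bounds the total pinned size. This independence assumption is a simplification (real seek savings depend on neighboring requests), and the block data and function name are hypothetical.

```python
def choose_pinned(blocks, flash_capacity):
    """Select blocks to pin into flash, maximizing total saved latency.

    blocks: list of (size, saved_ms) tuples; flash_capacity: integer size units.
    Returns (total_saved_ms, tuple of chosen block indices).
    """
    # best[c] holds the best (saving, chosen indices) using at most c units.
    best = [(0, ())] * (flash_capacity + 1)
    for index, (size, saved) in enumerate(blocks):
        # Walk capacities downward so each block is pinned at most once.
        for cap in range(flash_capacity, size - 1, -1):
            candidate = best[cap - size][0] + saved
            if candidate > best[cap][0]:
                best[cap] = (candidate, best[cap - size][1] + (index,))
    return best[flash_capacity]


blocks = [(4, 12), (3, 10), (2, 7), (5, 13)]  # (size, saved_ms), hypothetical
print(choose_pinned(blocks, flash_capacity=7))  # (22, (0, 1))
```

With 7 units of flash, pinning blocks 0 and 1 saves 22 ms, which is more than any other feasible combination in this toy instance. An ILP solver would be the natural tool once the savings of different blocks interact.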
Citations: 13