
Proceedings of the 38th Design Automation Conference (IEEE Cat. No.01CH37232): Latest Papers

High-level software energy macro-modeling
T. K. Tan, A. Raghunathan, G. Lakshminarayana, N. Jha
This paper presents an efficient and accurate high-level software energy estimation methodology using the concept of characterization-based macro-modeling. In characterization-based macro-modeling, a function or sub-routine is characterized using an accurate lower-level energy model of the target processor, to construct a macro-model that relates the energy consumed in the function under consideration to various parameters that can be easily observed or calculated from a high-level programming language description. The constructed macro-models eliminate the need for the significantly slower instruction-level interpretation or hardware simulation that is required in conventional approaches to software energy estimation. We present two different approaches to macro-modeling for embedded software that offer distinct efficiency-accuracy characteristics: (i) complexity-based macro-modeling, where the variables that determine the algorithmic complexity of the function under consideration are used as macro-modeling parameters, and (ii) profiling-based macro-modeling, where internal profiling statistics for the functions are used as parameters in the energy macro-models. We have experimentally validated our software energy macro-modeling techniques on a wide range of embedded software routines and two different target processor architectures. Our experiments demonstrate that high-level macro-models constructed using the proposed techniques are able to estimate the energy consumption with 95% accuracy on average, while achieving speedups of one to five orders of magnitude over current instruction-level and architectural energy estimation techniques.
DOI: https://doi.org/10.1145/378239.379033
Published: 2001-06-22
Citations: 87
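The complexity-based approach described in this abstract can be illustrated with a minimal sketch. It assumes a routine whose energy grows linearly with a single size parameter `n`, and fits `E(n) = c0 + c1 * n` by ordinary least squares to characterization samples. The function names, the linear model form, and the sample values are illustrative assumptions; the abstract does not specify the paper's model templates or fitting procedure.

```python
def fit_linear_macro_model(sizes, energies):
    """Least-squares fit of E(n) ~ c0 + c1 * n from characterization samples."""
    count = len(sizes)
    mean_n = sum(sizes) / count
    mean_e = sum(energies) / count
    sxx = sum((n - mean_n) ** 2 for n in sizes)
    sxy = sum((n - mean_n) * (e - mean_e) for n, e in zip(sizes, energies))
    c1 = sxy / sxx               # energy per unit of input size
    c0 = mean_e - c1 * mean_n    # fixed per-call energy
    return c0, c1


def estimate_energy(model, n):
    """Evaluate the fitted macro-model for input size n."""
    c0, c1 = model
    return c0 + c1 * n


# Synthetic samples standing in for measurements from a lower-level energy
# model (hypothetical values, not taken from the paper).
sizes = [16, 32, 64, 128]
energies = [5.0 + 0.25 * n for n in sizes]

model = fit_linear_macro_model(sizes, energies)
predicted = estimate_energy(model, 256)
```

Once the coefficients are fitted, estimating the energy of a call costs one multiply-add, which is where the speedup over instruction-level simulation would come from in a scheme of this kind.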
Fast power/ground network optimization based on equivalent circuit modeling
S. Tan, C. Shi
This paper presents an efficient algorithm for optimizing the area of power or ground networks in integrated circuits subject to reliability constraints. Instead of solving the original power/ground networks extracted from circuit layouts, as previous methods did, the new method first builds equivalent models for the many series resistors in the original networks, then uses the sequence-of-linear-programming method to solve the simplified networks. The solutions of the original networks are then back-solved from the optimized, simplified networks. The new algorithm exploits the regularities in power/ground networks. Experimental results show that the simplified networks are typically significantly less complex than the original circuits, which makes the new algorithm extremely fast. For instance, power/ground networks with more than one million branches can be sized in a few minutes on modern SUN workstations.
DOI: https://doi.org/10.1145/378239.379021
Published: 2001-06-22
Citations: 59
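The reduce-then-back-solve idea in this abstract rests on a basic circuit identity, sketched below under simplifying assumptions. A chain of series resistors with no current drawn at its internal nodes is replaced by one equivalent resistor, and the internal node voltages are recovered afterwards from the chain current. The function names and values are hypothetical, and the sketch does not reproduce the paper's equivalent model, which must also account for wire sizing and reliability constraints.

```python
def collapse_series(resistances):
    """Equivalent resistance of resistors connected in series (ohms)."""
    return sum(resistances)


def back_solve_voltages(v_start, current, resistances):
    """Recover every node voltage along the chain from the chain current."""
    voltages = [v_start]
    for r in resistances:
        # Each resistor drops current * r volts (Ohm's law).
        voltages.append(voltages[-1] - current * r)
    return voltages


# Hypothetical chain: three segments between a 2.0 V node and a 1.0 V node.
chain = [0.5, 0.25, 0.25]
r_eq = collapse_series(chain)
current = (2.0 - 1.0) / r_eq
node_voltages = back_solve_voltages(2.0, current, chain)
```

Solving the simplified network only needs `r_eq`; the internal voltages are computed once at the end, which is why collapsing long chains reduces the problem size.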
SATIRE: A new incremental satisfiability engine
J. Whittemore, Joonyoung Kim, K. Sakallah
We introduce SATIRE, a new satisfiability solver that is particularly suited to verification and optimization problems in electronic design automation. SATIRE builds on the most recent advances in satisfiability research, and includes two new features to achieve even higher performance: a facility for incrementally solving sets of related problems, and the ability to handle non-CNF constraints. We provide experimental evidence showing the effectiveness of these additions to classical satisfiability solvers.
DOI: https://doi.org/10.1145/378239.379019
Published: 2001-06-22
Citations: 198
Automatic generation of application-specific architectures for heterogeneous multiprocessor system-on-chip
D. Lyonnard, S. Yoo, A. Baghdadi, A. Jerraya
We present a design flow for the generation of application-specific multiprocessor architectures. In the flow, architectural parameters are first extracted from a high-level system specification. Parameters are used to instantiate architectural components, such as processors, memory modules and communication networks. The flow includes the automatic generation of a communication coprocessor that adapts the processor to the communication network in an application-specific way. Experiments with two system examples show the effectiveness of the presented design flow.
DOI: https://doi.org/10.1145/378239.379015
Published: 2001-06-22
Citations: 167
From architecture to layout: partitioned memory synthesis for embedded systems-on-chip
L. Benini, L. Macchiarulo, A. Macii, E. Macii, M. Poncino
We propose an integrated front-end/back-end flow for the automatic generation of a multi-bank memory architecture for embedded systems. The flow is based on an algorithm for the automatic partitioning of on-chip SRAM. Starting from the dynamic execution profile of an embedded application running on a given processor core, we synthesize a multi-banked SRAM architecture optimally fitted to the execution profile. The partitioning algorithm is integrated with the physical design phase into a complete flow that allows the back-annotation of layout information to drive the partitioning process. Results, collected on a set of embedded applications for the ARM processor, have shown average energy savings around 34%.
DOI: https://doi.org/10.1145/378239.379066
Published: 2001-06-22
Citations: 12
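Profile-driven bank partitioning of the kind this abstract describes can be sketched as a small dynamic program. The sketch assumes the address space is divided into contiguous blocks with known access counts, that each bank covers a contiguous run of blocks, and that the energy of one access depends only on the size of the bank it hits. The cost function, the function names, and the access counts are hypothetical. The abstract does not give the paper's energy model, and this sketch ignores per-bank overheads such as decoding and wiring.

```python
import math
from functools import lru_cache


def partition_banks(access_counts, max_banks, energy_per_access):
    """Split contiguous blocks into at most max_banks banks.

    Returns (minimum total energy, tuple of end indices of each bank).
    """
    n = len(access_counts)
    prefix = [0]
    for count in access_counts:
        prefix.append(prefix[-1] + count)

    def bank_cost(start, end):
        # Energy of all accesses that fall into blocks [start, end).
        return (prefix[end] - prefix[start]) * energy_per_access(end - start)

    @lru_cache(maxsize=None)
    def best(start, banks_left):
        if start == n:
            return 0.0, ()
        if banks_left == 0:
            return math.inf, ()
        best_cost, best_cuts = math.inf, ()
        for end in range(start + 1, n + 1):
            rest_cost, rest_cuts = best(end, banks_left - 1)
            cost = bank_cost(start, end) + rest_cost
            if cost < best_cost:
                best_cost, best_cuts = cost, (end,) + rest_cuts
        return best_cost, best_cuts

    return best(0, max_banks)


# Hypothetical profile: two hot blocks followed by two cold blocks, with
# per-access energy proportional to bank size.
energy, cuts = partition_banks([100, 100, 1, 1], 2, lambda size: size)
```

With this profile the hot blocks end up in their own small bank, so most accesses pay the lower per-access cost of a small memory.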
Energy efficient fixed-priority scheduling for real-time systems on variable voltage processors
Gang Quan, X. Hu
Energy consumption has become an increasingly important consideration in designing many real-time embedded systems. Variable voltage processors, if used properly, can dramatically reduce such system energy consumption. In this paper, we present a technique to determine voltage settings for a variable voltage processor that utilizes a fixed priority assignment to schedule jobs. Our approach also produces the minimum constant voltage needed to feasibly schedule the entire job set. Our algorithms lead to significant energy saving compared with previously presented approaches.
DOI: https://doi.org/10.1145/378239.379074
Published: 2001-06-22
Citations: 235
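The notion of a minimum constant setting that still schedules the whole job set can be illustrated with a sketch built on standard response-time analysis. This is a swapped-in textbook technique, not the algorithm from the paper. It assumes periodic tasks with deadlines equal to periods, a normalized speed in (0, 1] as a stand-in for voltage, and execution time that scales as 1/speed. All function names and task values are hypothetical.

```python
import math


def is_schedulable(tasks, speed):
    """Response-time test for fixed-priority tasks at a constant speed.

    tasks: list of (wcet_at_full_speed, period), highest priority first.
    """
    for i, (wcet, period) in enumerate(tasks):
        own = wcet / speed
        response = own
        while True:
            # Preemption by every higher-priority task released so far.
            interference = sum(
                math.ceil(response / t_j) * (c_j / speed)
                for c_j, t_j in tasks[:i]
            )
            new_response = own + interference
            if new_response > period:
                return False
            if new_response == response:
                break
            response = new_response
    return True


def min_constant_speed(tasks, iterations=50):
    """Bisect for the lowest constant speed that keeps the set schedulable."""
    if not is_schedulable(tasks, 1.0):
        return None
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if is_schedulable(tasks, mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical task set: (WCET at full speed, period).
speed = min_constant_speed([(1, 4), (2, 8)])
```

Bisection is valid here because lowering the speed only lengthens execution times, so schedulability is monotone in the speed. Mapping the resulting speed to a supply voltage depends on the processor and is left out.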
Reducing the frequency gap between ASIC and custom designs: a custom perspective
S. E. Rich, Matthew J. Parker, Jim Schwartz
This paper proposes that the ability to control the difference between the simulated and actual frequencies of a design is a key strategy to achieving high frequency in both ASIC and custom designs. We examine this principle and the methodologies that can be deployed to manage this gap.
DOI: https://doi.org/10.1145/378239.378548
Published: 2001-06-22
Citations: 14
Speeding up control-dominated applications through microarchitectural customizations in embedded processors
Peter Petrov, A. Orailoglu
We present a methodology for microarchitectural customization of embedded processors by exploiting application information, thus attaining the twin benefits of processor standardization and application-specific customization. Such powerful techniques enable increased application fragments to be placed on the processor, with no sacrifice in system requirements, thus reducing the custom hardware and the concomitant area requirements in SOCs. We illustrate these ideas through the branch resolution problem, known to impose severe performance degradation on control-dominated embedded applications. A low-cost late customizable hardware that uses application information to fold out a set of frequently executed branches is described. Experimental results show that for a representative set of control dominated applications a reduction in the range of 7%-22% in processor cycles can be achieved, thus extending the scope of low-cost embedded processors in complex co-designs for control intensive systems.
DOI: https://doi.org/10.1145/378239.379014
Published: 2001-06-22
Citations: 1
Reducing memory requirements of nested loops for embedded systems
J. Ramanujam, Jinpyo Hong, M. Kandemir, A. Narayan
Most embedded systems have a limited amount of memory. In contrast, the memory requirements of code (in particular loops) running on embedded systems are significant. This paper addresses the problem of estimating the amount of memory needed for transfers of data in embedded systems. The problem of estimating the region associated with a statement or the set of elements referenced by a statement during the execution of the entire set of nested loops is analyzed. A quantitative analysis of the number of elements referenced is presented; exact expressions for uniformly generated references and a close upper and lower bound for non-uniformly generated references are derived. In addition to presenting an algorithm that computes the total memory required, we discuss the effect of transformations on the lifetimes of array variables, i.e., the time between the first and last accesses to a given array location. A detailed analysis of the effect of unimodular transformations on data locality, including the calculation of the maximum window size, is presented. The term maximum window size is introduced and quantitative expressions are derived to compute the window size. The smaller the value of the maximum window size, the higher the amount of data locality in the loop.
DOI: https://doi.org/10.1145/378239.378523
Published: 2001-06-22
Citations: 44
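The quantities named in this abstract (number of distinct elements referenced, lifetime, maximum window size) can be checked on small loop nests by brute force. The sketch below enumerates a two-level nest in execution order, records the first and last access time of each array element, and takes the window at a given time to be the set of elements whose lifetime spans that time. That window definition is an assumption based on the abstract's wording, and the function name is hypothetical. The paper derives closed-form expressions rather than enumerating.

```python
def analyze_reference(n_i, n_j, index_fn):
    """Return (distinct elements, maximum window size) for one array reference.

    The nest is `for i in range(n_i): for j in range(n_j): A[index_fn(i, j)]`.
    """
    first, last = {}, {}
    t = 0
    for i in range(n_i):
        for j in range(n_j):
            elem = index_fn(i, j)
            first.setdefault(elem, t)   # time of first access
            last[elem] = t              # time of most recent access
            t += 1

    # Sweep over time: +1 when a lifetime starts, -1 just after it ends.
    events = [0] * (t + 1)
    for elem in first:
        events[first[elem]] += 1
        events[last[elem] + 1] -= 1

    live = max_live = 0
    for delta in events:
        live += delta
        max_live = max(max_live, live)
    return len(first), max_live


# A[i + j] in a 4x4 nest reuses elements across iterations of i.
reuse = analyze_reference(4, 4, lambda i, j: i + j)
# A[4 * i + j] touches every element exactly once.
no_reuse = analyze_reference(4, 4, lambda i, j: 4 * i + j)
```

For the 4x4 nest, `A[i + j]` touches 7 distinct elements with at most 3 live at once, while `A[4 * i + j]` touches 16 elements with a window of 1, since no element is ever revisited.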
High-quality operation binding for clustered VLIW datapaths
V. Lapinskii, M. Jacome, G. Veciana
Clustering is an effective method to increase the available parallelism in VLIW datapaths without incurring severe penalties associated with large numbers of register file ports. Efficient utilization of a clustered datapath requires careful binding of operations to clusters. The paper proposes a binding algorithm that effectively explores tradeoffs between in-cluster operation serialization and delays associated with data transfers between clusters. Extensive experimental evidence is provided showing that the algorithm generates high quality solutions for basic blocks, with up to 29% improvement over a state-of-the-art advanced binding algorithm.
DOI: https://doi.org/10.1145/378239.379051
Published: 2001-06-22
Citations: 12