
2017 30th IEEE International System-on-Chip Conference (SOCC): Latest Papers

Content-aware line-based power modeling methodology for image signal processor
Pub Date : 2017-12-18 DOI: 10.1109/SOCC.2017.8226075
Chun-Wei Chen, Ming-Der Shieh, Juin-Ming Lu, Hsun-Lun Huang, Yao-Hua Chen
Early power modeling and analysis using an electronic system-level methodology enables designers to explore energy-saving opportunities more efficiently at a higher abstraction level. However, power modeling of third-party IPs is challenging because of limited observability and unknown architectural details. To model the data dependency of black-box IPs, several works use the Hamming distance of input data to approximate switching activity, which may not be sufficient for complex IPs such as image signal processors (ISPs). This work introduces a content-aware, line-based power modeling method for ISPs that trains an associated energy table. Because ISP workloads involve extensive two-dimensional data processing, this work presents a direct energy-mapping strategy based on pixel luminance and gradient to estimate ISP energy consumption effectively. Moreover, an iterative box-constrained least-squares estimation and an associated constraint-refinement scheme are proposed to make the trained energy table robust even with limited training data. Simulation results show that, compared with the existing content-blind power model, the proposed method reduces the average error by at least 11.54% and the maximum error by 55.52%.
Citations: 0
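To make the box-constrained least-squares idea concrete, the following Python sketch fits a small energy table. It assumes, hypothetically, that each image line is summarized by pixel counts in (luminance, gradient) bins, that a line's energy is the dot product of those counts with the table, and that every table entry is bounded to a box [lo, hi]. The solver is plain projected gradient descent, one standard way to handle box constraints; it is not the authors' iterative estimator or their constraint-refinement scheme, and the training data are made up.

```python
from typing import List, Sequence


def fit_energy_table(
    features: Sequence[Sequence[float]],
    energies: Sequence[float],
    lo: float,
    hi: float,
    iters: int = 5000,
) -> List[float]:
    """Box-constrained least squares: min ||A e - b||^2 s.t. lo <= e_k <= hi.

    features: one row per image line, holding pixel counts per (luminance, gradient) bin.
    energies: measured energy of each line (the training target).
    Solved with projected gradient descent (a generic solver, not the paper's method).
    """
    n = len(features[0])
    # 2 * ||A||_F^2 upper-bounds the Lipschitz constant of the gradient, so 1/L is a safe step.
    lipschitz = 2.0 * sum(a * a for row in features for a in row)
    step = 1.0 / lipschitz
    table = [(lo + hi) / 2.0] * n  # start at the centre of the box
    for _ in range(iters):
        grad = [0.0] * n
        for row, target in zip(features, energies):
            residual = sum(a * t for a, t in zip(row, table)) - target
            for k, a in enumerate(row):
                grad[k] += 2.0 * a * residual
        # Gradient step, then projection onto the box by clipping each entry.
        table = [min(hi, max(lo, t - step * g)) for t, g in zip(table, grad)]
    return table


def predict_line_energy(table: Sequence[float], counts: Sequence[float]) -> float:
    """Content-aware estimate: each bin's pixel count times that bin's per-pixel energy."""
    return sum(t * c for t, c in zip(table, counts))


# Hypothetical training data: bin 0 = flat pixels, bin 1 = high-gradient pixels.
features = [[8, 0], [4, 4], [0, 8], [6, 2]]
energies = [8.0, 16.0, 24.0, 12.0]  # generated from a table of [1.0, 3.0]
table = fit_energy_table(features, energies, lo=0.0, hi=10.0)
```

Projected gradient descent was chosen only because it needs nothing beyond the standard library; any bounded least-squares solver would serve the same role.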
Designing bio-inspired autonomous error-tolerant massively parallel computing architectures
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226057
Lizheng Liu, Yi Jin, Yi Liu, Ning Ma, Z. Zou, Lirong Zheng
Scalable, massively parallel computing systems built from many on-chip interconnected processors are becoming increasingly complex and unreliable. This paper presents a bio-inspired error-tolerance framework and three design principles based on the Autonomous Error Tolerant (AET) architecture. A nearby error-perception mechanism is designed to detect faults, and an initiative evolution strategy is studied to handle unrecoverable errors. A circuit backup mechanism is proposed that sets routing rules to bypass failed links or nodes, providing fault tolerance. A printed circuit board (PCB) prototype is designed and implemented based on a reconfigurable and scalable control-centric dual-core embedded processor (ReSC). Test programs combining fault-detection or self-backup schemes with routing algorithms are explored on the platform. Experimental results show that, when a fault occurs in an AET cell, the error perceptron detects it and reassigns the task to another free, healthy AET cell through the network-on-chip (NoC). The system completes error recovery within 3 seconds, and the error-tolerant capability of the proposed architecture is shown to be better than that of conventional multi-modular redundant systems.
Citations: 3
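The bypass-and-reassign behaviour described in this abstract can be sketched as follows, assuming (for illustration only) a 2D-mesh NoC, breadth-first routing that skips failed nodes and links, and a policy that moves the task to the reachable free, healthy cell with the shortest route from a hypothetical `origin` node. The abstract does not specify the actual AET routing rules or reassignment policy.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Node = Tuple[int, int]  # (x, y) position of a cell in a 2D mesh


def neighbors(node: Node, width: int, height: int) -> List[Node]:
    x, y = node
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < width and 0 <= b < height]


def route_around_faults(
    src: Node,
    dst: Node,
    width: int,
    height: int,
    failed_nodes: Set[Node],
    failed_links: Set[FrozenSet[Node]],
) -> Optional[List[Node]]:
    """Breadth-first search for a shortest path that avoids failed nodes and links."""
    prev: Dict[Node, Optional[Node]] = {src: None}
    frontier = deque([src])
    while frontier:
        cur = frontier.popleft()
        if cur == dst:
            path: List[Node] = []
            step: Optional[Node] = cur
            while step is not None:
                path.append(step)
                step = prev[step]
            return path[::-1]
        for nxt in neighbors(cur, width, height):
            if nxt in prev or nxt in failed_nodes:
                continue
            if frozenset((cur, nxt)) in failed_links:
                continue
            prev[nxt] = cur
            frontier.append(nxt)
    return None  # destination unreachable


def reassign_task(
    origin: Node,
    free_cells: Set[Node],
    width: int,
    height: int,
    failed_nodes: Set[Node],
    failed_links: Set[FrozenSet[Node]],
) -> Optional[Tuple[Node, List[Node]]]:
    """Pick the free, healthy cell with the shortest fault-free route from origin."""
    best: Optional[Tuple[Node, List[Node]]] = None
    for cell in sorted(free_cells - failed_nodes):
        path = route_around_faults(origin, cell, width, height, failed_nodes, failed_links)
        if path is not None and (best is None or len(path) < len(best[1])):
            best = (cell, path)
    return best


# 3x3 mesh whose centre cell (1, 1) has failed.
failed = {(1, 1)}
detour = route_around_faults((0, 1), (2, 1), 3, 3, failed, set())
choice = reassign_task((0, 1), {(2, 2), (0, 0), (2, 0)}, 3, 3, failed, set())
```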
A constant bandwidth switched-capacitor programmable-gain amplifier utilizing adaptive miller compensation technique
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226051
Hyunjong Kim, Yujin Park, Han Yang, Suhwan Kim
This paper presents a constant-bandwidth switched-capacitor programmable-gain amplifier (SC-PGA). By applying an adaptive Miller compensation technique, the SC-PGA achieves low power consumption and high linearity under various gain conditions. Post-layout simulation results in a 0.18 μm CMOS process show that, at a gain of 12 V/V, power efficiency is three times that of an SC-PGA without adaptive Miller compensation, with no performance degradation. Power consumption is 2.8 mW with a 3.3 V analog supply and a 1.8 V digital supply.
Citations: 3
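The constant-bandwidth idea can be sketched with first-order textbook relations, which are assumptions here since the abstract gives no equations: a Miller-compensated two-stage op amp has a gain-bandwidth product of roughly g_m1/(2πC_c), a switched-capacitor stage with gain C_s/C_f has feedback factor β = C_f/(C_s + C_f + C_p), and the closed-loop bandwidth is roughly β times the gain-bandwidth product. Scaling C_c in proportion to β keeps the bandwidth fixed across gain settings. All component values are hypothetical.

```python
import math


def feedback_factor(c_s: float, c_f: float, c_p: float) -> float:
    """beta = C_f / (C_s + C_f + C_p) for an SC gain stage with gain C_s / C_f."""
    return c_f / (c_s + c_f + c_p)


def closed_loop_bw_hz(gm1: float, c_c: float, beta: float) -> float:
    """First-order estimate: f_3dB ~ beta * GBW, with GBW = gm1 / (2*pi*C_c)."""
    return beta * gm1 / (2.0 * math.pi * c_c)


def adaptive_cc(c_c_ref: float, beta_ref: float, beta: float) -> float:
    """Scale C_c with beta so beta / C_c (the loop crossover) stays constant."""
    return c_c_ref * beta / beta_ref


# Hypothetical values: C_f = 0.5 pF, C_p = 0.2 pF, gm1 = 1 mS, C_c = 2 pF at unity gain.
C_F, C_P, GM1, CC_REF = 0.5e-12, 0.2e-12, 1e-3, 2e-12
gains = [1, 2, 4, 8, 12]
betas = [feedback_factor(g * C_F, C_F, C_P) for g in gains]
fixed_cc_bw = [closed_loop_bw_hz(GM1, CC_REF, b) for b in betas]
adaptive_bw = [closed_loop_bw_hz(GM1, adaptive_cc(CC_REF, betas[0], b), b) for b in betas]
```

With a fixed C_c, bandwidth drops as the gain rises because β shrinks; restoring it by increasing g_m1 costs bias current, whereas reducing C_c does not, which is one plausible reading of why an adaptive scheme saves power. To first order, holding β·g_m1/C_c constant also keeps the loop crossover frequency, and hence the phase margin, roughly unchanged.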
Approximate compressed sensing for hardware-efficient image compression
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226074
S. Kadiyala, V. Pudi, S. Lam
Recently, compressive sensing has attracted considerable research interest because of its potential for lightweight image compression. Approximate (inexact) computing, on the other hand, has been applied successfully to lower the complexity of hardware architectures in applications where some performance degradation is acceptable (e.g., lossy image compression). In this work, we present a novel compressive sensing method based on the approximate computing paradigm to realize a hardware-efficient image compression architecture. We adopt compression based on a Gaussian random matrix, and library-based pruning is used to realize the approximate compression architecture. Further, we present a multi-objective optimization method to fine-tune the pruning and improve the architecture's performance. Compared with a baseline architecture that uses regular multipliers in 65-nm CMOS technology, the proposed image compression architecture achieves 43% area and 54% power savings with minimal PSNR degradation.
Citations: 3
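The compressive measurement y = Φx with a Gaussian random matrix can be sketched as below. As a stand-in for approximate arithmetic, the coefficients of Φ are quantized to fixed point and a hypothetical approximate multiplier ignores their low-order bits; this truncation is chosen purely for illustration and is not the paper's library-based pruning or its multi-objective tuning. Signal reconstruction (for example with an ℓ1-minimization solver) is omitted.

```python
import random
from typing import List, Sequence


def gaussian_matrix(m: int, n: int, seed: int = 0) -> List[List[float]]:
    """M x N sensing matrix with i.i.d. N(0, 1/M) entries (a common normalization)."""
    rng = random.Random(seed)
    sigma = 1.0 / m ** 0.5
    return [[rng.gauss(0.0, sigma) for _ in range(n)] for _ in range(m)]


def quantize(phi: List[List[float]], frac_bits: int) -> List[List[int]]:
    """Fixed-point coefficients, as a hardware multiplier would see them."""
    return [[round(v * (1 << frac_bits)) for v in row] for row in phi]


def approx_mul(coef: int, pixel: int, drop_bits: int) -> int:
    """Hypothetical approximate multiplier: ignore the low bits of |coef|."""
    sign = -1 if coef < 0 else 1
    return sign * ((abs(coef) >> drop_bits) << drop_bits) * pixel


def measure(phi_q: List[List[int]], x: Sequence[int], drop_bits: int = 0) -> List[int]:
    """y = Phi x; drop_bits = 0 gives the exact fixed-point result."""
    return [sum(approx_mul(c, p, drop_bits) for c, p in zip(row, x)) for row in phi_q]


def relative_error(exact: Sequence[int], approx: Sequence[int]) -> float:
    num = sum((a - b) ** 2 for a, b in zip(exact, approx)) ** 0.5
    den = sum(a * a for a in exact) ** 0.5
    return num / den


# Hypothetical 8x8 pixel block (N = 64) compressed to M = 16 measurements.
rng = random.Random(1)
block = [rng.randrange(256) for _ in range(64)]
phi_q = quantize(gaussian_matrix(16, 64), frac_bits=8)
y_exact = measure(phi_q, block)
y_approx = measure(phi_q, block, drop_bits=2)
err = relative_error(y_exact, y_approx)
```

Raising `drop_bits` trades measurement accuracy for a cheaper multiplier, which is the kind of trade-off an optimization over pruning choices would explore.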
A 590MDE/s semi-global matching processor with lossless data compression
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8225998
Kyeongryeol Bong, K. Lee, H. Yoo
A tile-based semi-global matching (SGM) processor with lossless data compression is proposed. The 8×8 tile-based processing and the P2-less data compression reduce external memory access by 85% without any change in the processing result. In addition, the P2-less data compression reduces the on-chip SRAM size by 50%. Implemented in 65 nm CMOS technology, the 6.3 mm² chip consumes 288 mW and supports 590 MDE/s (million disparity estimations per second) when processing 640×360 frames with a 64-disparity range in real time at 40 fps.
Citations: 1
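The quoted throughput follows from the stated operating point: 640 × 360 pixels × 64 disparities × 40 fps = 589,824,000 disparity estimates per second. The sketch below checks that arithmetic and shows the standard SGM cost-aggregation recurrence (Hirschmüller's formulation, with a small-change penalty P1 and a large-change penalty P2) along a single scanline direction. The cost values are hypothetical, and the sketch models neither the chip's 8×8 tiling nor its P2-less compression.

```python
from typing import List, Sequence


def disparity_estimates_per_second(width: int, height: int, disparities: int, fps: int) -> int:
    """One estimate per (pixel, candidate disparity) per frame."""
    return width * height * disparities * fps


def aggregate_path(costs: Sequence[Sequence[int]], p1: int, p2: int) -> List[List[int]]:
    """SGM aggregation along one direction; costs[i][d] = matching cost C(p_i, d).

    L(p, d) = C(p, d) + min(L(p-1, d), L(p-1, d-1) + P1, L(p-1, d+1) + P1,
                            min_k L(p-1, k) + P2) - min_k L(p-1, k)
    """
    agg = [list(costs[0])]
    for cost in costs[1:]:
        prev = agg[-1]
        prev_min = min(prev)
        row = []
        for d in range(len(prev)):
            best = prev[d]
            if d > 0:
                best = min(best, prev[d - 1] + p1)
            if d + 1 < len(prev):
                best = min(best, prev[d + 1] + p1)
            best = min(best, prev_min + p2)
            row.append(cost[d] + best - prev_min)  # subtracting prev_min keeps values bounded
        agg.append(row)
    return agg


mde = disparity_estimates_per_second(640, 360, 64, 40)  # 589,824,000, i.e. about 590 MDE/s
nj_per_estimate = 0.288 / mde * 1e9                      # 288 mW spread over that rate
path = aggregate_path([[0, 5, 5], [5, 0, 5]], p1=1, p2=4)  # hypothetical costs, 3 disparities
```

At 288 mW this works out to roughly 0.49 nJ per disparity estimate.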
A novel power reduction technique using wire multiplexing
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226026
Mostafa Said, Hossam Hassan, Hyungwon Kim, Mostafa Khamis
Reducing power consumption is a critical challenge in today's nanoscale circuits. In this paper, a new power reduction approach is demonstrated. The approach originates from the idea of through-silicon via (TSV) multiplexing in 3D ICs, where two or more signals flow through one TSV instead of multiple TSVs. Building on that behavior, the power-reduction potential of this circuit is identified, and its generalization to any wire, i.e., wire multiplexing, is detailed. An analytical power model of the circuit is also developed to predict its power consumption. Finally, Cadence Spectre simulations in 65 nm technology, together with the analytical model, verify the power reduction achieved by the wire multiplexing technique.
Citations: 2
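A first-order way to reason about wire multiplexing is to count transitions, since each 0→1 or 1→0 change dissipates roughly ½·C·V² on the wire. The sketch compares two signals on two dedicated wires with the same two signals time-multiplexed onto one wire of equal capacitance. It ignores multiplexer/demultiplexer overhead and uses hypothetical bit streams, so it is not the paper's analytical power model; it only illustrates that the result depends on how the multiplexed signals are correlated.

```python
from typing import List, Sequence, Tuple


def transitions(bits: Sequence[int]) -> int:
    """Number of 0->1 and 1->0 changes on a wire."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)


def switching_energy(n_transitions: int, cap_f: float, vdd: float) -> float:
    """About C * Vdd^2 / 2 is dissipated per transition (first-order CMOS model)."""
    return n_transitions * 0.5 * cap_f * vdd ** 2


def interleave(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Time-multiplex two bit streams onto one wire: a0, b0, a1, b1, ..."""
    out: List[int] = []
    for x, y in zip(a, b):
        out.extend((x, y))
    return out


def compare(a: Sequence[int], b: Sequence[int], cap_f: float, vdd: float) -> Tuple[float, float]:
    """(energy on two dedicated wires, energy on one shared wire), equal C per wire assumed."""
    separate = switching_energy(transitions(a) + transitions(b), cap_f, vdd)
    shared = switching_energy(transitions(interleave(a, b)), cap_f, vdd)
    return separate, shared


# Normalized units (C = 1, Vdd = 1) and hypothetical bit streams.
correlated = compare([0, 0, 1, 1, 0, 0, 1, 1], [0, 0, 1, 1, 0, 0, 1, 1], 1.0, 1.0)
opposite = compare([0, 0, 0, 0], [1, 1, 1, 1], 1.0, 1.0)
```

Identical streams halve the switching energy on the shared wire, while two constant but different streams make the shared wire toggle every cycle.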
FPGA-based CNN inference accelerator synthesized from multi-threaded C software
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226056
Jin Hee Kim, Brett Grady, Ruolong Lian, J. Brothers, J. Anderson
A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software uses the well-known producer/consumer model, with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) tool [1] synthesizes the threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated in which convolution, pooling, and padding are realized in the synthesized accelerator, while the remaining tasks execute on an embedded ARM processor. The accelerator uses reduced precision and a novel approach for skipping zero weights in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
Citations: 38
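The producer/consumer structure that LegUp turns into parallel hardware can be mimicked in software with threads connected by bounded FIFOs. This sketch uses Python's `threading` and `queue.Queue` in place of C and Pthreads, a 1-D convolution in place of a CNN layer, and a simple form of zero-weight skipping in which only non-zero weights are stored and multiplied. The stage split and the skipping scheme are illustrative assumptions, not the paper's accelerator design.

```python
import queue
import threading
from typing import List, Sequence, Tuple

DONE = None  # sentinel that marks the end of a stream


def window_producer(signal: Sequence[int], k: int, out_q: "queue.Queue") -> None:
    """Stage 1: stream every length-k sliding window (padding is omitted)."""
    for i in range(len(signal) - k + 1):
        out_q.put(list(signal[i:i + k]))
    out_q.put(DONE)


def conv_consumer(nz_weights: List[Tuple[int, int]], in_q: "queue.Queue", out_q: "queue.Queue") -> None:
    """Stage 2: dot product over non-zero weights only (zero-weight skipping)."""
    while True:
        window = in_q.get()
        if window is DONE:
            out_q.put(DONE)
            return
        out_q.put(sum(w * window[j] for j, w in nz_weights))


def run_pipeline(signal: Sequence[int], weights: Sequence[int], depth: int = 4) -> Tuple[List[int], int]:
    # Keep (position, weight) pairs for non-zero weights; zeros are never multiplied.
    nz = [(j, w) for j, w in enumerate(weights) if w != 0]
    q1: "queue.Queue" = queue.Queue(maxsize=depth)  # bounded FIFOs between stages
    q2: "queue.Queue" = queue.Queue(maxsize=depth)
    stages = [
        threading.Thread(target=window_producer, args=(signal, len(weights), q1)),
        threading.Thread(target=conv_consumer, args=(nz, q1, q2)),
    ]
    for t in stages:
        t.start()
    outputs: List[int] = []
    while True:
        item = q2.get()
        if item is DONE:
            break
        outputs.append(item)
    for t in stages:
        t.join()
    return outputs, len(nz)


# As in CNN layers, "convolution" here is cross-correlation (no kernel flip).
outputs, multiplies_per_output = run_pipeline([1, 2, 3, 4, 5], [1, 0, -1])
```

The bounded queues provide back-pressure: a fast producer blocks when its FIFO is full, much as a hardware FIFO stalls its writer.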
BlooXY: On a non-invasive blood monitor for the IoT context
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226000
Daniel Flórez, Martha Johanna Sepúlveda
Cardiovascular diseases (CVDs) are a major health concern. They account for 35% of deaths and cost billions of dollars worldwide. Preventing CVDs has become a global priority. Comprehensive use of wearable devices operating within the Internet-of-Things (IoT) paradigm is key to monitoring, diagnosing, and treating CVDs. Most previous approaches propose wearables only for non-invasive blood pressure and heart-rate monitoring. However, to improve the detection and prevention of CVDs, these measurements must be combined with oximeter (SpO2) monitoring. In this work we propose BlooXY, a wearable device operating in the IoT context that measures blood pressure, oximetry, and heart rate. We show that BlooXY is an efficient aid in the prevention, control, and treatment of CVDs.
Citations: 5
Region based cache coherence for tiled MPSoCs
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226059
A. Srivatsa, Sven Rheindt, Thomas Wild, A. Herkersdorf
The need for faster and more energy-efficient computing has led to the multicore era with distributed shared memory hierarchies. The primary goal is to distribute parallel tasks onto multiple processing elements so that, compared with a single-core architecture, they collectively achieve shorter execution times at lower frequencies and supply voltages. The major challenges of this approach are achieving local, low-latency memory accesses and low overheads for coherence and synchronization management. We believe that enabling global coherence in tiled many-core architectures does not scale cost-efficiently and is not even required for applications with limited degrees of parallelism. In this paper, we propose a novel region-based cache coherence scheme in which coherence is provided by hardware directories within a flexibly sized but confined set of compute and memory tiles. We also show that data placement and task mapping have a large impact on application performance and should therefore be considered together with region-based coherence. The approach is evaluated with a high-level simulation model using PARSEC workloads. Experiments demonstrate that our region-based approach with multiple compute tiles improves performance by up to a factor of 2.5 compared with a single-tile structure with nominally identical computing and memory resources. Thus, the independent local memory accesses, which effectively increase memory bandwidth, usually outweigh the penalties of inter-tile remote memory accesses. Compared with traditional schemes, our approach also significantly reduces the size of the directory structures (e.g., by 41.4% for a 16-tile system with 4 tiles per region), making it scalable to large MPSoCs. Regarding data-to-task placement, our investigations show that it can cause performance variations of up to a factor of 12.7.
Citations: 6
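The core idea of this scheme, a directory that tracks sharers only within a confined region of tiles, can be sketched as a toy MSI-like directory. The consecutive tile-to-region mapping, the simplified read/write protocol, and the class and function names are assumptions for illustration; the sketch does not model the paper's hardware, its data placement, or its 41.4% directory saving.

```python
from typing import Dict, Set


class RegionDirectory:
    """Toy directory for one coherence region made of consecutive tile IDs."""

    def __init__(self, region_id: int, tiles_per_region: int) -> None:
        self.first_tile = region_id * tiles_per_region
        self.tiles_per_region = tiles_per_region
        self.sharers: Dict[int, Set[int]] = {}  # line address -> tiles holding a copy
        self.owner: Dict[int, int] = {}         # line address -> tile with a modified copy

    def _check(self, tile: int) -> None:
        # Coherence is only maintained inside the region; other tiles are out of scope.
        if not self.first_tile <= tile < self.first_tile + self.tiles_per_region:
            raise ValueError(f"tile {tile} is outside this coherence region")

    def read(self, tile: int, addr: int) -> Set[int]:
        """Register a reader; return tiles that must write back a modified copy first."""
        self._check(tile)
        downgrade: Set[int] = set()
        if addr in self.owner and self.owner[addr] != tile:
            downgrade.add(self.owner.pop(addr))
        self.sharers.setdefault(addr, set()).add(tile)
        return downgrade

    def write(self, tile: int, addr: int) -> Set[int]:
        """Make tile the exclusive owner; return the copies that must be invalidated."""
        self._check(tile)
        invalidate = self.sharers.get(addr, set()) - {tile}
        self.sharers[addr] = {tile}
        self.owner[addr] = tile
        return invalidate


def region_of(tile: int, tiles_per_region: int) -> int:
    return tile // tiles_per_region


# 16 tiles, 4 tiles per region: tile 6 belongs to region 1 (tiles 4..7).
directory = RegionDirectory(region_of(6, 4), tiles_per_region=4)
directory.read(4, 0x40)
directory.read(5, 0x40)
invalidated = directory.write(6, 0x40)   # {4, 5}
downgraded = directory.read(7, 0x40)     # {6}
```

Because each directory only ever sees the tiles of its own region, its sharer tracking scales with the region size rather than with the total tile count.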
Securing FPGA SoC configurations independent of their manufacturers
Pub Date : 2017-09-01 DOI: 10.1109/SOCC.2017.8226019
Nisha Jacob, J. Wittmann, Johann Heyszl, Robert Hesselbarth, F. Wilde, Michael Pehl, G. Sigl, K. Fischer
Systems-on-chip (SoCs) that include FPGAs are important platforms for critical applications, since they provide high software performance through multi-core CPUs and high versatility through integrated FPGAs. These integrated FPGAs allow the programmable hardware functionality to be updated, e.g., to add new communication interfaces or to update cryptographic accelerators during the lifetime of a device. Critical applications such as industrial control devices or vehicles with long lifetimes require updates to both software and hardware configurations. Such updates must be authenticated and possibly encrypted. One way to achieve this is to rely on the static cryptography and corresponding master keys provided by the FPGA manufacturer. In this contribution, however, we show how to retrofit Xilinx Zynq FPGAs with an alternative cryptographic accelerator and how to establish device-individual keys using physical unclonable function (PUF) technology. These two key aspects reduce the trust required in manufacturer-provided security features while increasing security by binding configurations to a specific device.
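Binding an update to a specific device can be sketched as two steps: derive a device-individual key from a PUF response, then accept a configuration only if its authentication tag verifies under that key. The sketch uses HMAC-SHA256 from Python's standard library purely for illustration; the abstract does not say which primitives the retrofitted accelerator implements. The PUF responses, context label, and bitstream are hypothetical, the issuer is assumed to have enrolled each device's key, and encryption is not shown. Real PUF responses are noisy and need error correction (for example a fuzzy extractor with helper data) before they can serve as key material.

```python
import hashlib
import hmac


def derive_device_key(puf_response: bytes, context: bytes = b"config-auth-v1") -> bytes:
    """Hypothetical KDF: HMAC-SHA256 keyed with an (already error-corrected) PUF response."""
    return hmac.new(puf_response, context, hashlib.sha256).digest()


def tag_bitstream(key: bytes, bitstream: bytes) -> bytes:
    """Issuer side: authentication tag computed with the enrolled device key."""
    return hmac.new(key, bitstream, hashlib.sha256).digest()


def accept_update(key: bytes, bitstream: bytes, tag: bytes) -> bool:
    """Device side: load only if the tag verifies; compare_digest runs in constant time."""
    return hmac.compare_digest(tag_bitstream(key, bitstream), tag)


# Stand-in PUF responses for two devices and a stand-in partial bitstream.
device_a = derive_device_key(bytes.fromhex("a5" * 32))
device_b = derive_device_key(bytes.fromhex("5a" * 32))
bitstream = b"\x00\x01partial-bitstream\xff"
tag = tag_bitstream(device_a, bitstream)
```

A configuration tagged for one device is rejected by another device and by the same device if a single byte changes, which is the binding property the abstract describes.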
包括fpga在内的片上系统是关键应用的重要平台,因为它们通过多核cpu提供重要的软件性能,并通过集成fpga提供高通用性。这些集成的FP-GAs允许更新可编程硬件功能,例如,在设备的生命周期内包括新的通信接口或更新加密加速器。对于工业控制设备或使用寿命长的车辆等关键应用,需要更新软件和硬件配置。这些更新必须经过身份验证,可能还需要加密。实现这一目标的一种方法是依赖于FPGA制造商提供的静态加密和相应的主密钥。然而,在本贡献中,我们展示了如何使用替代加密加速器改造Xilinx Zynq fpga,以及如何使用物理不可克隆功能(PUF)技术建立设备独立密钥。这两个关键方面降低了对制造商提供的安全特性的信任,同时通过将配置绑定到特定设备来提高安全性。
Citations: 15
Journal
2017 30th IEEE International System-on-Chip Conference (SOCC)