Tomás Picornell, J. Flich, Carles Hernández, J. Duato
The adoption of many-cores in safety-critical systems requires real-time capable networks on chip (NoC). In this paper we propose a new time-predictable NoC design paradigm where contention within the network is eliminated. This new paradigm builds on the Channel Dependency Graph (CDG) and guarantees by design the absence of contention. Our delayed conflict-free NoC (DCFNoC) is able to naturally inject messages using a TDM period equal to the optimal theoretical bound and without the need of using a computationally demanding offline process. Results show that DCFNoC guarantees time predictability with very low implementation cost.
{"title":"DCFNoC","authors":"Tomás Picornell, J. Flich, Carles Hernández, J. Duato","doi":"10.1145/3316781.3317794","DOIUrl":"https://doi.org/10.1145/3316781.3317794","url":null,"abstract":"The adoption of many-cores in safety-critical systems requires real-time capable networks on chip (NoC). In this paper we propose a new time-predictable NoC design paradigm where contention within the network is eliminated. This new paradigm builds on the Channel Dependency Graph (CDG) and guarantees by design the absence of contention. Our delayed conflict-free NoC (DCFNoC) is able to naturally inject messages using a TDM period equal to the optimal theoretical bound and without the need of using a computationally demanding offline process. Results show that DCFNoC guarantees time predictability with very low implementation cost.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127999921","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Existing approaches to neural network compression have failed to holistically address algorithmic (training accuracy) and computational (inference performance) demands of real-world systems, particularly on resource-constrained devices. We present C3-Flow, a new approach adding non-uniformity to low-rank approximations and designed specifically to enable highly-efficient computation on common hardware architectures while retaining more accuracy than competing methods. Evaluation on two state-of-the-art acoustic models (versus existing work, empirical limit study approaches, and hand-tuned models) demonstrates up to 60% lower error. Finally, we show that our co-design approach achieves up to 14X inference speedup across three Haswell- and Broadwell-based platforms.
{"title":"C3-Flow: Compute Compression Co-Design Flow for Deep Neural Networks","authors":"Matthew Sotoudeh, Sara S. Baghsorkhi","doi":"10.1145/3316781.3317786","DOIUrl":"https://doi.org/10.1145/3316781.3317786","url":null,"abstract":"Existing approaches to neural network compression have failed to holistically address algorithmic (training accuracy) and computational (inference performance) demands of real-world systems, particularly on resource-constrained devices. We present C3-Flow, a new approach adding non-uniformity to low-rank approximations and designed specifically to enable highly-efficient computation on common hardware architectures while retaining more accuracy than competing methods. Evaluation on two state-of-the-art acoustic models (versus existing work, empirical limit study approaches, and hand-tuned models) demonstrates up to 60% lower error. Finally, we show that our co-design approach achieves up to 14X inference speedup across three Haswell- and Broadwell-based platforms.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133072291","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
To satisfy increasing computing demands, heterogeneous computing platforms are gaining attention, especially CPU-FPGA platforms. Recently, emerging tightly coupled CPU-FPGA platforms with shared coherent caches (such as the Intel HARP and IBM POWER with CAPI) have been proposed to facilitate data communication and simplify the programming model. In this work, we propose LAMA, a static analysis and dynamic control combined framework for memory access management in such platforms, to further enhance the memory access efficiency and maintain the data consistency. Based on implementation results on the real Intel HARP2 platform, LAMA is shown to improve the performance by 34% on average with low overhead.
{"title":"LAMA","authors":"Liang Feng, Jieru Zhao, Tingyuan Liang, Sharad Sinha, Wei Zhang","doi":"10.1145/3316781.3317846","DOIUrl":"https://doi.org/10.1145/3316781.3317846","url":null,"abstract":"To satisfy increasing computing demands, heterogeneous computing platforms are gaining attention, especially CPU-FPGA platforms. Recently, emerging tightly coupled CPU-FPGA platforms with shared coherent caches (such as the Intel HARP and IBM POWER with CAPI) have been proposed to facilitate data communication and simplify the programming model. In this work, we propose LAMA, a static analysis and dynamic control combined framework for memory access management in such platforms, to further enhance the memory access efficiency and maintain the data consistency. Based on implementation results on the real Intel HARP2 platform, LAMA is shown to improve the performance by 34% on average with low overhead.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128821713","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Data movement in large caches consumes a significant amount of energy in modern computer systems. Low power interfaces have been proposed to address this problem. Unfortunately, the energy-efficiency of these techniques is largely limited due to undue latency overheads of low power wires and complex coding mechanisms. This paper proposes a hybrid technique for slow-transition, fast-level (STFL) signaling that creates a balance between power and bandwidth in the last level cache interface. Combined with STFL codes, the signaling technique significantly mitigates the performance impacts of low power wires, thereby improving the energy efficiency of data movement in memory systems. When applied to the last level cache of a contemporary multicore system, STFL improves the CPU energy-delay product by 9% as compared to a voltage-frequency scaled baseline. Moreover, the proposed architecture reduces the CPU energy by 26% and achieves 98% of the performance provided by a high-performance baseline.
{"title":"STFL","authors":"Payman Behnam, M. N. Bojnordi","doi":"10.1145/3316781.3317819","DOIUrl":"https://doi.org/10.1145/3316781.3317819","url":null,"abstract":"Data movement in large caches consumes a significant amount of energy in modern computer systems. Low power interfaces have been proposed to address this problem. Unfortunately, the energy-efficiency of these techniques is largely limited due to undue latency overheads of low power wires and complex coding mechanisms. This paper proposes a hybrid technique for slow-transition, fast-level (STFL) signaling that creates a balance between power and bandwidth in the last level cache interface. Combined with STFL codes, the signaling technique significantly mitigates the performance impacts of low power wires, thereby improving the energy efficiency of data movement in memory systems. When applied to the last level cache of a contemporary multicore system, STFL improves the CPU energy-delay product by 9% as compared to a voltage-frequency scaled baseline. Moreover, the proposed architecture reduces the CPU energy by 26% and achieves 98% of the performance provided by a high-performance baseline.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"211 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128252376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Electronic hardware trust is an emerging concern for all stakeholders in the semiconductor industry. Trust issues in electronic hardware span all stages of its life cycle - from creation of intellectual property (IP) blocks to manufacturing, test and deployment of hardware components and all abstraction levels - from chips to printed circuit boards (PCBs) to systems. The trust issues originate from a horizontal business model that promotes reliance of third-party untrusted facilities, tools, and IPs in the hardware life cycle. Today, designers are tasked with verifying the integrity of third-party IPs before incorporating them into system-on-chip (SoC) designs. Existing trust metric frameworks have limited applicability since they are not comprehensive. They capture only a subset of vulnerabilities such as potential vulnerabilities introduced through design mistakes and CAD tools, or quantify features in a design that target a particular Trojan model. Therefore, current practice uses ad-hoc security analysis of IP cores. In this paper, we propose a vector-based comprehensive coverage metric that quantifies the overall trust of an IP considering both vulnerabilities and direct malicious modifications. We use a variable weighted sum of a design's functional coverage, structural coverage, and asset coverage to assess an IP's integrity. Designers can also effectively use our trust metric to compare the relative trustworthiness of functionally equivalent third-party IPs. To demonstrate the applicability and usefulness of the proposed metric, we utilize our trust metric on Trojan-free and Trojan-inserted variants of an IP. Our results demonstrate that we are able to successfully distinguish between trusted and untrusted IPs.
{"title":"The Metric Matters: The Art of Measuring Trust in Electronics","authors":"Jonathan Cruz, P. Mishra, S. Bhunia","doi":"10.1145/3316781.3323488","DOIUrl":"https://doi.org/10.1145/3316781.3323488","url":null,"abstract":"Electronic hardware trust is an emerging concern for all stakeholders in the semiconductor industry. Trust issues in electronic hardware span all stages of its life cycle - from creation of intellectual property (IP) blocks to manufacturing, test and deployment of hardware components and all abstraction levels - from chips to printed circuit boards (PCBs) to systems. The trust issues originate from a horizontal business model that promotes reliance of third-party untrusted facilities, tools, and IPs in the hardware life cycle. Today, designers are tasked with verifying the integrity of third-party IPs before incorporating them into system-on-chip (SoC) designs. Existing trust metric frameworks have limited applicability since they are not comprehensive. They capture only a subset of vulnerabilities such as potential vulnerabilities introduced through design mistakes and CAD tools, or quantify features in a design that target a particular Trojan model. Therefore, current practice uses ad-hoc security analysis of IP cores. In this paper, we propose a vector-based comprehensive coverage metric that quantifies the overall trust of an IP considering both vulnerabilities and direct malicious modifications. We use a variable weighted sum of a design's functional coverage, structural coverage, and asset coverage to assess an IP's integrity. Designers can also effectively use our trust metric to compare the relative trustworthiness of functionally equivalent third-party IPs. To demonstrate the applicability and usefulness of the proposed metric, we utilize our trust metric on Trojan-free and Trojan-inserted variants of an IP. Our results demonstrate that we are able to successfully distinguish between trusted and untrusted IPs.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"2011 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123818110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
D die-stacking has enabled energy-efficient solutions for near data processing by integrating multiple dice of high-density memory layers and processor cores within the same package. One promising approach is to employ the in-package memory as a gigascale lastlevel cache for data-intensive computing. Most existing in-package cache controllers rely on the command scheduling policies borrowed from the off-chip DRAM system. Regrettably, these control policies are not specifically tailored for in-package cache traffics, which results in a limited bandwidth efficiency. This paper proposes ReTagger, a DRAM cache controller that employs repeated tags to alleviate the cost of DRAM row buffer misses. Our simulation results on a set of ten data-intensive applications indicate an average of 20% performance improvement for the proposed controller over the state-of-the-art DRAM caches.
{"title":"ReTagger","authors":"M. N. Bojnordi, Farhan Nasrullah","doi":"10.1145/3316781.3317895","DOIUrl":"https://doi.org/10.1145/3316781.3317895","url":null,"abstract":"D die-stacking has enabled energy-efficient solutions for near data processing by integrating multiple dice of high-density memory layers and processor cores within the same package. One promising approach is to employ the in-package memory as a gigascale lastlevel cache for data-intensive computing. Most existing in-package cache controllers rely on the command scheduling policies borrowed from the off-chip DRAM system. Regrettably, these control policies are not specifically tailored for in-package cache traffics, which results in a limited bandwidth efficiency. This paper proposes ReTagger, a DRAM cache controller that employs repeated tags to alleviate the cost of DRAM row buffer misses. Our simulation results on a set of ten data-intensive applications indicate an average of 20% performance improvement for the proposed controller over the state-of-the-art DRAM caches.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130876234","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neetu Jindal, Sandeep Chandran, P. Panda, Sanjiva Prasad, Arnab Mitra, Kunal Singhal, Shubham Gupta, Shikhar Tuli
Runtime verification employs dedicated hardware or software monitors to check whether program properties hold at runtime. However, these monitors often incur high area and performance overheads depending on whether they are implemented in hardware or software. In this work, we propose DHOOM, an architectural framework for runtime monitoring of program assertions, which exploits the combination of a reconfigurable fabric present alongside a processor core with the vestigial on-chip Design-for-Debug hardware. This combination of hardware features allows DHOOM to minimize the overall performance overhead of runtime verification, even when subject to a given area constraint. We present an algorithm for dynamically selecting an effective subset of assertion monitors that can be accommodated in the available programmable fabric, while instrumenting the remaining assertions in software. We show that our proposed strategy, while respecting area constraints, reduces the performance overhead of runtime verification by up to 32% when compared with a baseline of software-only monitors.
{"title":"DHOOM","authors":"Neetu Jindal, Sandeep Chandran, P. Panda, Sanjiva Prasad, Arnab Mitra, Kunal Singhal, Shubham Gupta, Shikhar Tuli","doi":"10.1145/3316781.3317799","DOIUrl":"https://doi.org/10.1145/3316781.3317799","url":null,"abstract":"Runtime verification employs dedicated hardware or software monitors to check whether program properties hold at runtime. However, these monitors often incur high area and performance overheads depending on whether they are implemented in hardware or software. In this work, we propose DHOOM, an architectural framework for runtime monitoring of program assertions, which exploits the combination of a reconfigurable fabric present alongside a processor core with the vestigial on-chip Design-for-Debug hardware. This combination of hardware features allows DHOOM to minimize the overall performance overhead of runtime verification, even when subject to a given area constraint. We present an algorithm for dynamically selecting an effective subset of assertion monitors that can be accommodated in the available programmable fabric, while instrumenting the remaining assertions in software. We show that our proposed strategy, while respecting area constraints, reduces the performance overhead of runtime verification by up to 32% when compared with a baseline of software-only monitors.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114590737","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pramesh Pandey, Prabal Basu, Koushik Chakraborty, Sanghamitra Roy
The emergence of hardware accelerators has brought about several orders of magnitude improvement in the speed of the deep neural-network (DNN) inference. Among such DNN accelerators, Google Tensor Processing Unit (TPU) has transpired to be the best-in-class, offering more than $15 times$ speedup over the contemporary GPUs. However, the rapid growth in several DNN workloads conspires to escalate the energy consumptions of the TPU-based data-centers. In order to restrict the energy consumption of TPUs, we propose GreenTPU--- a low-power near-threshold (NTC) TPU design paradigm. To ensure a high inference accuracy at a low-voltage operation, GreenTPU identifies the patterns in the error-causing activation sequences in the systolic array, and prevents further timing errors from the same sequence by intermittently boosting the operating voltage of the specific multiplier-andaccumulator units in the TPU. Compared to a cutting-edge timing error mitigation technique for TPUs, GreenTPU enables 2X -3X higher performance in an NTC TPU, with a minimal loss in the prediction accuracy.
{"title":"GreenTPU","authors":"Pramesh Pandey, Prabal Basu, Koushik Chakraborty, Sanghamitra Roy","doi":"10.1145/3316781.3317835","DOIUrl":"https://doi.org/10.1145/3316781.3317835","url":null,"abstract":"The emergence of hardware accelerators has brought about several orders of magnitude improvement in the speed of the deep neural-network (DNN) inference. Among such DNN accelerators, Google Tensor Processing Unit (TPU) has transpired to be the best-in-class, offering more than $15 times$ speedup over the contemporary GPUs. However, the rapid growth in several DNN workloads conspires to escalate the energy consumptions of the TPU-based data-centers. In order to restrict the energy consumption of TPUs, we propose GreenTPU--- a low-power near-threshold (NTC) TPU design paradigm. To ensure a high inference accuracy at a low-voltage operation, GreenTPU identifies the patterns in the error-causing activation sequences in the systolic array, and prevents further timing errors from the same sequence by intermittently boosting the operating voltage of the specific multiplier-andaccumulator units in the TPU. Compared to a cutting-edge timing error mitigation technique for TPUs, GreenTPU enables 2X -3X higher performance in an NTC TPU, with a minimal loss in the prediction accuracy.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116418103","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
E. Cheng, Daniel Mueller-Gritschneder, J. Abraham, P. Bose, A. Buyuktosunoglu, Deming Chen, Hyungmin Cho, Yanjing Li, Uzair Sharif, K. Skadron, M. Stan, Ulf Schlichtmann, S. Mitra
Resilience to errors in the underlying hardware is a key design objective for a large class of computing systems, from embedded systems all the way to the cloud. Sources of hardware errors include radiation, circuit aging, variability induced by manufacturing and operating conditions, manufacturing test escapes, and early-life failures. Many publications have suggested that cross-layer resilience, where multiple error resilience techniques from different layers of the system stack cooperate to achieve cost-effective resilience, is essential for designing cost-effective resilient digital systems. This paper presents a comprehensive overview of cross-layer resilience by addressing fundamental cross-layer resilience questions, by summarizing insights derived from recent advances in cross-layer resilience research, and by discussing future cross-layer resilience challenges.
{"title":"Cross-Layer Resilience: Challenges, Insights, and the Road Ahead","authors":"E. Cheng, Daniel Mueller-Gritschneder, J. Abraham, P. Bose, A. Buyuktosunoglu, Deming Chen, Hyungmin Cho, Yanjing Li, Uzair Sharif, K. Skadron, M. Stan, Ulf Schlichtmann, S. Mitra","doi":"10.1145/3316781.3323474","DOIUrl":"https://doi.org/10.1145/3316781.3323474","url":null,"abstract":"Resilience to errors in the underlying hardware is a key design objective for a large class of computing systems, from embedded systems all the way to the cloud. Sources of hardware errors include radiation, circuit aging, variability induced by manufacturing and operating conditions, manufacturing test escapes, and early-life failures. Many publications have suggested that cross-layer resilience, where multiple error resilience techniques from different layers of the system stack cooperate to achieve cost-effective resilience, is essential for designing cost-effective resilient digital systems. This paper presents a comprehensive overview of cross-layer resilience by addressing fundamental cross-layer resilience questions, by summarizing insights derived from recent advances in cross-layer resilience research, and by discussing future cross-layer resilience challenges.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129985413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
In recent years, machine learning research has largely shifted focus from the cloud to the edge. While the resulting algorithm-and hardware-level optimizations have enabled local execution for the majority of deep neural networks (DNNs) on edge devices, the sheer magnitude of DNNs associated with real-time video detection workloads has forced them to remain relegated to remote execution in the cloud. This problematic when combined with the strict latency requirements that are coupled with these workloads, and imposes a unique set of challenges not directly addressed in prior works. In this work, we design MobiEye, a cloud-based video detection system optimized for deployment in real-time mobile applications. MobiEye is able to achieve up to a 32% reduction in latency when compared to a conventional implementation of video detection system with only a marginal reduction in accuracy.
{"title":"MobiEye","authors":"Jiachen Mao, Qing Yang, Ang Li, H. Li, Yiran Chen","doi":"10.1145/3316781.3317865","DOIUrl":"https://doi.org/10.1145/3316781.3317865","url":null,"abstract":"In recent years, machine learning research has largely shifted focus from the cloud to the edge. While the resulting algorithm-and hardware-level optimizations have enabled local execution for the majority of deep neural networks (DNNs) on edge devices, the sheer magnitude of DNNs associated with real-time video detection workloads has forced them to remain relegated to remote execution in the cloud. This problematic when combined with the strict latency requirements that are coupled with these workloads, and imposes a unique set of challenges not directly addressed in prior works. In this work, we design MobiEye, a cloud-based video detection system optimized for deployment in real-time mobile applications. MobiEye is able to achieve up to a 32% reduction in latency when compared to a conventional implementation of video detection system with only a marginal reduction in accuracy.","PeriodicalId":391209,"journal":{"name":"Proceedings of the 56th Annual Design Automation Conference 2019","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-06-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123320511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}