
Latest Publications: ACM Transactions on Embedded Computing Systems

BASS: Safe Deep Tissue Optical Sensing for Wearable Embedded Systems
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3607916
Kourosh Vali, Ata Vafi, Begum Kasap, Soheil Ghiasi
In wearable optical sensing applications whose target tissue is not superficial, such as deep tissue oximetry, embedded system design must strike a balance between two competing factors. On one hand, the sensing task benefits from increasing the energy radiated into the body, which in turn improves the signal-to-noise ratio (SNR) of the deep-tissue signal at the sensor. On the other hand, patient safety considerations impose a constraint on the amount of energy radiated into the body. In this paper, we study the trade-off between the two factors by exploring the design space of the light-source activation pulse. Furthermore, we propose BASS, an algorithm that leverages this design space exploration and further optimizes deep-tissue SNR via spectral averaging, while ensuring that the energy radiated into the body stays within a safe upper bound. The effectiveness of the proposed technique is demonstrated via analytical derivations, simulations, and in vivo measurements in both pregnant sheep models and human subjects.
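To make the trade-off concrete, here is a minimal Python sketch, not the BASS algorithm itself: it assumes a hypothetical total radiated-energy budget, a hypothetical per-pulse safety cap, a signal that scales linearly with pulse energy, and white noise, so averaging n pulses improves SNR by a factor of sqrt(n). Under these assumptions, the best setting saturates the per-pulse cap and spends the remaining budget on averaging.

```python
import math

# Toy model (our assumptions, not BASS): choose the number of activation
# pulses that maximizes SNR after averaging, subject to a total radiated-
# energy budget and a per-pulse safety cap. All constants are hypothetical.

ENERGY_BUDGET_J = 0.5      # safe upper bound on total energy radiated into the body
PULSE_ENERGY_MAX_J = 0.02  # assumed per-pulse cap (e.g., a peak-exposure limit)
NOISE_RMS = 1e-3           # sensor noise floor (arbitrary units)
SENSITIVITY = 2e-3         # detected signal per joule of pulse energy (assumed linear)

def snr(pulse_energy_j: float, n_pulses: int) -> float:
    """Single-pulse SNR scaled by sqrt(n): averaging n pulses of white noise."""
    return (SENSITIVITY * pulse_energy_j / NOISE_RMS) * math.sqrt(n_pulses)

best = max(
    ((snr(min(PULSE_ENERGY_MAX_J, ENERGY_BUDGET_J / n), n), n) for n in range(1, 1001)),
    key=lambda t: t[0],
)
print(f"best: {best[1]} pulses, SNR {best[0]:.3g}")  # 25 pulses under these constants
```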
Citations: 0
EMS-i: An Efficient Memory System Design with Specialized Caching Mechanism for Recommendation Inference
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3609384
Yitu Wang, Shiyu Li, Qilin Zheng, Andrew Chang, Hai Li, Yiran Chen
Recommendation systems have been widely embedded into many Internet services. For example, Meta's deep learning recommendation model (DLRM) shows high predictive accuracy of click-through rate in processing large-scale embedding tables. The SparseLengthSum (SLS) kernel of the DLRM dominates the inference time of the DLRM due to intensive irregular memory accesses to the embedding vectors. Some prior works directly adopt near data processing (NDP) solutions to obtain higher memory bandwidth to accelerate SLS. However, their inferior memory hierarchy induces a low performance-cost ratio and fails to fully exploit data locality. Although some software-managed cache policies have been proposed to improve the cache hit rate, the incurred cache miss penalty is unacceptable considering the high overheads of executing the corresponding programs and the communication between the host and the accelerator. To address the aforementioned issues, we propose EMS-i, an efficient memory system design that integrates Solid State Drives (SSDs) into the memory hierarchy using Compute Express Link (CXL) for recommendation system inference. We specialize the caching mechanism according to the characteristics of various DLRM workloads and propose a novel prefetching mechanism to further improve performance. In addition, we carefully design the inference kernel and develop a customized mapping scheme for the SLS operation, considering the multi-level parallelism in SLS and the data locality within a batch of queries. Compared to state-of-the-art NDP solutions, EMS-i achieves up to 10.9× speedup over RecSSD and performance comparable to RecNMP with 72% energy savings. EMS-i also saves up to 8.7× and 6.6× memory cost w.r.t. RecSSD and RecNMP, respectively.
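The role of the specialized cache can be illustrated with a generic stand-in. The sketch below is not the EMS-i design (which tailors caching to DLRM workload characteristics and adds prefetching); it is a plain LRU cache of hot embedding vectors in front of a slower SSD/CXL-resident table, with a SparseLengthSum-style pooling loop. Table size, cache capacity, vector width, and the Zipf access skew are all assumptions.

```python
from collections import OrderedDict
import numpy as np

DIM = 16  # embedding vector width (assumed)

class EmbeddingCache:
    """LRU cache of embedding vectors; misses stand in for SSD accesses."""
    def __init__(self, capacity: int, backing: dict):
        self.capacity = capacity
        self.backing = backing          # stands in for the SSD-resident table
        self.cache = OrderedDict()      # id -> vector, in LRU order
        self.hits = self.misses = 0

    def lookup(self, idx: int) -> np.ndarray:
        if idx in self.cache:
            self.cache.move_to_end(idx)  # refresh LRU position
            self.hits += 1
            return self.cache[idx]
        self.misses += 1                 # slow-tier access on a miss
        vec = self.backing[idx]
        self.cache[idx] = vec
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
        return vec

def sparse_length_sum(cache: EmbeddingCache, ids) -> np.ndarray:
    """Pool (sum) the embedding vectors of one query's sparse ids."""
    return sum((cache.lookup(int(i)) for i in ids), np.zeros(DIM))

rng = np.random.default_rng(0)
table = {i: rng.standard_normal(DIM) for i in range(1000)}
cache = EmbeddingCache(capacity=64, backing=table)
for _ in range(500):                    # skewed accesses: hot ids recur often
    sparse_length_sum(cache, rng.zipf(1.5, size=8) % 1000)
print(f"hit rate: {cache.hits / (cache.hits + cache.misses):.2%}")
```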
Citations: 0
Rectifying Skewed Kernel Page Reclamation in Mobile Devices for Improving User-Perceivable Latency
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3607937
Yi-Quan Chou, Lin-Wei Shen, Li-Pin Chang
A crucial design factor for users of smart mobile devices is the latency of graphical interface interaction. Switching a background app to the foreground is a frequent operation on mobile devices, and the latency of this process is highly perceivable to users. Based on an Android smartphone, through analysis of the memory references generated during the app-switching process, we observe that file (virtual) pages and anonymous pages are both heavily involved. However, to our surprise, the amounts of the two types of pages in main memory are highly imbalanced, and frequent I/O operations on file pages noticeably slow down the app-switching process. In this study, we advocate improving the app-switching latency by rectifying the skewed kernel page reclaiming. Our approach involves two parts: proactive identification of unused anonymous pages and adaptive balancing between file pages and anonymous pages. As mobile apps are found to inflate their anonymous pages, we propose identifying unused anonymous pages in sync with app-switching events. In addition, Android devices replace the swap device with RAM-based zram, and swapping on zram is much faster than accessing files on flash storage. Without causing thrashing, we propose swapping out as many anonymous pages to zram as possible to cache more file pages. We conduct experiments on a Google Pixel phone with realistic user workloads, and the results confirm that our method adapts to different memory requirements and greatly improves the app-switching latency, by up to 43% compared with the original kernel.
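The reclaim-ordering idea can be sketched as a policy function. This is a toy illustration, not the authors' kernel modification: page records, the referenced-bit semantics, and the ordering below (unused anonymous pages to zram first, then cold file pages, then the rest) are our assumptions based on the abstract.

```python
from dataclasses import dataclass

@dataclass
class Page:
    kind: str         # "file" or "anon"
    referenced: bool  # touched since the last app-switch event?

def pages_to_reclaim(pages: list, need: int) -> list:
    """Pick reclaim victims so file pages needed at app switch survive."""
    # 1) anonymous pages unused across an app switch go to zram first
    victims = [p for p in pages if p.kind == "anon" and not p.referenced]
    # 2) then cold file pages, 3) finally referenced anonymous pages
    victims += [p for p in pages if p.kind == "file" and not p.referenced]
    victims += [p for p in pages if p.kind == "anon" and p.referenced]
    return victims[:need]

mem = [Page("anon", False), Page("file", True), Page("anon", True),
       Page("file", False), Page("anon", False)]
print([(p.kind, p.referenced) for p in pages_to_reclaim(mem, 3)])
```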
Citations: 0
ZPP: A Dynamic Technique to Eliminate Cache Pollution in NoC based MPSoCs
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3609113
Dipika Deb, John Jose
Data prefetching efficiently reduces the memory access latency in NUCA architectures, as the Last Level Cache (LLC) is shared and distributed across multiple cores. But cache pollution generated by the prefetcher reduces its efficiency by causing contention for shared resources such as the LLC and the underlying network. This paper proposes the Zero Pollution Prefetcher (ZPP), which eliminates cache pollution in NUCA architectures. For this purpose, ZPP uses an L1 prefetcher and places the prefetched blocks in the data locations of the LLC where modified blocks are stored. Since modified blocks in the LLC are stale and requests for such blocks are served from the exclusively owned private cache, maintaining such stale data in the cache unnecessarily consumes power. The benefits of ZPP are: (a) it eliminates cache pollution in the L1 and LLC by storing prefetched blocks in LLC locations where stale blocks are stored; (b) it solves the problem of insufficient cache space by placing prefetched blocks in the LLC, which is larger than the L1 cache; this helps prefetch more cache blocks, thereby increasing prefetch aggressiveness; (c) increasing prefetch aggressiveness increases its coverage; and (d) it maintains a lookup latency equivalent to the L1 cache for prefetched blocks. Experimentally, ZPP increases weighted speedup by 2.19× compared to a system with no prefetching, while prefetch coverage and prefetch accuracy increase by 50% and 12%, respectively, compared to the baseline.
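The placement rule at the heart of ZPP can be shown in a few lines. The sketch below is our simplified reading of the abstract, not the actual hardware: a prefetched block may only overwrite an LLC way that is empty or holds a stale modified block (whose up-to-date copy lives in a private cache), so demand data is never evicted by a prefetch.

```python
class LLCSet:
    """One LLC set; each way holds (tag, state) or None."""
    def __init__(self, ways: int):
        self.ways = [None] * ways

    def place_prefetch(self, tag: str) -> bool:
        for i, entry in enumerate(self.ways):
            if entry is None or entry[1] == "modified":  # stale in the LLC
                self.ways[i] = (tag, "prefetched")
                return True
        return False  # no stale slot available: drop the prefetch, no pollution

s = LLCSet(4)
s.ways = [("a", "shared"), ("b", "modified"), ("c", "shared"), ("d", "modified")]
print(s.place_prefetch("p1"), s.ways)  # p1 lands in the stale slot of "b"
```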
Citations: 0
Verified Compilation of Synchronous Dataflow with State Machines
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3608102
Timothy Bourke, Basile Pesin, Marc Pouzet
Safety-critical embedded software is routinely programmed in block-diagram languages. Recent work in the Vélus project specifies such a language and its compiler in the Coq proof assistant. It builds on the CompCert verified C compiler to give an end-to-end proof linking the dataflow semantics of source programs to traces of the generated assembly code. We extend this work with switched blocks, shared variables, reset blocks, and state machines; define a relational semantics to integrate these block- and mode-based constructions into the existing stream-based model; adapt the standard source-to-source rewriting scheme to compile the new constructions; and reestablish the correctness theorem.
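To give a feel for what a reset block means, here is a tiny stream-semantics sketch in Python (our illustration only; Vélus is specified in Coq and its semantics are relational, not executable Python). A dataflow node is modeled as a generator, and the combinator restarts the node's internal state whenever the reset stream carries true.

```python
def counter():
    """A dataflow node with internal state: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

def reset_block(make_node, resets):
    """Re-instantiate the node (restarting its state) on every true reset."""
    node = make_node()
    for r in resets:
        if r:
            node = make_node()
        yield next(node)

print(list(reset_block(counter, [False, False, True, False, True, False])))
# -> [0, 1, 0, 1, 0, 1]
```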
Citations: 1
Florets for Chiplets: Data Flow-aware High-Performance and Energy-efficient Network-on-Interposer for CNN Inference Tasks
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3608098
Harsh Sharma, Lukas Pfromm, Rasit Onur Topaloglu, Janardhan Rao Doppa, Umit Y. Ogras, Ananth Kalyanraman, Partha Pratim Pande
Recent advances in 2.5D chiplet platforms provide a new avenue for compact scale-out implementations of emerging compute- and data-intensive applications, including machine learning. Network-on-Interposer (NoI) enables the integration of multiple chiplets on a 2.5D system. While these manycore platforms can deliver high computational throughput and energy efficiency by running multiple specialized tasks concurrently, conventional NoI architectures have limited computational throughput due to their inherent multi-hop topologies. In this paper, we propose Floret, a novel NoI architecture based on space-filling curves (SFCs). The Floret architecture leverages suitable task mapping, exploits the data flow pattern, and optimizes inter-chiplet data exchange to extract high performance for multiple types of convolutional neural network (CNN) inference tasks running concurrently. We demonstrate that the Floret architecture reduces latency and energy by up to 58% and 64%, respectively, compared to state-of-the-art NoI architectures while executing datacenter-scale workloads involving multiple CNN tasks simultaneously. Floret achieves high performance and significant energy savings at much lower fabrication cost by exploiting the data-flow awareness of CNN inference tasks.
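The space-filling-curve idea can be illustrated with a standard Hilbert curve as a stand-in (the paper's exact curve and task mapper are not given in the abstract): chiplets on an n x n interposer are ordered along the curve, and consecutive CNN layers are mapped to consecutive chiplets, so most layer-to-layer traffic travels between physically adjacent chiplets.

```python
def xy2d(n: int, x: int, y: int) -> int:
    """Hilbert-curve index of cell (x, y) on an n x n grid (n a power of two)."""
    d, s = 0, n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate the quadrant so the curve stays continuous
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

N = 4  # 4 x 4 grid of chiplets (assumed size)
order = sorted(((x, y) for x in range(N) for y in range(N)),
               key=lambda c: xy2d(N, *c))
for layer, chiplet in enumerate(order[:5]):
    print(f"CNN layer {layer} -> chiplet at {chiplet}")  # neighbors stay adjacent
```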
Citations: 0
Overflow-free Compute Memories for Edge AI Acceleration
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3609387
Flavio Ponzina, Marco Rios, Alexandre Levisse, Giovanni Ansaloni, David Atienza
Compute memories are memory arrays augmented with dedicated logic to support arithmetic. They support the efficient execution of data-centric computing patterns, such as those characterizing Artificial Intelligence (AI) algorithms. These architectures can provide computing capabilities as part of the memory array structures (In-Memory Computing, IMC) or at their immediate periphery (Near-Memory Computing, NMC). By bringing the processing elements inside (or very close to) storage, compute memories minimize the cost of data access. Moreover, highly parallel (and, hence, high-performance) computations are enabled by exploiting the regular structure of memory arrays. However, the regular layout of memory elements also constrains the data range of inputs and outputs, since the bitwidths of operands and results stored at each address cannot be freely varied. Addressing this challenge, we herein propose a HW/SW co-design methodology combining careful per-layer quantization and inter-layer scaling with lightweight hardware support for overflow-free computation of dot-vector operations. We demonstrate their use to implement the convolutional and fully connected layers of AI models. We embody our strategy in two implementations, based on IMC and NMC, respectively. Experimental results highlight that an area overhead of only 10.5% (for IMC) and 12.9% (for NMC) is required when interfacing with a 2KB subarray. Furthermore, inferences on benchmark CNNs show negligible accuracy degradation due to quantization for equivalent floating-point implementations.
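The overflow condition can be made concrete with a back-of-the-envelope calculation; the sketch below is our reading of the idea, not the paper's method. A dot product of K terms of b_in-bit inputs and b_w-bit weights needs up to b_in + b_w + ceil(log2 K) bits (unsigned worst case; signed bookkeeping differs slightly), so if the array's accumulator is narrower, inputs must be pre-scaled by the difference.

```python
import math

def acc_bits_needed(b_in: int, b_w: int, k: int) -> int:
    """Worst-case accumulator width for a k-term dot product (unsigned bound)."""
    return b_in + b_w + math.ceil(math.log2(k))

def required_prescale_shift(b_in: int, b_w: int, k: int, acc_bits: int) -> int:
    """Bits to shift inputs right so the worst case fits the accumulator."""
    return max(0, acc_bits_needed(b_in, b_w, k) - acc_bits)

# e.g. 8-bit activations, 8-bit weights, a 512-term dot product (assumed sizes)
print(acc_bits_needed(8, 8, 512))               # 25 bits in the worst case
print(required_prescale_shift(8, 8, 512, 20))   # 20-bit accumulator: shift by 5
```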
Citations: 0
Consistency vs. Availability in Distributed Cyber-Physical Systems
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3609119
Edward A. Lee, Ravi Akella, Soroush Bateni, Shaokai Lin, Marten Lohstroh, Christian Menard
In distributed applications, Brewer's CAP theorem tells us that when networks become partitioned (P), one must give up either consistency (C) or availability (A). Consistency is agreement on the values of shared variables; availability is the ability to respond to reads and writes accessing those shared variables. Availability is a real-time property, whereas consistency is a logical property. We extend consistency and availability to refer to cyber-physical properties such as the state of the physical system and delays in actuation. We have further extended the CAP theorem to relate quantitative measures of these two properties to quantitative measures of communication and computation latency (L), obtaining a relation called the CAL theorem that is linear in a max-plus algebra. This paper shows how to use the CAL theorem in various ways to help design cyber-physical systems. We develop a methodology for systematically trading off availability and consistency in application-specific ways and for guiding the system designer in placing functionality in end devices, in edge computers, or in the cloud. We build on the Lingua Franca coordination language to provide system designers with concrete analysis and design tools to make the required tradeoffs in deployable embedded software.
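The phrase "linear in a max-plus algebra" can be unpacked with a toy example (illustrative only; the actual CAL relation is defined in the paper). In the max-plus semiring, "addition" is max and "multiplication" is +, so a system x = A ⊗ x ⊕ b composes latencies along paths and takes the worst case; on an acyclic topology a fixed-point iteration converges to the end-to-end bounds.

```python
NEG_INF = float("-inf")  # the max-plus "zero": absent edge

def maxplus_matvec(A, x):
    return [max(a + xj for a, xj in zip(row, x)) for row in A]

def solve(A, b, iters=8):
    x = list(b)
    for _ in range(iters):  # fixed-point iteration of x = max(Ax, b)
        x = [max(m, bi) for m, bi in zip(maxplus_matvec(A, x), b)]
    return x

# made-up DAG: edge 0->1 costs 3, 0->2 costs 4, 1->2 costs 2
A = [[NEG_INF, NEG_INF, NEG_INF],
     [3.0,     NEG_INF, NEG_INF],
     [4.0,     2.0,     NEG_INF]]
b = [1.0, 2.0, 0.0]  # per-node local latency offsets (made up)
print(solve(A, b))   # [1.0, 4.0, 6.0]: worst path into node 2 is 1 + 3 + 2
```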
Citations: 1
Kryptonite: Worst-Case Program Interference Estimation on Multi-Core Embedded Systems
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3609128
Nikhilesh Singh, Karthikeyan Renganathan, Chester Rebeiro, Jithin Jose, Ralph Mader
Due to their low cost and energy requirements, cyber-physical systems are adopting multi-core processors for their embedded computing needs. To guarantee safety when the application has real-time constraints, a critical requirement is to estimate the worst-case interference from other executing programs. However, the complexity of multi-core hardware makes it difficult to precisely determine the Worst-Case Program Interference. Existing solutions are either prone to overestimating the interference or not scalable to different hardware sizes and designs. In this paper we present Kryptonite, an automated framework to synthesize Worst-Case Program Interference (WCPI) environments for multi-core systems. Fundamental to Kryptonite is a set of tiny hardware-specific code gadgets crafted to maximize interference locally. The gadgets are arranged using a greedy approach and then molded using a reinforcement learning algorithm to create the WCPI environment. We demonstrate Kryptonite on the automotive-grade Infineon AURIX TC399 processor with a wide range of programs, including a commercial real-time automotive application. We show that, while being easily scalable and tunable, Kryptonite creates WCPI environments that increase runtime by up to 58% for benchmark applications and 26% for the automotive application.
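The greedy stage can be sketched abstractly. This is a conceptual illustration, not Kryptonite itself: the gadget names are invented, and measure_interference is a stand-in for running the victim alongside the candidate program on the target SoC and timing it; the real framework then refines the arrangement with reinforcement learning.

```python
import random

GADGETS = ["stream_loads", "cache_thrash", "bus_burst", "branch_storm"]  # invented

def measure_interference(program: list) -> float:
    """Stand-in metric; deterministic within one run. Replace with a real
    measurement of victim slowdown on the target hardware."""
    random.seed(hash(tuple(program)) & 0xFFFF)
    return sum(random.random() for _ in program)

def greedy_arrange(slots: int) -> list:
    """Fill the enemy program one slot at a time with the locally best gadget."""
    program = []
    for _ in range(slots):
        best = max(GADGETS, key=lambda g: measure_interference(program + [g]))
        program.append(best)
    return program

prog = greedy_arrange(6)
print(prog, f"score={measure_interference(prog):.3f}")
```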
Citations: 0
FSIMR: File-system-aware Data Management for Interlaced Magnetic Recording
CAS Tier 3, Computer Science · Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE · Pub Date: 2023-09-09 · DOI: 10.1145/3607922
Yi-Han Lien, Yen-Ting Chen, Yuan-Hao Chang, Yu-Pei Liang, Wei-Kuan Shih
Interlaced Magnetic Recording (IMR) is an emerging recording technology for hard-disk drives (HDDs) that provides larger storage capacity at a lower cost. By partially overlapping (interlacing) each bottom track with two adjacent top tracks, IMR-based HDDs increase data density while incurring hardware write constraints. To update a bottom track, the data on the two adjacent top tracks must be read and rewritten to avoid losing valid data, resulting in additional overhead for read-modify-write (RMW) operations. Researchers have therefore proposed various data management schemes in recent years to mitigate this overhead and improve write performance. However, these designs do not take into account the data characteristics of the file system, the crucial operating-system layer for storing and retrieving data on HDDs. Consequently, the write performance improvement is limited because spatial locality and data hotness go unexploited. This paper proposes a file-system-aware data management scheme called FSIMR to improve system write performance. Observing that data in the same directory tends to have higher spatial locality and to be updated together, FSIMR logically partitions the IMR-based HDD into fixed-size zones; data belonging to the same directory is placed in one zone to reduce the time spent seeking to-be-updated data (seek time). Furthermore, cold data within a zone is placed on bottom tracks and updated out-of-place to eliminate RMW operations. Our experimental results show that FSIMR reduces seek time by up to 14% without introducing additional RMW operations, compared to existing designs.
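The placement policy described in the abstract can be sketched in a few lines; zone count, the hash, and the hot/cold threshold below are our assumptions. Files in the same directory map to the same fixed-size zone, and cold data within a zone is steered to bottom tracks, which are updated out-of-place so no top-track RMW is triggered.

```python
import hashlib

N_ZONES = 64  # assumed number of fixed-size zones

def zone_of(path: str) -> int:
    """Same directory -> same zone, to keep related updates close together."""
    directory = path.rsplit("/", 1)[0]
    return hashlib.sha1(directory.encode()).digest()[0] % N_ZONES

def track_kind(update_count: int, hot_threshold: int = 4) -> str:
    """Hot data stays on top tracks; cold data goes to bottom tracks."""
    return "top" if update_count >= hot_threshold else "bottom"

for path, updates in [("/var/log/app/a.log", 9), ("/var/log/app/b.log", 1)]:
    print(path, "-> zone", zone_of(path), ",", track_kind(updates), "track")
```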
Citations: 0