Cop-Flash: Utilizing hybrid storage to construct a large, efficient, and durable computational storage for DNN training

IEEE Cloud Computing (Q1, Computer Science) · Pub Date: 2022-07-01 · DOI: 10.1109/CLOUD55607.2022.00041
Chunhua Xiao, S. Qiu, Dandan Xu
Journal: IEEE Cloud Computing · Volume: 9 1 · Pages: 209-218 · Publication type: Journal Article · Open access: No
Citation count: 0

Abstract

Traditional computing architectures that separate computing from storage face severe limitations when processing the data that is continuously produced in the cloud and at the edge. Computational storage devices (CSDs) have recently become a critical piece of cloud infrastructure that can overcome these limitations. Many studies use CSDs for DNN training to extract useful information and knowledge from the data quickly and efficiently. However, all previous work has used homogeneous storage, which does not fully account for the requirements of DNN training on CSDs. We therefore leverage hybrid NAND flash memory to address this problem. Nevertheless, typical hybrid storage architectures have limitations when used for DNN training, and their management strategies cannot fully exploit the heterogeneity of hybrid flash memory. To address these issues, we propose a novel SLC-TLC flash memory design called Co-Partitioning Flash (Cop-Flash), which combines two different hybrid flash memory partitioning methods to divide storage into three flash regions with different properties. Cop-Flash includes two key techniques: 1) a lifetime-based I/O identifier, which identifies data hotness according to data lifetime in order to maximize the benefits of heterogeneity and minimize the impact of garbage collection; and 2) Erase-aware Adaptive Dual-zone Management, which increases bandwidth utilization and guarantees system reliability. We compared Cop-Flash with two related state-of-the-art hybrid storage designs, one using hard partitioning and one using soft partitioning, as well as with TLC-only flash memory, under real DNN training workloads. Experimental results show that Cop-Flash improves performance by 29.1%, 38.8%, and 56.6%, and extends lifespan by 2.3x, 1.29x, and 8.3x, respectively.
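
The abstract does not say how the lifetime-based I/O identifier works internally, so the following is only a minimal sketch of the general idea of classifying hotness by data lifetime, not the authors' implementation. It assumes that "lifetime" means the number of writes that elapse between two successive writes to the same logical page, that short-lived pages are "hot" and go to SLC, and that everything else goes to TLC. The class name `LifetimeClassifier`, the `hot_threshold` parameter, and the two-way SLC/TLC split are all hypothetical; Cop-Flash itself divides storage into three regions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LifetimeClassifier:
    """Toy hotness classifier (hypothetical, not the paper's algorithm).

    A logical page counts as hot when its last observed lifetime, measured
    in writes elapsed between two updates of that page, is short.
    """
    hot_threshold: int = 100  # assumed cutoff, in elapsed writes
    clock: int = 0            # global write counter used as logical time
    last_write: Dict[int, int] = field(default_factory=dict)
    lifetime: Dict[int, int] = field(default_factory=dict)

    def record_write(self, lpn: int) -> str:
        """Record a write to logical page `lpn` and return its target region."""
        self.clock += 1
        previous = self.last_write.get(lpn)
        if previous is not None:
            # The old copy lived from `previous` until now.
            self.lifetime[lpn] = self.clock - previous
        self.last_write[lpn] = self.clock
        return self.region_for(lpn)

    def region_for(self, lpn: int) -> str:
        """Map a page to a flash region based on its last observed lifetime."""
        observed = self.lifetime.get(lpn)
        if observed is None:
            return "TLC"  # assumption: pages without history are treated as cold
        return "SLC" if observed <= self.hot_threshold else "TLC"


if __name__ == "__main__":
    clf = LifetimeClassifier(hot_threshold=3)
    for page in (1, 2, 1, 3, 4, 5, 6, 2):
        print(page, clf.record_write(page))
```

Separating short-lived data from long-lived data in this way is a common motivation for hot/cold placement: pages in a block that are invalidated at about the same time leave fewer valid pages to copy during garbage collection.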
Source journal
IEEE Cloud Computing (Computer Science: Computer Networks and Communications)
CiteScore: 11.20
Self-citation rate: 0.00%
Articles published: 0
About the journal: Cessation. IEEE Cloud Computing is committed to the timely publication of peer-reviewed articles that provide innovative research ideas, application results, and case studies in all areas of cloud computing. Topics relating to novel theory, algorithms, performance analyses, and applications of techniques are covered. More specifically: Cloud software, Cloud security, Trade-offs between privacy and utility of cloud, Cloud in the business environment, Cloud economics, Cloud governance, Migrating to the cloud, Cloud standards, Development tools, Backup and recovery, Interoperability, Applications management, Data analytics, Communications protocols, Mobile cloud, Private clouds, Liability issues for data loss on clouds, Data integration, Big data, Cloud education, Cloud skill sets, Cloud energy consumption, The architecture of cloud computing, Applications in commerce, education, and industry, Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS), Business Process as a Service (BPaaS)