Optimizing the performance of in-memory file system by thread scheduling and file migration under NUMA multiprocessor systems

IF 4.1 2区 计算机科学 Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE Journal of Systems Architecture Pub Date : 2025-02-01 Epub Date: 2025-01-16 DOI:10.1016/j.sysarc.2025.103344
Ting Wu , Jingting He , Ying Qian , Weichen Liu
{"title":"Optimizing the performance of in-memory file system by thread scheduling and file migration under NUMA multiprocessor systems","authors":"Ting Wu ,&nbsp;Jingting He ,&nbsp;Ying Qian ,&nbsp;Weichen Liu","doi":"10.1016/j.sysarc.2025.103344","DOIUrl":null,"url":null,"abstract":"<div><div>Internet and IoT Applications generate large amounts of data that require efficient storage and processing. Emerging Compute Express Link (CXL) and Non-Volatile Memories (NVM) bring new opportunities for in-memory computing by reducing the latency of data access and processing. Many in-memory file systems based on the Hybrid DRAM/NVM are designed for high performance. However, achieving high performance under Non-Uniform Memory Access (NUMA) multiprocessor systems has significant challenges. In particular, the performance of file requests on NUMA systems varies over a disturbingly wide range, depending on the affinity of threads to file data. Moreover, memory controllers and interconnect links congestion bring excessive latency and performance loss on file accesses. Therefore, both the placement of file and thread and load balance are critical for data-intensive applications on NUMA systems. In this paper, we optimize the performance of multiple threads requesting in-memory files on NUMA systems by considering both memory congestion and data locality. First, we present the system model and formulate the problem as latency minimization on NUMA nodes. Then, we present a two-layer design to optimize the performance by properly migrating threads and dynamically adjusting the file distribution. Further, based on the design, we implement a functional NUMA-aware in-memory file system, Hydrafs-RFCT, in the Linux kernel. Experimental results show that the Hydrafs-RFCT optimizes the performance of multi-thread applications on NUMA systems. The average aggravated performance of Hydrafs-RFCT is 100.14 %, 112.7 %, 39.4 %, and 6.4 % higher than that of Ext4-DAX, PMFS, SIMFS, and Hydrafs, respectively.</div></div>","PeriodicalId":50027,"journal":{"name":"Journal of Systems Architecture","volume":"159 ","pages":"Article 103344"},"PeriodicalIF":4.1000,"publicationDate":"2025-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems Architecture","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S1383762125000165","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/16 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Internet and IoT Applications generate large amounts of data that require efficient storage and processing. Emerging Compute Express Link (CXL) and Non-Volatile Memories (NVM) bring new opportunities for in-memory computing by reducing the latency of data access and processing. Many in-memory file systems based on the Hybrid DRAM/NVM are designed for high performance. However, achieving high performance under Non-Uniform Memory Access (NUMA) multiprocessor systems has significant challenges. In particular, the performance of file requests on NUMA systems varies over a disturbingly wide range, depending on the affinity of threads to file data. Moreover, memory controllers and interconnect links congestion bring excessive latency and performance loss on file accesses. Therefore, both the placement of file and thread and load balance are critical for data-intensive applications on NUMA systems. In this paper, we optimize the performance of multiple threads requesting in-memory files on NUMA systems by considering both memory congestion and data locality. First, we present the system model and formulate the problem as latency minimization on NUMA nodes. Then, we present a two-layer design to optimize the performance by properly migrating threads and dynamically adjusting the file distribution. Further, based on the design, we implement a functional NUMA-aware in-memory file system, Hydrafs-RFCT, in the Linux kernel. Experimental results show that the Hydrafs-RFCT optimizes the performance of multi-thread applications on NUMA systems. The average aggravated performance of Hydrafs-RFCT is 100.14 %, 112.7 %, 39.4 %, and 6.4 % higher than that of Ext4-DAX, PMFS, SIMFS, and Hydrafs, respectively.
查看原文
分享 分享
微信好友 朋友圈 QQ好友 复制链接
本刊更多论文
在NUMA多处理器系统下,通过线程调度和文件迁移优化内存文件系统的性能
互联网和物联网应用产生大量数据,需要高效存储和处理。新兴的计算快速链路(Compute Express Link, CXL)和非易失性存储器(Non-Volatile memory, NVM)通过减少数据访问和处理的延迟,为内存计算带来了新的机遇。许多基于混合DRAM/NVM的内存文件系统都是为高性能而设计的。然而,在非统一内存访问(NUMA)多处理器系统下实现高性能面临着重大挑战。特别是,NUMA系统上文件请求的性能变化范围之广令人不安,这取决于线程对文件数据的亲缘性。此外,内存控制器和互连链路拥塞给文件访问带来过大的延迟和性能损失。因此,对于NUMA系统上的数据密集型应用程序来说,文件和线程的放置以及负载平衡都是至关重要的。在本文中,我们通过考虑内存拥塞和数据局部性来优化NUMA系统上请求内存中文件的多线程的性能。首先,我们提出了系统模型,并将问题表述为NUMA节点上的延迟最小化。然后,我们提出了一个两层设计,通过适当的线程迁移和动态调整文件分布来优化性能。在此基础上,我们在Linux内核中实现了一个功能性的numa感知内存文件系统Hydrafs-RFCT。实验结果表明,hydrafs - rct优化了NUMA系统上多线程应用程序的性能。与Ext4-DAX、PMFS、SIMFS和Hydrafs相比,Hydrafs- rfct的平均强化性能分别提高了100.14%、112.7%、39.4%和6.4%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 去求助
来源期刊
Journal of Systems Architecture
Journal of Systems Architecture 工程技术-计算机:硬件
CiteScore
8.70
自引率
15.60%
发文量
226
审稿时长
46 days
期刊介绍: The Journal of Systems Architecture: Embedded Software Design (JSA) is a journal covering all design and architectural aspects related to embedded systems and software. It ranges from the microarchitecture level via the system software level up to the application-specific architecture level. Aspects such as real-time systems, operating systems, FPGA programming, programming languages, communications (limited to analysis and the software stack), mobile systems, parallel and distributed architectures as well as additional subjects in the computer and system architecture area will fall within the scope of this journal. Technology will not be a main focus, but its use and relevance to particular designs will be. Case studies are welcome but must contribute more than just a design for a particular piece of software. Design automation of such systems including methodologies, techniques and tools for their design as well as novel designs of software components fall within the scope of this journal. Novel applications that use embedded systems are also central in this journal. While hardware is not a part of this journal hardware/software co-design methods that consider interplay between software and hardware components with and emphasis on software are also relevant here.
期刊最新文献
Resource-efficient scheduling of parallel DAG tasks on identical multiprocessors On the power saving in high-speed Ethernet-based networks for supercomputers and data centers Distributed replica allocation and load balancing for Edge–Cloud FaaS Statistical prototype exchange and alignment for personalized federated learning FLSAMW: Mitigating backdoor attacks in federated learning based on SVD and amplified model weight
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
现在去查看 取消
×
提示
确定
0
微信
客服QQ
Book学术公众号 扫码关注我们
反馈
×
意见反馈
请填写您的意见或建议
请填写您的手机或邮箱
已复制链接
已复制链接
快去分享给好友吧!
我知道了
×
扫码分享
扫码分享
Book学术官方微信
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术
文献互助 智能选刊 最新文献 互助须知 联系我们:info@booksci.cn
Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。
Copyright © 2023 Book学术 All rights reserved.
ghs 京公网安备 11010802042870号 京ICP备2023020795号-1