Internet and IoT applications generate large amounts of data that require efficient storage and processing. Emerging Compute Express Link (CXL) and Non-Volatile Memories (NVM) bring new opportunities for in-memory computing by reducing the latency of data access and processing. Many in-memory file systems based on hybrid DRAM/NVM are designed for high performance. However, achieving high performance on Non-Uniform Memory Access (NUMA) multiprocessor systems poses significant challenges. In particular, the performance of file requests on NUMA systems varies over a disturbingly wide range, depending on the affinity of threads to file data. Moreover, congestion at memory controllers and interconnect links introduces excessive latency and performance loss on file accesses. Therefore, both file and thread placement and load balancing are critical for data-intensive applications on NUMA systems. In this paper, we optimize the performance of multiple threads requesting in-memory files on NUMA systems by considering both memory congestion and data locality. First, we present the system model and formulate the problem as latency minimization across NUMA nodes. Then, we present a two-layer design that optimizes performance by properly migrating threads and dynamically adjusting the file distribution. Further, based on this design, we implement a functional NUMA-aware in-memory file system, Hydrafs-RFCT, in the Linux kernel. Experimental results show that Hydrafs-RFCT improves the performance of multi-thread applications on NUMA systems. The average aggregate performance of Hydrafs-RFCT is 100.14%, 112.7%, 39.4%, and 6.4% higher than that of Ext4-DAX, PMFS, SIMFS, and Hydrafs, respectively.