In the context of Industry 4.0, manufacturing systems face increased complexity and uncertainty due to greater product customisation and demand variability. This paper presents a novel framework for adaptive Work-In-Progress (WIP) control in semi-heterarchical architectures, addressing the limitations of traditional analytical methods that rely on exponential processing-time distributions. Integrating Deep Reinforcement Learning (DRL) with Discrete Event Simulation (DES) enables model-free control of flow-shop production systems under non-exponential, stochastic processing times. A Deep Q-Network (DQN) agent dynamically manages WIP levels in a CONstant Work In Progress (CONWIP) environment, learning optimal control policies directly from system interactions. The framework's effectiveness is demonstrated through extensive experiments with varying numbers of machines, processing times, and levels of system variability. The results show robust performance in tracking target throughput and adapting to processing-time variability: the Mean Absolute Percentage Error (MAPE) of throughput, calculated as the percentage difference between actual and target throughput, ranges from 0.3% to 2.3%, with standard deviations of 5.5% to 8.4%. Key contributions include a data-driven WIP control approach that overcomes the limitations of analytical methods in stochastic environments, validation of the DQN agent's adaptability across varying production scenarios, and demonstration of the framework's scalability in realistic manufacturing settings. This research bridges the gap between conventional WIP control methods and Industry 4.0 requirements, offering manufacturers an adaptive solution for enhanced production efficiency.
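The throughput-tracking metric described above can be sketched as follows. This is a minimal illustration of a MAPE computed against a fixed target throughput, as the abstract defines it; the function name and sample values are hypothetical, not taken from the paper.

```python
def throughput_mape(actual, target):
    """Mean Absolute Percentage Error of observed throughput samples
    relative to a constant target throughput (illustrative sketch)."""
    errors = [abs(a - target) / target * 100.0 for a in actual]
    return sum(errors) / len(errors)

# Hypothetical throughput observations around a target of 10 parts/hour
actual = [9.8, 10.1, 10.3, 9.9]
print(round(throughput_mape(actual, 10.0), 2))  # 1.75
```

Values in the 0.3%-2.3% range reported in the abstract would correspond to actual throughput staying, on average, within a few percent of the target.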