The integration of modern manufacturing systems promises increased flexibility, productivity, and efficiency. In such an environment, humans and robots must collaborate in a shared workspace to accomplish shared tasks, and strong communication between the partners is essential for that collaboration to be efficient. This research investigates an approach based on non-verbal communication cues. The system integrates human motion detection with vision sensors. The method addresses bias in human action detection across frames and improves the accuracy of the perception information about human activities that is passed to the robot. By interpreting spatial and temporal data, the system detects human movements from sequences of activity frames captured while the human and robot work together. The training and validation results show that the approach achieves an accuracy of 91%. In sequential testing, the system achieved an average detection rate of 83%. This research not only emphasizes the importance of advanced communication in human–robot collaboration, but also supports future developments in collaborative robotics.