Recently, embodied intelligence has emerged as a viable approach to achieving human-like perception, reasoning, decision-making, and execution capabilities in human-robot collaborative (HRC) assembly contexts. However, owing to the lack of generalized enabling technologies and its disconnection from physical control systems, embodied intelligence requires repetitive training of multiple functional models to operate in dynamic HRC scenarios, and therefore struggles to adapt to complex and evolving HRC environments. Hence, this study proposes a vision-language model (VLM)-enhanced embodied intelligence framework for digital twin (DT)-assisted human-robot collaborative assembly. First, a mapping between embodied agents and physical robots is established to encapsulate the embodied agents. Building on this agent-based architecture, a VLM driven by both domain knowledge and real-time scenario data is constructed with sensory capabilities, enabling rapid recognition of and response to dynamic HRC environments. By leveraging the strong generalization ability of VLMs, repetitive training of multiple perception models is avoided. Furthermore, the cognitive learning and intelligent reasoning capabilities of VLMs are exploited to develop an expert knowledge system for assembly processes that provides task-oriented assistance and solution generation. To enhance the adaptability and generalization of complex HRC decision-making, the VLM supports reinforcement learning through flexible configuration of HRC assembly state-information processing, decision-action generation and guidance, and reward-function design. In addition, a DT model of the HRC scenario is constructed to provide a simulation and deduction engine (i.e., an embodied brain) for mitigating collision accidents.
The decision results are then fed into the VLM as invocation parameters for the corresponding sub-function code modules, generating complete collaborative-robot action code that forms the embodied neuron. Finally, a series of comparative experiments conducted in a real-world HRC assembly scenario, benchmarking against traditional decision methods (e.g., MA-A2C, DQN, and GA) as well as a VLM-enhanced MA-A2C, demonstrates that the proposed framework offers competitive advantages.