IEEE Transactions on Systems Man Cybernetics-Systems: Latest Articles

Muscle-Targeted Robotic Assistive Control Using Musculoskeletal Model of the Lower Limb
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-05 DOI: 10.1109/TSMC.2024.3506495
Rafael J. Escarabajal;Pau Zamora-Ortiz;José L. Pulloquinga;Marina Vallés;Ángel Valera
Conventional assistive and rehabilitative robotic systems often overlook human biomechanics, particularly muscular forces, as they predominantly operate in joint or task space and focus on position and exchanged forces. Similarly, traditional manual rehabilitation techniques employed by physiotherapists struggle to obtain quantitative measurements and make precise modifications to key human variables, resulting in predominantly qualitative methods and outcomes. In response to these limitations, this article introduces an innovative assistive robot controller that operates in the muscular space, targeting specific muscles in the lower limb, and distinguishing itself from existing solutions that focus primarily on joint or task space. A key innovation of our approach is the real-time measurement of muscular forces during dynamic tasks, obtained from a calibrated musculoskeletal model. These measurements enable the establishment of a multistep closed-loop controller, with the outer loop precisely tracking the desired muscular forces. Implemented within a configurable viscous environment, the controller provides a natural response for the user. Experimental evaluations conducted using a parallel robot designed for rehabilitation demonstrate the controller’s efficacy. Incorporating the outer loop reduced the median relative error of the tracked muscular force by nearly 80% and decreased the variability of this error by over 85% compared to a pure viscous environment defined as the baseline. These findings highlight the potential applications of this control framework in areas such as assistive robotics and precision rehabilitation. By achieving objective measurement and control, the system may enhance rehabilitation outcomes, offering tailored exercises that match the individual needs, capabilities, and engagement of each patient.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1537-1548
Citations: 0
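The headline figures in the muscle-targeted control abstract above are relative reductions of an error statistic. As a minimal sketch of that arithmetic, assuming the relative error is |measured - desired| / |desired| per sample (the abstract does not give the exact definition), the following computes a median relative error and the percentage reduction between two conditions. The function names and the sample values are hypothetical and are not data from the paper.

```python
from statistics import median

def median_relative_error(desired, measured):
    """Median of |measured - desired| / |desired| over paired samples."""
    return median(abs(m - d) / abs(d) for d, m in zip(desired, measured))

def percent_reduction(baseline, improved):
    """Relative reduction of an error statistic, in percent."""
    return 100.0 * (baseline - improved) / baseline

# Made-up muscle-force samples (N), for illustration only.
desired = [100.0, 100.0, 100.0]
viscous_only = [60.0, 140.0, 100.0]      # hypothetical baseline condition
with_outer_loop = [90.0, 110.0, 100.0]   # hypothetical closed-loop condition

baseline_err = median_relative_error(desired, viscous_only)
closed_err = median_relative_error(desired, with_outer_loop)
reduction = percent_reduction(baseline_err, closed_err)
```

The median is used here because the abstract reports a median; a spread measure for the "variability" figure would need a definition the abstract does not state.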
Neuroadaptive Fixed-Time Synchronous Control With Composite Learning Policy for Robotic Multifingers
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-05 DOI: 10.1109/TSMC.2024.3500776
Xingqiang Zhao;Yantong Zhang;Yongduan Song
Dexterous manipulation of anthropomorphic multifinger robotic hands (MFRHs) is crucial for performing diverse and intricate tasks, where collaboration among the fingers is essential. This article presents a novel neural network-based composite learning strategy tailored for the synchronous control of multiple fingers in anthropomorphic MFRHs subjected to unknown dynamics and disturbances. By leveraging graph theory, the interconnections among fingers are delineated and integrated into the dynamic equations. The modified nonsingular terminal sliding mode (TSM) technique is employed to achieve fixed-time convergence of error variables without triggering singularity. Within the framework of composite learning, a novel computable prediction error is formulated by harnessing online historical data alongside the regression matrix. The combination of prediction errors and the regression matrix is utilized for parameter estimation, which, under a milder interval excitation (IE) condition, facilitates accurate parameter estimation without the requirement for the stringent persistent excitation (PE) condition. The feasibility and effectiveness of the proposed technique are demonstrated through simulation experiments.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1230-1240
Citations: 0
Predictor-Based Fixed-Time Neural Dynamics Surface Tracking Control for Nonlinear Systems With Unknown Backlash-Like Hysteresis
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-04 DOI: 10.1109/TSMC.2024.3505152
Huaguang Zhang;Jiawei Ma;Juan Zhang;Le Wang
This article focuses on predictor-based neural fixed-time dynamic surface control for nonlinear systems with unknown backlash-like hysteresis. By applying the predictor-based neural control scheme, the system nonlinear functions can be smoothly estimated. In addition, an improved dynamic surface is proposed to decrease the difficulty of the controller design procedure while ensuring that the dynamic surface compensating signals satisfy fixed-time stability. Further, on the basis of the fixed-time theorem and backstepping control technology, the designed controller ensures that all signals of the considered closed-loop systems are fixed-time bounded in the presence of unknown backlash-like hysteresis. Finally, simulation cases are given to demonstrate the effectiveness of the designed method.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1506-1515
Citations: 0
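The abstract above does not state which hysteresis model is used. For orientation only, one differential-equation model that is widely used under the name "backlash-like hysteresis" in the adaptive control literature is sketched below; the symbols $v$ (actuator input), $u$ (hysteresis output), and the constants $\alpha$, $c$, $B_1$ are assumptions of this sketch and may not match the paper's notation.

```latex
% Backlash-like hysteresis between input v(t) and output u(t),
% with constants \alpha > 0 and c > B_1 > 0 (assumed notation):
\frac{\mathrm{d}u}{\mathrm{d}t}
  = \alpha \left| \frac{\mathrm{d}v}{\mathrm{d}t} \right| \bigl( c\,v - u \bigr)
  + B_1 \frac{\mathrm{d}v}{\mathrm{d}t}
% Its solution can be written as a linear term plus a bounded remainder,
%   u(t) = c\,v(t) + d(v),
% which is what lets a controller treat d(v) as a bounded disturbance.
```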
Resilient Distributed Control and Target Tracking in Multiagent Systems Against Composite Attacks
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-04 DOI: 10.1109/TSMC.2024.3494769
Yukang Cui;Ahmadreza Jenabzadeh;Zahoor Ahmed;Weidong Zhang;Tingwen Huang
This article addresses the distributed control and target tracking (DCTT) problem in general linear and Lipschitz multiagent systems (MASs). In comparison to the traditional DCTT algorithms that were developed for MASs in ideal conditions, two schemes based upon a resilient protocol are proposed for linear and nonlinear MASs to estimate and track a mobile target where all agents are subject to composite attacks, including camouflage attacks, DoS attacks, sensor attacks, and actuator attacks. Based on the digital twin approach, a twin layer (TL) with high privacy and security is introduced to separate the problem of DCTT into two tasks: 1) handling DoS attacks on the TL and 2) defending against sensor and actuator attacks on the cyber-physical layer (CPL). First, two distributed estimation algorithms are established to reconstruct the agents and target dynamics for every agent on the TL in the presence of DoS attacks. Second, using the reconstructed agents and target dynamics on the TL, a resilient distributed control protocol is designed to resist sensor and actuator attacks on the CPL. The current scheme guarantees the achievement of control and target tracking such that the DCTT error of the proposed design is ultimately bounded in terms of linear matrix inequality. The presented algorithms are also validated on two simulation examples.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1252-1263
Citations: 0
Soft Bilinear Inverted Pendulum: A Model to Enable Locomotion With Soft Contacts
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-04 DOI: 10.1109/TSMC.2024.3504342
Davide De Benedittis;Franco Angelini;Manolo Garabini
The robotics research community has developed several effective techniques for quadrupedal locomotion. Most of these methods ease the modeling and control problem by assuming a rigid contact between the feet and the terrain. However, in the case of compliant terrain or robots equipped with soft feet, this assumption no longer holds, as the contact point moves and the reaction forces experience a delay. This article presents a novel approach for quadrupedal locomotion in the presence of soft contacts. The control architecture consists of two blocks: 1) upstream, the motion planner (MP) computes a feasible trajectory using model predictive control (MPC) and 2) downstream, the tracking controller (TC) employs hierarchical optimization (HO) to achieve motion tracking. This choice allows the control architecture to employ a large time horizon without heavily compromising the model’s accuracy. For the first time, both blocks consider the contact compliance: in the MP, the classic linear inverted pendulum model is extended by proposing the soft bilinear inverted pendulum (SBIP) model; conversely, the TC is a whole-body controller (WBC) that considers the full dynamics model, including the soft contacts. Simulations with multiple quadrupedal robots demonstrate that the proposed approach enables traversing soft terrains with improved stability and efficiency. Furthermore, the performance benefits of including the compliance in the MP and TC are evaluated. Finally, experiments on the SOLO12 robot walking on soft terrain validate the proposed approach’s effectiveness.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1478-1491
Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10777856
Citations: 0
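The soft bilinear inverted pendulum abstract above extends the classic linear inverted pendulum (LIP) model. As a point of reference, here is a minimal sketch of that classic rigid-contact baseline only, where the horizontal center-of-mass acceleration is (g / h) * (x - p) for a constant height h and a contact point p. The SBIP equations are not given in the abstract and are not reproduced here; the function names and numeric values are hypothetical.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2

def lip_step(x, v, p, height, dt):
    """One semi-implicit Euler step of the LIP: x'' = (g / h) * (x - p)."""
    a = (G / height) * (x - p)
    v_next = v + a * dt
    x_next = x + v_next * dt
    return x_next, v_next

def lip_closed_form(x0, v0, p, height, t):
    """Analytic LIP solution for a fixed contact point p."""
    omega = math.sqrt(G / height)
    return p + (x0 - p) * math.cosh(omega * t) + (v0 / omega) * math.sinh(omega * t)

# Hypothetical example: CoM starts 5 cm ahead of the contact point, at rest.
x, v = 0.05, 0.0
dt, steps = 1e-4, 3000          # simulate 0.3 s
for _ in range(steps):
    x, v = lip_step(x, v, p=0.0, height=0.5, dt=dt)
x_exact = lip_closed_form(0.05, 0.0, 0.0, 0.5, dt * steps)
```

The numerical and analytic positions agree closely over this short horizon, which illustrates why the LIP is attractive inside a planner: its dynamics are linear and have a closed form. It also assumes the contact point does not move, which is the assumption the abstract says fails on compliant terrain.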
On the Effectiveness of Regularization Methods for Soft Actor-Critic in Discrete-Action Domains
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-04 DOI: 10.1109/TSMC.2024.3505613
Bang Giang Le;Viet Cuong Ta
Soft actor-critic (SAC) is a reinforcement learning algorithm that employs the maximum entropy framework to train a stochastic policy. This work examines a specific failure case of SAC where the stochastic policy is trained to maximize the expected entropy from a sparse reward environment. We demonstrate that the over-exploration of SAC can make the entropy temperature collapse, followed by unstable updates to the actor. Based on our analyses, we introduce Reg-SAC, an improved version of SAC, to mitigate the detrimental effects of the entropy temperature on the learning stability of the stochastic policy. Reg-SAC incorporates a clipping value to prevent the entropy temperature collapse and regularizes the gradient updates of the policy via Kullback-Leibler divergence. Through experiments on discrete benchmarks, our proposed Reg-SAC outperforms the standard SAC in sparse-reward grid world environments while it is able to maintain competitive performance in the dense-reward Atari benchmark. The results highlight that our regularized version makes the stochastic policy of SAC more stable in discrete-action domains.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1425-1438
Citations: 0
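The two ingredients named in the Reg-SAC abstract above, a lower clip on the entropy temperature and a Kullback-Leibler penalty on policy updates, can be illustrated with a small sketch for a discrete action distribution. This is not the authors' implementation: the temperature is updated directly rather than in log space, and `alpha_min`, `lr`, and `beta` are hypothetical parameters chosen for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def update_temperature(alpha, probs, target_entropy, lr=0.1, alpha_min=0.05):
    """One gradient step on J(alpha) = alpha * (H(pi) - target), then clip.

    Without the clip, a policy whose entropy stays above the target keeps
    pushing alpha down; alpha_min (hypothetical value) stops that collapse.
    """
    grad = entropy(probs) - target_entropy
    return max(alpha - lr * grad, alpha_min)

def regularized_actor_loss(actor_loss, old_probs, new_probs, beta=1.0):
    """Actor loss plus a KL penalty that discourages large policy changes."""
    return actor_loss + beta * kl_divergence(old_probs, new_probs)

# A uniform (maximally exploratory) policy drives alpha down to the clip value.
alpha = 0.2
uniform = [0.25, 0.25, 0.25, 0.25]
for _ in range(5):
    alpha = update_temperature(alpha, uniform, target_entropy=0.5)
```

In this toy run the temperature reaches `alpha_min` after two steps and stays there, whereas the unclipped update would go negative.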
Adaptive Neural Tracking of Uncertain State-Constrained Nonlinear Systems With Unmatched Disturbances: Prescribed-Time Disturbance Observer Approach
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-03 DOI: 10.1109/TSMC.2024.3502661
Hyeong Jin Kim;Sung Jin Yoo
We propose a prescribed-time nonlinear disturbance observer (PTNDO) approach for adaptive prescribed-time tracking of state-constrained strict-feedback systems with unmatched disturbances and nonlinearities. In contrast to existing control methods that address the state constraint problem, the key contribution of this article is the development of a neural-network-based adaptive PTNDO to compensate for unmatched disturbances within a prescribed time while dealing with unknown nonlinearities in the field of adaptive prescribed-time tracking. Based on a nonlinear transformation function technique that eliminates the conventional feasibility conditions of virtual control laws in recursive design steps, the original state-constrained system is transformed into an unconstrained system. Subsequently, by deriving a practical prescribed-time adjustment function and its related stability lemma, a PTNDO-based adaptive control strategy is established to guarantee that the disturbance observation and tracking errors converge to an adjustable bound, including zero, at a prescribed settling time, while maintaining state constraints. Simulation results verify the resulting approach.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1439-1450
Citations: 0
Enhancing Human–Robot Collaboration: Supernumerary Robotic Limbs for Object Balance
IF 8.6 Zone 1 (Computer Science) Q1 AUTOMATION & CONTROL SYSTEMS Pub Date: 2024-12-03 DOI: 10.1109/TSMC.2024.3501389
Jing Luo;Shiyang Liu;Weiyong Si;Chao Zeng
Supernumerary robotic limb (SRL) is recognized as being at the forefront of robotics innovation, aimed at augmenting human capabilities in complex working environments. Despite their potential to significantly enhance operational efficiency, the integration of SRL for dynamic and intricate tasks presents challenges in teleoperation, precise positioning, and dynamic balance control. To address challenges in initiating control when targets or the SRL’s end-effector are outside the camera’s visual range, a coarse teleoperation strategy is implemented. This strategy utilizes the inertial measurement unit (IMU) and the extended Kalman filter (EKF), enabling basic orientation and movement toward the target area without reliance on visual cues. Challenges in achieving fine-tuned control for accurate task completion, particularly in visual navigation and precise positioning of the SRL’s end-effector, are addressed by integrating object detection via YOLOX with the tangential artificial potential field (T-APF) method for exact path planning. This integration significantly enhances the system’s ability to fine-tune the placement of end-effector. The challenge of conducting balance tasks without force sensors is tackled by adopting a dual-spring model combined with autoregressive (AR) predictive modeling, enabling effective balance support through anticipatory motion adjustments. Experiments have demonstrated the system’s enhanced positional accuracy and maintained synchronization with human movements, underscoring the effectiveness of the integrated approach in facilitating complex human-robot collaborative tasks.
Supernumerary robotic limbs (SRLs) are at the forefront of robotics innovation and aim to augment human capabilities in complex working environments. Although they could significantly improve operational efficiency, integrating SRLs into dynamic and intricate tasks raises challenges in teleoperation, precise positioning, and dynamic balance control. To initiate control when the target or the SRL's end-effector lies outside the camera's visual range, a coarse teleoperation strategy is implemented. It uses an inertial measurement unit (IMU) and an extended Kalman filter (EKF) to provide basic orientation and movement toward the target area without relying on visual cues. Fine-tuned control for accurate task completion, particularly visual navigation and precise positioning of the SRL's end-effector, is achieved by integrating object detection via YOLOX with the tangential artificial potential field (T-APF) method for path planning. This integration significantly improves the system's ability to fine-tune the placement of the end-effector. Balance tasks without force sensors are handled by a dual-spring model combined with autoregressive (AR) predictive modeling, which provides effective balance support through anticipatory motion adjustments. Experiments show that the system improves positional accuracy and stays synchronized with human movements, underscoring the effectiveness of the integrated approach for complex human-robot collaborative tasks.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1334-1347
Citations: 0
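The autoregressive (AR) prediction mentioned in the abstract above can be illustrated with a small sketch. This is not the authors' implementation: the model order (2), the function names `fit_ar2` and `predict_next`, and the synthetic cosine "motion" signal are all assumptions chosen for illustration. It only shows how an AR model fitted by least squares can anticipate the next sample of a periodic motion.

```python
import math


def fit_ar2(series):
    """Least-squares fit of x[t] ~ a1*x[t-1] + a2*x[t-2] (hypothetical helper)."""
    if len(series) < 4:
        raise ValueError("need at least 4 samples to fit an AR(2) model")
    s11 = s12 = s22 = b1 = b2 = 0.0
    for t in range(2, len(series)):
        x1, x2, y = series[t - 1], series[t - 2], series[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        b1 += x1 * y
        b2 += x2 * y
    # Solve the 2x2 normal equations with Cramer's rule.
    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("samples are degenerate; the AR(2) fit is not unique")
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (s11 * b2 - s12 * b1) / det
    return a1, a2


def predict_next(series, coeffs):
    """One-step-ahead prediction from the two most recent samples."""
    a1, a2 = coeffs
    return a1 * series[-1] + a2 * series[-2]


def demo():
    # Assumed stand-in for a measured human motion: a sampled cosine,
    # which satisfies x[t] = 2*cos(w)*x[t-1] - x[t-2] exactly.
    motion = [math.cos(0.2 * t) for t in range(50)]
    coeffs = fit_ar2(motion)
    return coeffs, predict_next(motion, coeffs)
```

Calling `demo()` returns the fitted coefficients and the predicted next sample; a controller could use such a prediction to start moving before the measured motion arrives, which is the general idea behind anticipatory adjustment.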
Differentially Private Dynamic Average Consensus-Based Newton Method for Distributed Optimization Over General Networks
IF 8.6 Zone 1 Computer Science Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-12-03 DOI: 10.1109/TSMC.2024.3496488
Mingqi Xing;Dazhong Ma;Jing Zhao;Pak Kin Wong
This article investigates privacy preservation in distributed optimization, where each node holds a local private objective function and the nodes collaborate to minimize the sum of those functions. A novel dynamic average consensus-based distributed Newton algorithm is introduced to achieve consensus, optimality, and differential privacy. Each node uses its local gradient and Hessian as time-varying reference signals and exchanges information with its neighbors to track their averages. To safeguard privacy, persistent Laplace noise is added to the exchanged data, which affects the estimated optimal solution and the gradient and Hessian averages. To counteract the noise's impact, the internode coupling strength is adaptively reduced over time through decay factors, so that the noise is attenuated as the algorithm progresses. The algorithm's accurate convergence to the optimal solution is theoretically proven under the assumptions of global function smoothness and strong convexity. Furthermore, the efficiency and reliability of the algorithm are empirically validated through simulations of an IEEE 14-bus test system.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1348-1361
Citations: 0
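The privacy mechanism described in the abstract above (Laplace noise on exchanged data, with coupling strength that decays over time) can be sketched on plain average consensus. This is a simplified illustration, not the paper's Newton algorithm: there is no gradient or Hessian tracking, and the ring topology `RING`, the step rule `gamma0 / (k + 1) ** decay`, and all parameter values are assumptions.

```python
import random


def laplace(rng, scale):
    """Zero-mean Laplace sample: the difference of two i.i.d. exponentials."""
    if scale == 0.0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_consensus(values, neighbors, noise_scale, steps=500,
                      gamma0=0.3, decay=0.6, seed=0):
    """Average consensus where each node only ever transmits a noisy state."""
    rng = random.Random(seed)
    x = list(values)
    for k in range(steps):
        # Decaying coupling strength (assumed schedule) attenuates the noise.
        gamma = gamma0 / (k + 1) ** decay
        # Each node perturbs the value it sends to its neighbors.
        sent = [xi + laplace(rng, noise_scale) for xi in x]
        # Each node moves toward the noisy values it received.
        x = [xi + gamma * sum(sent[j] - xi for j in neighbors[i])
             for i, xi in enumerate(x)]
    return x


# Hypothetical 4-node undirected ring used only for this illustration.
RING = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
```

With `noise_scale=0.0` the update reduces to ordinary consensus and preserves the initial average. With a positive scale, the nodes still agree closely, but on a value that is randomly offset from the true average, which reflects the usual trade-off between privacy and accuracy in schemes of this kind.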
Scalable Formation Control for Second-Order Multiagent Systems: An Event-Triggered Predefined-Time Strategy
IF 8.6 Zone 1 Computer Science Q1 AUTOMATION & CONTROL SYSTEMS Pub Date : 2024-12-03 DOI: 10.1109/TSMC.2024.3504819
Mengyang Xu;Xia Chen;Fei Hao
This article investigates distributed formation tracking control for second-order multiagent systems with unknown inertias. Leader-follower longitudinal formation control is considered, and the goal is for the followers to reach the same speed as the leader and maintain a desired longitudinal spacing. To ensure that the transient time stays within a user-preset time and that communication and computation resources are reduced, we focus on the event-triggered predefined-time control problem. To solve it, we design a node-based event-triggered controller in which the coupling weights are updated by an adaptive mechanism. Moreover, a state transformation is introduced, and by analyzing the predefined-time stability of the transformed state, both the predefined-time longitudinal formation and the boundedness of the controller are proved. With the adaptive updating mechanism, the control parameters do not depend on the Laplacian matrix or on the bounds of the unknown inertias. Thus, the formation is scalable when agents leave or join it. Furthermore, to avoid manually updating the desired spacing when agents join or leave, we propose a fully distributed event-triggered predefined-time desired-spacing decision algorithm based on a distributed resource allocation algorithm. Combining the proposed spacing decision algorithm with the controller makes the longitudinal formation control more scalable, time-saving, and energy-saving.
IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 2, pp. 1466-1477
Citations: 0
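The event-triggered idea in the abstract above, updating the control input only when the state has drifted enough since the last update, can be sketched for a single follower. This is a deliberately reduced illustration and not the paper's controller: it assumes a known unit inertia, fixed gains `kp` and `kv` instead of adaptive coupling weights, a standard relative-plus-absolute trigger threshold (`sigma`, `eps`), and it gives no predefined-time guarantee.

```python
def simulate_follower(t_end=20.0, dt=0.01, kp=4.0, kv=4.0,
                      sigma=0.05, eps=0.01, spacing=2.0):
    """One follower (double integrator) tracking a constant-speed leader."""
    leader_p, leader_v = 0.0, 1.0
    p, v = -5.0, 0.0               # follower position and velocity
    held_ep = held_ev = None       # errors sampled at the last event
    u = 0.0                        # control held constant between events
    events = 0
    steps = int(round(t_end / dt))
    for _ in range(steps):
        ep = (leader_p - spacing) - p   # spacing error
        ev = leader_v - v               # speed error
        if held_ep is None:
            trigger = True              # first sample always triggers
        else:
            gap = abs(ep - held_ep) + abs(ev - held_ev)
            trigger = gap > sigma * (abs(ep) + abs(ev)) + eps
        if trigger:
            # Recompute the control only at event instants.
            held_ep, held_ev = ep, ev
            u = kp * held_ep + kv * held_ev
            events += 1
        # Semi-implicit Euler integration of the follower and the leader.
        v += u * dt
        p += v * dt
        leader_p += leader_v * dt
    final_error = (leader_p - spacing) - p
    return final_error, events, steps
```

`simulate_follower()` returns the final spacing error, the number of control updates, and the number of simulation steps. Comparing the last two shows how many updates the trigger rule saves relative to updating at every step.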