Toward safe and efficient shield tunneling: counterfactual reinforcement learning based multi-subsystem collaborative optimization
Pub Date: 2025-11-27 | DOI: 10.1007/s40747-025-02180-5
Jing Lu, Min Hu, WenBo Zhou
Shield tunneling must satisfy safety requirements while maintaining efficiency, but the dynamic interactions among subsystems and continuously evolving ground conditions make manual coordination unreliable. To address this challenge, this study proposes Counterfactual Reinforcement Learning-Based Shield Multi-Subsystem Collaborative Optimization (CRL-SMSCO), a multi-agent reinforcement learning framework with a centralized critic and decentralized actors. CRL-SMSCO performs counterfactual credit assignment to quantify each subsystem’s marginal contribution to global tunneling outcomes and jointly optimizes subsystem parameters. Safety is enforced by a safety-oriented action masking unit that restricts the feasible action space in real time, together with a hierarchical reward that prioritizes safety over efficiency. These features enable CRL-SMSCO to achieve interpretable subsystem coordination and rigorous safety enforcement. In the Nanjing Metro case study, experiments show that CRL-SMSCO improves average training reward by 3.9% over multi-agent deep deterministic policy gradient (MADDPG) and by 3.5% over multi-agent proximal policy optimization (MAPPO), while also yielding lower variance and higher minimum rewards on the held-out test segment. In the engineering application, relative to experience-based manual operation under similar geology, CRL-SMSCO increases average tunneling speed by 13.2%, reduces equipment loads by over 1.7%, and decreases ground deformation by 82%, with all indicators maintained within permissible limits. These results demonstrate that CRL-SMSCO provides significant practical value in shield tunneling and offers an effective framework for managing other safety-critical coupled systems.
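The abstract names two mechanisms, counterfactual credit assignment and safety-oriented action masking, without implementation detail. The sketch below is only an illustration of those two ideas under assumptions of our own (a discrete candidate-action set, a uniform counterfactual baseline, and hypothetical names such as `critic`, `joint_obs`, and `safe_mask`); it is not the paper's algorithm or architecture.

```python
import numpy as np

def counterfactual_advantage(critic, joint_obs, joint_action, agent_idx, candidate_actions):
    """COMA-style counterfactual baseline: score agent `agent_idx` by how much the
    joint-action value changes when only its own action is swapped out, holding the
    other subsystems' actions fixed. The baseline here averages uniformly over
    `candidate_actions`; the paper's actual weighting is not specified."""
    q_joint = critic(joint_obs, joint_action)
    baseline = np.mean([
        critic(joint_obs, joint_action[:agent_idx] + (a,) + joint_action[agent_idx + 1:])
        for a in candidate_actions
    ])
    return q_joint - baseline

def mask_unsafe_actions(action_scores, safe_mask):
    """Safety-oriented action masking: infeasible actions receive -inf so that a
    greedy or softmax policy can never select them."""
    return np.where(safe_mask, action_scores, -np.inf)
```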
{"title":"Toward safe and efficient shield tunneling: counterfactual reinforcement learning based multi-subsystem collaborative optimization","authors":"Jing Lu, Min Hu, WenBo Zhou","doi":"10.1007/s40747-025-02180-5","DOIUrl":"https://doi.org/10.1007/s40747-025-02180-5","url":null,"abstract":"Shield tunneling must satisfy safety requirements while maintaining efficiency, but the dynamic interactions among subsystems and continuously evolving ground conditions make manual coordination unreliable. To address this challenge, this study proposes the Counterfactual Reinforcement Learning-Based Shield Multi-Subsystem Collaborative Optimization (CRL-SMSCO), a multi-agent reinforcement learning framework with a centralized critic and decentralized actors. CRL-SMSCO performs counterfactual credit assignment to quantify each subsystem’s marginal contribution to global tunneling outcomes and jointly optimizes subsystem parameters. Safety is enforced by a safety-oriented action masking unit that restricts the feasible action space in real time, together with a hierarchical reward that prioritizes safety over efficiency. These features enable CRL-SMSCO to achieve interpretable subsystem coordination and rigorous safety enforcement. In the Nanjing Metro case study, experiments show that CRL-SMSCO improves average training reward by 3.9% over multi-agent deep deterministic policy gradient (MADDPG) and by 3.5% over multi-agent proximal policy optimization (MAPPO), while also yielding lower variance and higher minimum rewards on the held-out test segment. In the engineering application, relative to experience-based manual operation under similar geology, CRL-SMSCO increases average tunneling speed by 13.2%, reduces equipment loads over 1.7%, and decreases ground deformation by 82%, with all indicators maintained within permissible limits. These results demonstrate that CRL-SMSCO provides significant practical value in shield tunneling, offering an effective framework for managing other safety-critical coupled systems.","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-11-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145609037","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A memory constrained Bayesian optimization via robust online memory estimation
Pub Date: 2025-11-25 | DOI: 10.1007/s40747-025-02150-x
Befekadu Bekuretsion, Wolfgang Menzel, Solomon Teferra
Bayesian optimization (BO) is a memory-intensive algorithm that requires training and evaluating an expensive objective function. In contrast to previous works that use offline memory estimation to make BO memory-efficient, we propose a robust and simple online memory estimation method that requires training a model only for the first two iterations of the first epoch. Our memory estimation method is then integrated with a simple, performance-based surrogate model of BO in a seamless (in-sync) mode that enforces memory efficiency even if it does not exceed a preset threshold. The online memory estimation method has been evaluated on two different datasets, showing that it is more accurate than the existing offline method (2.19× for MNIST and 3.51× for CIFAR). Furthermore, compared to a memory-unaware baseline, the enhanced BO has no loss of accuracy and is 11.31× more memory-efficient for a simple CNN-based image classification, and 5.03× more memory-efficient but 9.27× slower for a more complex LSTM-based text classification (useful for a resource-constrained environment where delay is tolerable but memory is scarce), while it is 2.6× more memory-efficient but 1.23× slower for […]
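The abstract gives only the high-level idea of the online estimate. A minimal sketch of how such an estimate might be wired into a BO loop is shown below, assuming a Python setting in which `tracemalloc` stands in for whatever profiler the authors actually use, and in which `suggest_candidates`, `objective`, and `train_one_iteration` are hypothetical callables rather than the paper's API.

```python
import tracemalloc

def estimate_peak_memory(train_one_iteration, config):
    """Rough online estimate: run only the first two training iterations of the first
    epoch under a memory tracer and take the observed peak as a proxy for the full
    run's footprint (the paper's exact estimator is not reproduced here)."""
    tracemalloc.start()
    for _ in range(2):  # first two iterations only
        train_one_iteration(config)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak  # bytes

def memory_constrained_suggest(suggest_candidates, objective, train_one_iteration,
                               memory_budget_bytes):
    """Wrap one BO suggestion step: discard candidate configurations whose estimated
    memory exceeds the budget, then evaluate the first feasible one on the true objective."""
    for config in suggest_candidates():
        if estimate_peak_memory(train_one_iteration, config) <= memory_budget_bytes:
            return config, objective(config)
    raise RuntimeError("no candidate fits within the memory budget")
```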
{"title":"A memory constrained bayesian optimization via robust online memory estimation","authors":"Befekadu Bekuretsion, Wolfgang Menzel, Solomon Teferra","doi":"10.1007/s40747-025-02150-x","DOIUrl":"https://doi.org/10.1007/s40747-025-02150-x","url":null,"abstract":"Bayesian optimization (BO) is a memory-intensive algorithm that requires training and evaluating an expensive objective function. In contrast to previous works that use an offline memory estimation to make BO memory-efficient, we propose a robust and simple online memory estimation method that requires training a model only for the first two iterations of the first epoch. Our memory estimation method is then integrated with a simple, performance-based surrogate model of BO in a seamless (or in sync) mode that enforces memory efficiency even if it does not bypass a preset threshold. The online memory estimation method has been evaluated on two different datasets, showing that it is more accurate than the existing offline method ( <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$2.19times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>2.19</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> for MNIST and <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$3.51times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>3.51</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> for CIFAR datasets). Furthermore, compared to a memory-unaware baseline, the enhanced BO has no loss of accuracy and is <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$11.31times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>11.31</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> memory-efficient for a simple CNN-based image classification, and <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$5.03times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>5.03</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> memory efficient but <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$9.27times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>9.27</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> slower for a more complex LSTM-based text classification (useful for a resource-constrained environment where delay is tolerable but memory is scarce), while it is <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$2.6times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>2.6</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-formula> memory efficient but <jats:inline-formula> <jats:alternatives> <jats:tex-math>$$1.23times $$</jats:tex-math> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\"> <mml:mrow> <mml:mn>1.23</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> </jats:alternatives> </jats:inline-fo","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"189 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-11-25","publicationTypes":"Journal 
Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145593746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
An efficient method for the support vector machine with minimax concave penalty in high dimensions
Pub Date: 2025-11-25 | DOI: 10.1007/s40747-025-02132-z
Jin Yang, Ning Zhang, Yi Zhang
{"title":"An efficient method for the support vector machine with minimax concave penalty in high dimensions","authors":"Jin Yang, Ning Zhang, Yi Zhang","doi":"10.1007/s40747-025-02132-z","DOIUrl":"https://doi.org/10.1007/s40747-025-02132-z","url":null,"abstract":"","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"107 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145593748","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hierarchical reinforcement learning with opponent modeling for command and control system
Pub Date: 2025-11-25 | DOI: 10.1007/s40747-025-02128-9
Tengda Li, Gang Wang, Qiang Fu, Minrui Zhao, Xiangyu Liu
{"title":"Hierarchical reinforcement learning with opponent modeling for command and control system","authors":"Tengda Li, Gang Wang, Qiang Fu, Minrui Zhao, Xiangyu Liu","doi":"10.1007/s40747-025-02128-9","DOIUrl":"https://doi.org/10.1007/s40747-025-02128-9","url":null,"abstract":"","PeriodicalId":10524,"journal":{"name":"Complex & Intelligent Systems","volume":"1 1","pages":""},"PeriodicalIF":5.8,"publicationDate":"2025-11-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145593747","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}