AutoML systems seek to assist Artificial Intelligence users in finding the best configurations for machine learning models. In this vein, Automated Reinforcement Learning (AutoRL) has recently become increasingly relevant, given the growing number of applications of reinforcement learning algorithms. However, the literature still lacks AutoRL systems tailored to combinatorial optimization, especially for the Sequential Ordering Problem (SOP). Therefore, this paper presents a new AutoRL approach for the SOP. To this end, two new methods are proposed using hyperparameter optimization and metalearning: AutoRL-SOP and AutoRL-SOP-MtL. The proposed AutoRL techniques enable the combined tuning of three SARSA hyperparameters, namely the ϵ-greedy exploration rate, the learning rate, and the discount factor. Furthermore, the new metalearning approach enables the transfer of hyperparameters between two combinatorial optimization domains: TSP (source) and SOP (target). The results show that metalearning reduces the computational cost of hyperparameter optimization. Moreover, the proposed AutoRL methods achieved the best solutions in 23 out of 28 simulated TSPLIB instances when compared with recent studies in the literature.
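To make the tuned quantities concrete, the minimal sketch below shows how the three SARSA hyperparameters listed above (ϵ, learning rate α, discount factor γ) enter a tabular update; the environment interface (`reset`, `step`, `actions`) is a hypothetical placeholder rather than the paper's SOP formulation, and an AutoRL layer would simply treat the triple (ϵ, α, γ) as its search space.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, actions, epsilon):
    """With probability epsilon explore; otherwise pick the greedy action."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def sarsa_episode(env, Q, epsilon, alpha, gamma):
    """Run one SARSA episode. env is a hypothetical interface exposing
    reset(), actions(state), and step(action) -> (next_state, reward, done)."""
    state = env.reset()
    action = epsilon_greedy(Q, state, env.actions(state), epsilon)
    done = False
    while not done:
        next_state, reward, done = env.step(action)
        if done:
            target = reward
        else:
            next_action = epsilon_greedy(Q, next_state, env.actions(next_state), epsilon)
            target = reward + gamma * Q[(next_state, next_action)]
        # Temporal-difference update scaled by the learning rate alpha
        Q[(state, action)] += alpha * (target - Q[(state, action)])
        if not done:
            state, action = next_state, next_action
    return Q

# An AutoRL layer would search over (epsilon, alpha, gamma), e.g.:
Q = defaultdict(float)
# Q = sarsa_episode(env, Q, epsilon=0.1, alpha=0.5, gamma=0.95)
```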
Autonomous driving systems (ADS) are at the forefront of technological innovation, promising enhanced safety, efficiency, and convenience in transportation. This study investigates the potential of end-to-end reinforcement learning (RL) architectures for ADS, specifically focusing on a Go-To-Point task involving lane-keeping and navigation through basic urban environments. The study uses the Proximal Policy Optimization (PPO) algorithm within the CARLA simulation environment. Traditional modular systems, which separate driving tasks into perception, decision-making, and control, provide interpretability and reliability in controlled scenarios but struggle with adaptability to dynamic, real-world conditions. In contrast, end-to-end systems offer a more integrated approach, potentially enhancing flexibility and decision-making cohesion.
This research introduces CARLA-GymDrive, a novel framework integrating the CARLA simulator with the Gymnasium API, enabling seamless RL experimentation with both discrete and continuous action spaces. Through a two-phase training regimen, the study evaluates the efficacy of PPO in an end-to-end ADS focused on basic tasks like lane-keeping and waypoint navigation. A comparative analysis with modular architectures is also provided. The findings highlight the strengths of PPO in managing continuous control tasks, achieving smoother and more adaptable driving behaviors than value-based algorithms like Deep Q-Networks. However, challenges remain in generalization and computational demands, with end-to-end systems requiring extensive training time.
While the study underscores the potential of end-to-end architectures, it also identifies limitations in scalability and real-world applicability, suggesting that modular systems may currently be more feasible for practical ADS deployment. Nonetheless, the CARLA-GymDrive framework and the insights gained from PPO-based ADS contribute significantly to the field, laying a foundation for future advancements in autonomous driving.
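As a rough illustration of the workflow that a Gymnasium-compatible CARLA wrapper of this kind enables, the sketch below trains PPO with Stable-Baselines3 on a placeholder environment; the id "CarlaGoToPoint-v0" and the hyperparameter values are assumptions for illustration, not the actual CARLA-GymDrive registration name or the study's settings.

```python
import gymnasium as gym
from stable_baselines3 import PPO

# "CarlaGoToPoint-v0" is a placeholder id for a Gymnasium-registered CARLA
# environment; the real CARLA-GymDrive registration name may differ.
env = gym.make("CarlaGoToPoint-v0")

# PPO handles the continuous steering/throttle action space directly.
model = PPO("MlpPolicy", env, learning_rate=3e-4, n_steps=2048, verbose=1)
model.learn(total_timesteps=1_000_000)

# Roll out the trained policy for one episode.
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```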
Modern industries that depend on reliable asset operation under constrained resources employ intelligent maintenance methods to maximize efficiency. However, classical maintenance methods rely on assumed lifetime distributions and suffer from estimation errors and computational complexity. The advent of Industry 4.0 has increased the use of sensors for monitoring systems, while deep learning (DL) models have allowed for accurate system health predictions, enabling data-driven maintenance planning. Most of the intelligent maintenance literature has used DL models solely for remaining useful life (RUL) point predictions, and a substantial gap exists in further using these predictions to inform maintenance plan optimization. The few existing studies that have attempted to bridge this gap are limited to simple system configurations and non-scalable models. Hence, this paper develops a hybrid DL model with Monte Carlo dropout that generates RUL predictions, which are then used to construct empirical system reliability functions for optimizing the selective maintenance problem (SMP). The proposed framework is used to plan maintenance for a mission-oriented series k-out-of-n:G system. Numerical experiments compare the framework’s performance against prior SMP methods and highlight its strengths: when minimizing cost, the framework frequently produces maintenance plans that achieve mission survival while avoiding unnecessary repairs. The proposed method scales to large, complex scenarios and various industrial contexts, and it finds exact solutions while avoiding the need for computationally intensive parametric reliability functions.
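A minimal sketch of the two building blocks named above, assuming a PyTorch RUL regressor with dropout layers: repeated stochastic forward passes (Monte Carlo dropout) yield a sample of RUL predictions, from which an empirical reliability for a given mission horizon is estimated. The function and variable names are illustrative, not the paper's implementation.

```python
import numpy as np
import torch

def enable_dropout(model):
    """Keep only the dropout layers stochastic at inference time."""
    model.eval()
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()

def mc_dropout_rul(model, x, n_samples=200):
    """Draw n_samples RUL predictions for input x via Monte Carlo dropout."""
    enable_dropout(model)
    with torch.no_grad():
        samples = torch.stack([model(x).squeeze() for _ in range(n_samples)])
    return samples.cpu().numpy()

def empirical_reliability(rul_samples, mission_horizon):
    """Fraction of sampled RULs that survive the mission horizon."""
    return float(np.mean(rul_samples >= mission_horizon))

# Illustrative usage: per-component reliabilities computed this way would
# feed the selective maintenance optimization of the k-out-of-n:G system.
# r_i = empirical_reliability(mc_dropout_rul(rul_model, sensor_window), 100.0)
```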

