LSTM-empowered reinforcement learning in Bi-level optimal control for nonlinear systems with uncertain dynamics

Mohsen Jalaeian-Farimani
2026-01-01

Abstract

This paper introduces a bi-level optimization framework for the optimal control of nonlinear continuous-time systems with uncertain dynamics. The framework integrates Long Short-Term Memory (LSTM) networks with an actor-critic reinforcement learning (RL) architecture. By combining Hamiltonian-based optimal control with online uncertainty estimation, the proposed method achieves robust trajectory tracking without offline training. The master level optimizes the control policy using a formulation inspired by the Hamilton-Jacobi-Bellman (HJB) equation, while the slave level uses LSTM networks to estimate lumped uncertainties online, allowing adaptation to time-varying disturbances. A stability analysis establishes uniform ultimate boundedness of the tracking error. Simulations of a skid-steering tracked robot across several trajectories show better tracking precision, energy efficiency, and disturbance rejection than conventional adaptive control and model-based virtual reference trajectory schemes. The approach is computationally efficient and theoretically grounded, and it offers a scalable option for RL-based optimal control of autonomous systems operating in uncertain environments.
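
For orientation, the kind of HJB-based problem the abstract refers to can be written out for a generic control-affine system. This is a textbook sketch, not the paper's own formulation: the symbols $f$, $g$, $d$, $\hat{d}$, $Q$, $R$, $e$ and the quadratic cost are assumptions introduced here for illustration, and the abstract does not state them.

```latex
% Assumed (hypothetical) tracking-error dynamics with a lumped uncertainty d(t):
\begin{align}
  \dot{e} &= f(e) + g(e)\,u + d(t), \\
% Assumed infinite-horizon quadratic cost, with Q \succeq 0 and R \succ 0:
  V(e(t)) &= \int_{t}^{\infty} \left( e^{\top} Q\, e + u^{\top} R\, u \right) \mathrm{d}\tau, \\
% Hamiltonian, with d replaced by an online estimate \hat{d}
% (in the paper this estimate is said to come from the LSTM at the slave level):
  H(e, u, \nabla V) &= e^{\top} Q\, e + u^{\top} R\, u
      + \nabla V^{\top} \left( f(e) + g(e)\,u + \hat{d} \right), \\
% HJB equation for the optimal value function V^{*}:
  0 &= \min_{u} H(e, u, \nabla V^{*}), \\
% Stationarity, \partial H / \partial u = 2 R u + g^{\top} \nabla V^{*} = 0, gives:
  u^{*} &= -\tfrac{1}{2}\, R^{-1} g(e)^{\top} \nabla V^{*}.
\end{align}
```

In actor-critic schemes of this type, a critic typically approximates $V^{*}$ and an actor approximates $u^{*}$. How the paper parameterizes either network, and how the estimate $\hat{d}$ enters its master-level problem, is not specified in the abstract.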
Keywords: Bi-level optimal control, Reinforcement learning, Sliding mode, Actor-critic neural network, Skid-steering tracked robot
Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310525