LSTM-empowered reinforcement learning in Bi-level optimal control for nonlinear systems with uncertain dynamics
Mohsen Jalaeian-Farimani
2026-01-01
Abstract
This paper introduces a bi-level optimization framework for the optimal control of nonlinear continuous-time systems with uncertain dynamics, integrating Long Short-Term Memory (LSTM) networks with an actor-critic reinforcement learning (RL) architecture. By combining Hamiltonian-based optimal control with online uncertainty estimation, the proposed method achieves robust trajectory tracking without relying on offline training. The master level optimizes control policies using a Hamilton-Jacobi-Bellman (HJB)-inspired formulation, while the slave level employs LSTM networks to estimate lumped uncertainties online, allowing adaptation to time-varying disturbances. Rigorous stability analysis establishes uniform ultimate boundedness of the tracking error. Extensive simulations on a skid-steering tracked robot across diverse trajectories demonstrate the framework’s superior tracking precision, energy efficiency, and disturbance rejection compared with conventional adaptive control and model-based virtual reference trajectory schemes. The approach is computationally efficient and theoretically grounded, offers a scalable solution for autonomous systems operating in uncertain environments, and advances RL-based optimal control.

| File | Description | Size | Format |
|---|---|---|---|
| 1-s2.0-S0019057825006457-main.pdf | Publisher’s version, open access | 7.29 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


