LSTM-empowered reinforcement learning in Bi-level optimal control for nonlinear systems with uncertain dynamics

Roya Khalili Amirabadi; Jalaeian-Farimani, Mohsen; Fard, Omid S.

doi:10.1016/j.isatra.2025.11.027

This paper introduces a bi-level optimization framework for the optimal control of nonlinear continuous-time systems with uncertain dynamics, integrating Long Short-Term Memory (LSTM) networks with an actor-critic reinforcement learning (RL) architecture. By combining Hamiltonian-based optimal control with online uncertainty estimation, the proposed method achieves robust trajectory tracking without relying on offline training. The master level optimizes the control policy using a formulation inspired by the Hamilton-Jacobi-Bellman (HJB) equation, while the slave level uses LSTM networks to estimate the lumped uncertainties online, so that the controller adapts to time-varying disturbances. A rigorous stability analysis establishes uniform ultimate boundedness of the tracking error. Simulations on a skid-steering tracked robot across diverse trajectories show that the framework achieves better tracking precision, energy efficiency, and disturbance rejection than conventional adaptive control and model-based virtual reference trajectory schemes. The approach is computationally efficient and theoretically grounded, and it offers a scalable solution for autonomous systems operating in uncertain environments.
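To make the terms in the abstract concrete, the following is a minimal sketch of the standard textbook objects it refers to. It is not taken from the paper: the control-affine model, the quadratic cost, and every symbol ($f$, $g$, $d$, $\hat{d}$, $Q$, $R$, $V$, $x_d$, $a$, $b$, $c$, $T$) are assumptions chosen for illustration, and the paper's own formulation may differ.

```latex
% Assumed control-affine plant with a lumped uncertainty d(x,t)
\dot{x} = f(x) + g(x)\,u + d(x,t), \qquad e = x - x_d

% Assumed infinite-horizon quadratic tracking cost (Q \succeq 0,\ R \succ 0)
V(e(t)) = \int_{t}^{\infty} \left( e^{\top} Q\, e + u^{\top} R\, u \right) d\tau

% Hamiltonian, with the unknown d replaced by an online estimate \hat{d}
% (in the abstract's terms, the quantity the LSTM at the slave level would supply)
H(e, u, \nabla V) = e^{\top} Q\, e + u^{\top} R\, u
  + \nabla V^{\top} \left( f(x) + g(x)\,u + \hat{d} - \dot{x}_d \right)

% HJB condition and the stationary control obtained from \partial H / \partial u = 0
0 = \min_{u} H(e, u, \nabla V^{*}), \qquad
u^{*} = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}

% Uniform ultimate boundedness (UUB) of the tracking error:
% there exist b > 0 and c > 0 such that for every a \in (0, c)
% there is a T = T(a, b) \ge 0, independent of t_0, with
\| e(t_0) \| \le a \;\Longrightarrow\; \| e(t) \| \le b \quad \forall\, t \ge t_0 + T
```

In an actor-critic scheme of this general kind, the critic typically approximates $V^{*}$ and the actor approximates $u^{*}$; how the paper parameterizes and updates them is not stated in the abstract.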