Adaptive identifier–critic–actor neural optimal control of stochastic nonlinear systems with elastic state constraints
Karimi, Hamid Reza;
2025-01-01
Abstract
This article discusses adaptive identifier–critic–actor neural optimal control for stochastic nonstrict-feedback nonlinear systems with elastic state constraints. Reinforcement learning, built on an identifier–critic–actor structure of neural network approximators, is used to achieve optimal control. In this framework, the identifier, critic, and actor estimate the unknown dynamics, evaluate the system performance, and execute the control actions, respectively. The control scheme constructs the actual control from all virtual controls and dynamic surface controls as the optimal solution of the corresponding subsystems. The update law follows the negative gradient of a simple positive function generated from the partial derivative of the Hamilton–Jacobi–Bellman (HJB) equation; this design also alleviates the persistent-excitation requirement of existing optimal control methods. A key innovation is the formulation of an elastic constraint function that provides a unified framework for handling user-specified time constraints without changing the control structure. Stability analysis shows that all closed-loop signals are semi-globally uniformly ultimately bounded in probability.
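To illustrate the update law the abstract describes (the negative gradient of a simple positive function built from the HJB residual), below is a minimal sketch for a scalar deterministic subsystem dx = (f(x) + g(x)u) dt with quadratic cost. All names (`phi`, `f`, `g`, `eta`) and the polynomial basis are illustrative assumptions, not the paper's actual notation or dynamics; the full scheme additionally involves the identifier, actor, and stochastic terms.

```python
import numpy as np

def phi(x):
    """Polynomial basis for the value approximation V(x) = Wc @ phi(x) (assumed basis)."""
    return np.array([x**2, x**4])

def dphi(x):
    """Gradient of the basis with respect to x."""
    return np.array([2.0 * x, 4.0 * x**3])

def f(x):
    """Illustrative drift term (assumed known for this sketch)."""
    return -x + 0.5 * x**3

def g(x):
    """Illustrative input gain."""
    return 1.0

def hjb_residual(Wc, x, u):
    """H = r(x, u) + (dV/dx)(f(x) + g(x) u); H = 0 along the optimal solution."""
    r = x**2 + u**2                 # quadratic running cost
    dVdx = Wc @ dphi(x)             # partial derivative of the approximate value
    return r + dVdx * (f(x) + g(x) * u)

def critic_step(Wc, x, u, eta=0.01):
    """One update along the negative gradient of the positive function (1/2) H^2."""
    H = hjb_residual(Wc, x, u)
    grad = H * dphi(x) * (f(x) + g(x) * u)   # d[(1/2) H^2] / dWc
    return Wc - eta * grad
```

Because H is linear in the critic weights, a sufficiently small gradient step strictly shrinks the residual at the sampled state; driving H toward zero this way is what lets the scheme relax the persistent-excitation requirement, since the positive function is minimized pointwise rather than via a regression over persistently exciting data.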


