Adaptive identifier–critic–actor neural optimal control of stochastic nonlinear systems with elastic state constraints

Karimi, Hamid Reza;
2025-01-01

Abstract

This article discusses adaptive identifier–critic–actor neural optimal control for stochastic nonstrict-feedback nonlinear systems with elastic state constraints. Reinforcement learning is used to achieve optimal control, implemented through an identifier–critic–actor structure built on neural-network approximation: the identifier estimates the unknown dynamics, the critic evaluates system performance, and the actor executes the control actions. In this scheme, all virtual controls and the actual control, designed via the dynamic surface technique, are constructed as the optimal solutions of the corresponding subsystems. The update laws are derived from the negative gradient of a simple positive function generated from the partial derivative of the Hamilton–Jacobi–Bellman (HJB) equation; this design also relaxes the persistent excitation conditions required by existing optimal control methods. A key innovation lies in formulating an elastic constraint function, which yields a unified framework that can flexibly accommodate custom time constraints without changing the control structure. Stability analysis shows that all closed-loop signals are semi-globally uniformly ultimately bounded in probability.
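The core mechanism the abstract describes — updating critic weights along the negative gradient of a simple positive function built from the HJB residual — can be illustrated with a minimal sketch. Everything below (the scalar dynamics `f`, the polynomial basis `phi`, the cost weights `Q` and `R`, and the learning rate) is an illustrative assumption for a deterministic 1-D example, not the paper's actual stochastic backstepping design.

```python
import numpy as np

# Hypothetical 1-D sketch: the critic weight vector descends the negative
# gradient of E = 0.5 * e**2, where e is the HJB residual. All functions
# and gains here are assumptions chosen for illustration only.

def f(x):
    """Assumed drift, unknown to the controller; in the paper's scheme
    an identifier NN would estimate this term."""
    return -x + 0.5 * np.sin(x)

def phi(x):
    """Assumed critic basis, so V(x) ~ w_c @ phi(x)."""
    return np.array([x**2, x**4])

def dphi(x):
    """Derivative of the basis with respect to x."""
    return np.array([2 * x, 4 * x**3])

Q, R = 1.0, 1.0      # assumed quadratic cost: r = Q*x^2 + R*u^2
eta_c = 0.05         # critic learning rate
dt = 0.01            # Euler integration step
w_c = np.zeros(2)    # critic weights

x = 1.0
for _ in range(2000):
    dV = w_c @ dphi(x)                         # estimate of dV/dx
    u = -0.5 / R * dV                          # actor: approximate optimal control
    e = Q * x**2 + R * u**2 + dV * (f(x) + u)  # HJB residual
    grad = e * dphi(x) * (f(x) + u)            # d(0.5*e^2)/dw_c, holding u fixed
    w_c = w_c - eta_c * grad                   # negative-gradient update
    x = x + dt * (f(x) + u)                    # closed-loop Euler step
```

In the paper this idea is applied per subsystem inside a backstepping/dynamic-surface design for a stochastic system; the sketch above only shows the gradient-descent-on-residual update in its simplest deterministic form.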
Adaptive control
Elastic state constraints
Optimal control
Reinforcement learning
Stochastic nonlinear systems
Files in this record:
No files are associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1310780
Citations
  • PubMed Central: n/a
  • Scopus: 3
  • Web of Science: 3