Adaptive identifier–critic–actor neural optimal control of stochastic nonlinear systems with elastic state constraints
Karimi, Hamid Reza;
2025-01-01
Abstract
This article discusses adaptive identifier–critic–actor neural optimal control for stochastic nonstrict-feedback nonlinear systems with elastic state constraints. Reinforcement learning, built on an identifier–critic–actor structure of neural network approximators, is used to achieve optimal control. In this framework, the identifier, critic, and actor estimate the unknown dynamics, evaluate the system performance, and execute the control actions, respectively. The control scheme constructs the actual control from all virtual controls and dynamic surface controls as the optimal solution of the corresponding subsystems. The update law follows the negative gradient of a simple positive function generated from the partial derivative of the Hamilton–Jacobi–Bellman (HJB) equation; this design also alleviates the persistent-excitation requirement of existing optimal control methods. A key innovation is the formulation of an elastic constraint function that provides a unified framework for handling user-specified time constraints without changing the control structure. Stability analysis shows that all closed-loop signals are semi-globally uniformly ultimately bounded in probability.
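To illustrate the update law the abstract describes (the negative gradient of a simple positive function built from the HJB residual), below is a minimal sketch for a scalar deterministic subsystem dx = (f(x) + g(x)u) dt with quadratic cost. All names (`phi`, `f`, `g`, `eta`) and the polynomial basis are illustrative assumptions, not the paper's actual notation or dynamics; the full scheme additionally involves the identifier, actor, and stochastic terms.

```python
import numpy as np

def phi(x):
    """Polynomial basis for the value approximation V(x) = Wc @ phi(x) (assumed basis)."""
    return np.array([x**2, x**4])

def dphi(x):
    """Gradient of the basis with respect to x."""
    return np.array([2.0 * x, 4.0 * x**3])

def f(x):
    """Illustrative drift term (assumed known for this sketch)."""
    return -x + 0.5 * x**3

def g(x):
    """Illustrative input gain."""
    return 1.0

def hjb_residual(Wc, x, u):
    """H = r(x, u) + (dV/dx)(f(x) + g(x) u); H = 0 along the optimal solution."""
    r = x**2 + u**2                 # quadratic running cost
    dVdx = Wc @ dphi(x)             # partial derivative of the approximate value
    return r + dVdx * (f(x) + g(x) * u)

def critic_step(Wc, x, u, eta=0.01):
    """One update along the negative gradient of the positive function (1/2) H^2."""
    H = hjb_residual(Wc, x, u)
    grad = H * dphi(x) * (f(x) + g(x) * u)   # d[(1/2) H^2] / dWc
    return Wc - eta * grad
```

Because H is linear in the critic weights, a sufficiently small gradient step strictly shrinks the residual at the sampled state; driving H toward zero this way is what lets the scheme relax the persistent-excitation requirement, since the positive function is minimized pointwise rather than via a regression over persistently exciting data.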


