Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story

Vincenzo De Paola; Riccardo Zamboni; Mirco Mutti; Marcello Restelli
2025-01-01

Abstract

Parallel data collection has redefined Reinforcement Learning (RL), unlocking unprecedented efficiency and powering breakthroughs in large-scale real-world applications. In this paradigm, N identical agents operate in N replicas of an environment simulator, accelerating data collection by a factor of N. A critical question arises: does specializing the policies of the parallel agents hold the key to surpassing the N-factor acceleration? In this paper, we introduce a novel learning framework that maximizes the entropy of the data collected in a parallel setting. Our approach carefully balances the entropy of individual agents with inter-agent diversity, effectively minimizing redundancies. The latter idea is implemented through a centralized policy-gradient method, which shows promise when evaluated empirically against systems of identical agents and synergizes with batch RL techniques that can exploit data diversity. Finally, we provide an original concentration analysis showing faster rates for specialized parallel sampling distributions, which supports our methodology and may be of independent interest.
42nd International Conference on Machine Learning, ICML 2025
File in this item:
Enhancing_Diversity_in_Parallel_Agents_A_Maximum_State_Entropy_Exploration_Story.pdf — Publisher's version, Adobe PDF, 5.04 MB (restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1295972