Enhancing Diversity in Parallel Agents: A Maximum State Entropy Exploration Story
Vincenzo De Paola, Riccardo Zamboni, Mirco Mutti, Marcello Restelli
2025-01-01
Abstract
Parallel data collection has redefined Reinforcement Learning (RL), unlocking unprecedented efficiency and powering breakthroughs in large-scale real-world applications. In this paradigm, N identical agents operate in N replicas of an environment simulator, accelerating data collection by a factor of N. A critical question arises: Does specializing the policies of the parallel agents hold the key to surpassing the N-factor acceleration? In this paper, we introduce a novel learning framework that maximizes the entropy of collected data in a parallel setting. Our approach carefully balances the entropy of individual agents with inter-agent diversity, effectively minimizing redundancies. The latter idea is implemented with a centralized policy gradient method, which shows promise when evaluated empirically against systems of identical agents, as well as synergy with batch RL techniques that can exploit data diversity. Finally, we provide an original concentration analysis that shows faster rates for specialized parallel sampling distributions, which supports our methodology and may be of independent interest.
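The abstract's trade-off between individual entropy and inter-agent diversity admits a concrete reading: for a uniform mixture of the agents' state distributions, the entropy of the pooled data equals the average per-agent entropy plus their generalized Jensen-Shannon divergence. The sketch below illustrates this decomposition on tabular distributions; it is a minimal illustration of the identity, not the paper's method, and the function names (`entropy`, `pooled_objective`) are ours.

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p))

def pooled_objective(state_dists):
    """Decompose the entropy of pooled data from N parallel agents.

    state_dists: array of shape (N, S), one empirical state distribution
    per agent. For uniform mixture weights, the identity
        H(mean_i d_i) = mean_i H(d_i) + JSD(d_1, ..., d_N)
    splits the pooled-data entropy into individual coverage plus an
    inter-agent diversity term (the generalized Jensen-Shannon divergence).
    """
    mixture = state_dists.mean(axis=0)       # state distribution of pooled data
    h_mixture = entropy(mixture)             # entropy of the pooled data
    h_individual = np.mean([entropy(d) for d in state_dists])
    diversity = h_mixture - h_individual     # = JSD, nonnegative by concavity
    return h_mixture, h_individual, diversity

# Diverse agents yield a larger diversity bonus than identical ones.
rng = np.random.default_rng(0)
diverse = rng.dirichlet(np.ones(5), size=3)  # 3 agents, 5 states
identical = np.tile(diverse[0], (3, 1))
for name, dists in [("diverse", diverse), ("identical", identical)]:
    h_mix, h_ind, jsd = pooled_objective(dists)
    print(f"{name}: H(mixture)={h_mix:.3f} = {h_ind:.3f} + diversity {jsd:.3f}")
```

Maximizing pooled-data entropy therefore rewards both broad individual coverage and disagreement among agents, which is the redundancy-minimization the abstract describes; a centralized policy gradient method of the kind mentioned would ascend an objective of this form.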
| File | Description | Size | Format |
|---|---|---|---|
| Enhancing_Diversity_in_Parallel_Agents_A_Maximum_State_Entropy_Exploration_Story.pdf | Publisher's version (restricted access) | 5.04 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


