Improving Generalization in Federated Learning by Seeking Flat Minima

Caldarola, Debora; Caputo, Barbara; Ciccone, Marco

doi:10.1007/978-3-031-20050-2_38

Models trained in federated settings often suffer from degraded performance and fail to generalize, especially when facing heterogeneous scenarios. In this work, we investigate such behavior through the lens of the geometry of the loss and the Hessian eigenspectrum, linking the model’s lack of generalization capacity to the sharpness of the solution. Motivated by prior studies connecting the sharpness of the loss surface and the generalization gap, we show that i) training clients locally with Sharpness-Aware Minimization (SAM) or its adaptive version (ASAM) and ii) averaging stochastic weights (SWA) on the server side can substantially improve generalization in Federated Learning and help bridge the gap with centralized models. By seeking parameters in neighborhoods having uniformly low loss, the model converges towards flatter minima and its generalization significantly improves in both homogeneous and heterogeneous scenarios. Empirical results demonstrate the effectiveness of those optimizers across a variety of benchmark vision datasets (e.g., CIFAR-10/100, Landmarks-User-160k, IDDA) and tasks (large-scale classification, semantic segmentation, domain generalization).
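The two ingredients named in the abstract, SAM on the clients and weight averaging on the server, can be illustrated with a toy sketch. This is not the authors' implementation: the quadratic client loss, the function names (`toy_grad`, `sam_step`, `federated_round`, `train`), the equal weighting of clients, and all hyperparameter values are assumptions made for illustration, and the server-side averaging is reduced to a plain uniform average of the global models from a chosen round onward.

```python
import math


def toy_grad(w, curvature):
    """Gradient of the toy loss 0.5 * sum(c_i * w_i**2) (hypothetical client loss)."""
    return [c * x for c, x in zip(curvature, w)]


def sam_step(w, curvature, lr=0.1, rho=0.05):
    """One SAM update: move to a nearby higher-loss point, then descend with its gradient."""
    g = toy_grad(w, curvature)
    norm = math.sqrt(sum(x * x for x in g))
    if norm == 0.0:
        return list(w)  # stationary point: no direction to perturb along
    # Perturb the weights by rho along the normalized gradient direction.
    w_adv = [x + rho * gx / norm for x, gx in zip(w, g)]
    # Apply the gradient taken at the perturbed point to the original weights.
    g_adv = toy_grad(w_adv, curvature)
    return [x - lr * gx for x, gx in zip(w, g_adv)]


def average(models):
    """Uniform element-wise average of a list of weight vectors."""
    return [sum(ws) / len(models) for ws in zip(*models)]


def federated_round(global_w, client_curvatures, local_steps=5, lr=0.1, rho=0.05):
    """Each client runs SAM locally from the global model; the server averages the results."""
    updated = []
    for curvature in client_curvatures:
        w = list(global_w)
        for _ in range(local_steps):
            w = sam_step(w, curvature, lr, rho)
        updated.append(w)
    # Assumption: clients hold equally sized datasets, so a plain mean is used.
    return average(updated)


def train(global_w, client_curvatures, rounds=10, swa_start=5):
    """Run several rounds and average the server models from round swa_start onward."""
    history = []
    for r in range(rounds):
        global_w = federated_round(global_w, client_curvatures)
        if r >= swa_start:
            history.append(global_w)
    return average(history) if history else global_w
```

For instance, `train([1.0, 1.0], [[2.0, 1.0], [1.0, 3.0]])` simulates two clients whose losses have different curvature, a crude stand-in for heterogeneous data.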


2022-01-01


Publication year	2022
Book title	Proceedings of 17th European Conference on Computer Vision – ECCV 2022
Series title	Lecture Notes in Computer Science
ISBN (International Standard Book Number)	9783031200496; 9783031200502
Keywords	Federated Learning, Flat Minima, Machine Learning
Appears in categories	04.1 Conference proceedings contribution

Files in this item:

File	Size	Format
978-3-031-20050-2_38.pdf (restricted access; Publisher’s version)	1.1 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1283174


RE.PUBLIC@POLIMI: research publications of the Politecnico di Milano
