
Energy-efficient Dynamic Partitioning and Tensors Compression of AI Applications in Smart Eyewears

A. W. Kambale, S. Shokrivahed, G. Verticale, D. Ardagna
In press

Abstract

Resource-constrained smart eyewear (SEW) devices face significant challenges when deploying deep neural networks due to limited computational capacity and battery life. Computational offloading to companion devices like smartphones and cloud servers addresses processing limitations, but data transmission becomes a critical bottleneck, consuming over 50% of total energy in some scenarios. Although lossless compression methods provide limited data reduction for intermediate tensors, lossy techniques such as Vector Quantization (VQ) offer higher compression ratios (requiring only 3.3 bits per float) at the expense of inference accuracy degradation. This paper presents an adaptive multi-stage compression framework that dynamically balances these trade-offs across the SEW-phone-cloud continuum. We employ VQ at the SEW-phone interface where aggressive compression is essential (achieving 89.6% tensor size reduction with 90% retained accuracy), followed by adaptive selection between quantization and run-length encoding for phone-to-cloud transmission based on network conditions. A Deep Q-Network (DQN) agent jointly optimizes network partitioning points and compression strategies to minimize energy consumption while preserving accuracy and meeting latency constraints. A large simulation campaign considering object detection and human pose estimation tasks demonstrates that our method achieves 55--70% energy savings and 86--91% violation reduction compared to Neurosurgeon (a dynamic partitioning baseline without compression), 45.8% energy savings versus local execution, and 61.1% savings over uncompressed offloading, with latency violation rates below 9% and acceptable accuracy loss (8.0--8.1%). These results enable practical deployment of AI applications on battery-limited SEW devices.
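To illustrate the figures quoted above: a scalar codebook of 10 entries costs log2(10) ≈ 3.32 bits per float, which relative to 32-bit floats is exactly the ~89.6% tensor size reduction the abstract reports. The sketch below is illustrative only (the paper's actual VQ design, vector dimensions, and codebook training are not detailed here); it quantizes a synthetic activation tensor with a simple 1-D k-means codebook, where all function and variable names are our own.

```python
import numpy as np

def vq_compress(tensor, codebook_size=10, iters=20, seed=0):
    """Illustrative scalar vector quantization via 1-D k-means.

    Each float is replaced by a codebook index costing
    log2(codebook_size) bits; with 10 entries that is ~3.32 bits
    per float, i.e. ~89.6% smaller than 32-bit floats.
    """
    rng = np.random.default_rng(seed)
    x = tensor.ravel()
    # Initialize centroids from randomly sampled tensor values.
    centroids = rng.choice(x, size=codebook_size, replace=False)
    for _ in range(iters):
        # Assign each value to its nearest centroid, then update centroids.
        idx = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for k in range(codebook_size):
            if np.any(idx == k):
                centroids[k] = x[idx == k].mean()
    bits_per_float = np.log2(codebook_size)
    reduction = 1.0 - bits_per_float / 32.0  # vs. float32 storage
    return idx.reshape(tensor.shape), centroids, reduction

# Synthetic intermediate activations standing in for a partition-point tensor.
activations = np.random.default_rng(1).normal(size=(8, 16, 16)).astype(np.float32)
indices, codebook, reduction = vq_compress(activations)
approx = codebook[indices]  # lossy reconstruction on the receiver side
```

Only the index tensor and the small codebook need to be transmitted; the receiver reconstructs an approximation by a codebook lookup, which is where the lossy accuracy degradation discussed in the abstract enters.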
Proceedings of the 17th ACM/SPEC International Conference on Performance Engineering (ICPE '26)
ISBN: 979-8-4007-2325-4
Keywords: Deep Reinforcement Learning; Offloading; Tensor Compression; Cloud computing
Files in this record:
ICPE_RL_with_tensor_compression.pdf — Pre-print (pre-refereeing), open access, Adobe PDF, 2.23 MB

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1307270