Sensitivity Analysis Exploration of ML Architectures to TinyML Techniques

Roveri, Manuel;
2025-01-01

Abstract

TinyML techniques, such as pruning and quantization, enable the deployment of machine learning (ML) models on resource-constrained embedded devices by reducing computational and memory demands. However, the impact of these techniques varies across ML architectures, influencing accuracy, inference speed, computation cost, and memory footprint. This paper presents a sensitivity analysis of three prominent architectures (CNN, LSTM, and Transformer) to two TinyML techniques, L1-structured pruning and 8-bit integer quantization, in an industrial anomaly detection scenario. A two-stage optimization methodology is applied: models are first pruned to reduce computational complexity and then quantized to minimize memory footprint. Performance is evaluated using Mean Squared Error (MSE), Floating Point Operations (FLOPs), Execution Time (ET), and Memory Space (MS). The findings highlight the trade-offs between computational efficiency and predictive accuracy, underscoring the need for architecture-aware TinyML strategies for optimized model deployment in embedded systems.
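The paper does not state its implementation framework, so the snippet below is only a minimal sketch, assuming PyTorch, of the two-stage flow the abstract describes: L1-structured pruning of convolutional channels followed by post-training 8-bit quantization, plus a rough execution-time measurement. The toy 1-D CNN, the 30% pruning amount, and the input shape are illustrative placeholders, and dynamic quantization of the dense layer merely stands in for whatever 8-bit scheme the paper actually uses.

```python
import time
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder 1-D CNN standing in for one of the compared architectures.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 128, 128),
)

# Stage 1: L1-structured pruning - zero the 30% of output channels with the
# smallest L1 norm (channels are zeroed in place, not physically removed).
for module in model.modules():
    if isinstance(module, nn.Conv1d):
        prune.ln_structured(module, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(module, "weight")  # fold the pruning mask into the weights

# Stage 2: post-training dynamic int8 quantization of the dense layer.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough execution-time measurement on a dummy batch of 128-sample windows.
x = torch.randn(8, 1, 128)
with torch.no_grad():
    start = time.perf_counter()
    _ = quantized(x)
print(f"Execution time: {(time.perf_counter() - start) * 1e3:.2f} ms")
```

A complete pipeline along the lines of the paper would also fine-tune after pruning and report MSE, FLOPs, and memory footprint alongside execution time.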
2025
Proceedings of the 2025 International Conference on Advanced Machine Learning and Data Science, AMLDS 2025
979-8-3315-1099-2

Keywords
Anomaly Detection
CNN
LSTM
Pruning
Quantization
TinyML
Transformer
File: Sensitivity_Analysis_Exploration_of_ML_Architectures_to_TinyML_Techniques.pdf (Adobe PDF, 487.58 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309039