Sensitivity Analysis Exploration of ML Architectures to TinyML Techniques

Roveri, Manuel;
2025-01-01

Abstract

TinyML techniques, such as pruning and quantization, enable the deployment of machine learning (ML) models on resource-constrained embedded devices by reducing computational and memory demands. However, the impact of these techniques varies across ML architectures, influencing accuracy, inference speed, computation cost, and memory footprint. This paper presents a sensitivity analysis of three prominent architectures (CNN, LSTM, and Transformer) to two TinyML techniques, L1-structured pruning and 8-bit integer quantization, in an industrial anomaly detection scenario. A two-stage optimization methodology is applied: models are first pruned to reduce computational complexity and then quantized to minimize memory footprint. Performance is evaluated using Mean Squared Error (MSE), Floating Point Operations (FLOPs), Execution Time (ET), and Memory Space (MS). The findings highlight the trade-offs between computational efficiency and predictive accuracy, underscoring the need for architecture-aware TinyML strategies for optimized model deployment in embedded systems.
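The paper does not state its implementation framework, so the snippet below is only a minimal sketch, assuming PyTorch, of the two-stage flow the abstract describes: L1-structured pruning of convolutional channels followed by post-training 8-bit quantization, plus a rough execution-time measurement. The toy 1-D CNN, the 30% pruning amount, and the input shape are illustrative placeholders, and dynamic quantization of the dense layer merely stands in for whatever 8-bit scheme the paper actually uses.

```python
import time
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Placeholder 1-D CNN standing in for one of the compared architectures.
model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 128, 128),
)

# Stage 1: L1-structured pruning - zero the 30% of output channels with the
# smallest L1 norm (channels are zeroed in place, not physically removed).
for module in model.modules():
    if isinstance(module, nn.Conv1d):
        prune.ln_structured(module, name="weight", amount=0.3, n=1, dim=0)
        prune.remove(module, "weight")  # fold the pruning mask into the weights

# Stage 2: post-training dynamic int8 quantization of the dense layer.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Rough execution-time measurement on a dummy batch of 128-sample windows.
x = torch.randn(8, 1, 128)
with torch.no_grad():
    start = time.perf_counter()
    _ = quantized(x)
print(f"Execution time: {(time.perf_counter() - start) * 1e3:.2f} ms")
```

A complete pipeline along the lines of the paper would also fine-tune after pruning and report MSE, FLOPs, and memory footprint alongside execution time.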
2025
Proceedings of the 2025 International Conference on Advanced Machine Learning and Data Science, AMLDS 2025
979-8-3315-1099-2

Keywords
Anomaly Detection
CNN
LSTM
Pruning
Quantization
TinyML
Transformer
File: Sensitivity_Analysis_Exploration_of_ML_Architectures_to_TinyML_Techniques.pdf (Adobe PDF, 487.58 kB, restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1309039