Sensitivity Analysis Exploration of ML Architectures to TinyML Techniques
Roveri, Manuel;
2025-01-01
Abstract
TinyML techniques, such as pruning and quantization, enable the deployment of machine learning (ML) models on resource-constrained embedded devices by reducing computational and memory demands. However, the impact of these techniques varies across ML architectures, influencing accuracy, inference speed, computation cost, and memory footprint. This paper presents a sensitivity analysis of three prominent architectures (CNN, LSTM, and Transformer) to two TinyML techniques, L1-structured pruning and 8-bit integer quantization, within an industrial anomaly-detection application scenario. A two-stage optimization methodology is applied: models are first pruned to reduce computational complexity and then quantized to minimize memory footprint. Performance is evaluated using Mean Squared Error (MSE), Floating Point Operations (FLOPs), Execution Time (ET), and Memory Space (MS). The findings highlight the trade-offs between computational efficiency and predictive accuracy, underscoring the need for architecture-aware TinyML strategies for optimized model deployment in embedded systems.

| File | Size | Format | |
|---|---|---|---|
| Sensitivity_Analysis_Exploration_of_ML_Architectures_to_TinyML_Techniques.pdf (restricted access) | 487.58 kB | Adobe PDF | View/Open |
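The two-stage methodology the abstract describes, L1-structured pruning followed by 8-bit integer quantization, can be illustrated with a minimal, dependency-free sketch. The helper names and the toy weight matrix below are hypothetical, not from the paper; the sketch only shows the mechanics of ranking rows (e.g. output channels) by L1 norm, zeroing the smallest, and then mapping the surviving floats onto the int8 range with an affine scale and zero point.

```python
def l1_structured_prune(weights, amount):
    """Zero out the rows (e.g. output channels) with the smallest L1 norm."""
    norms = [sum(abs(w) for w in row) for row in weights]
    n_prune = int(len(weights) * amount)
    victims = set(sorted(range(len(weights)), key=norms.__getitem__)[:n_prune])
    return [[0.0] * len(row) if i in victims else row
            for i, row in enumerate(weights)]

def quantize_int8(values):
    """Affine int8 quantization: map floats onto the range [-128, 127]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 or 1.0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

# Toy example: prune half the rows by L1 norm, then quantize what remains.
w = [[0.9, -1.2], [0.01, 0.02], [0.5, 0.4], [-0.03, 0.05]]
pruned = l1_structured_prune(w, amount=0.5)
flat = [v for row in pruned for v in row]
q, scale, zp = quantize_int8(flat)
# Dequantized value of entry i is (q[i] - zp) * scale, within scale/2 of flat[i].
```

In practice frameworks such as PyTorch provide these stages directly (`torch.nn.utils.prune.ln_structured` with `n=1` and post-training int8 quantization); the sketch above mirrors that pipeline in plain Python.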
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


