Design Space Exploration for Orlando Ultra Low-Power Convolutional Neural Network SoC
Erdem, Ahmet; Silvano, Cristina
2018-01-01
Abstract
With the recent advances in machine learning, Deep Convolutional Neural Networks (DCNNs) represent state-of-the-art solutions, especially in image and speech recognition and classification. The most important enabling factor of deep learning is the massive computing power offered by programmable GPUs for training DCNNs on large amounts of data. Even DCNN deployment scenarios, where trained models are used for inference, have started to require powerful computing systems. In the embedded systems domain especially, the computational requirements, together with ultra-low-power and memory constraints, exacerbate the situation even further. The STM Orlando ultra-low-power processor architecture with convolutional neural network acceleration targets exactly this class of problems. The Orlando SoC integrates HW-accelerated blocks together with DSPs and on-chip memory resources to enable energy-efficient convolutions for future generations of DCNNs. Although the Orlando platform provides flexibility through its programmable DSPs, the large variety of DCNN applications makes the design space exploration of the next generations of the Orlando architecture challenging. Many hardware design parameters affect the performance and energy efficiency of the Orlando SoC. Given the huge size of the design space, design space exploration (DSE) and cost-performance tradeoff analysis are needed to select the best set of parameters for the target DCNN application. In this work, we present the exploration results and the tradeoff analysis carried out for the Orlando architecture on the VGG-16 case study.
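The abstract centers on cost-performance tradeoff analysis over a large hardware design space. As a purely illustrative sketch (not the paper's methodology or tool chain), the Python fragment below enumerates a small hypothetical design space and extracts the latency/energy Pareto front; the parameter names (num_dsps, sram_kb, freq_mhz) and the cost model are invented for illustration only.

```python
# Illustrative sketch only: exhaustive enumeration of a small hypothetical
# design space (parameter names and cost model are NOT from the paper),
# followed by extraction of the latency/energy Pareto front.
from itertools import product

def evaluate(num_dsps, sram_kb, freq_mhz):
    """Toy cost model (placeholder numbers): returns (latency_ms, energy_mj)."""
    latency_ms = 1000.0 / (num_dsps * freq_mhz / 100.0)
    energy_mj = latency_ms * (0.5 * num_dsps + 0.01 * sram_kb) * freq_mhz / 1000.0
    return latency_ms, energy_mj

design_space = product([2, 4, 8, 16],      # number of DSP clusters (hypothetical)
                       [512, 1024, 2048],  # on-chip SRAM in KB (hypothetical)
                       [200, 400, 600])    # clock frequency in MHz (hypothetical)

points = [(cfg, evaluate(*cfg)) for cfg in design_space]

# Keep only Pareto-optimal points: a point is discarded if another point is
# at least as good in both objectives and strictly different.
pareto = [(cfg, m) for cfg, m in points
          if not any(o[0] <= m[0] and o[1] <= m[1] and o != m
                     for _, o in points)]

for cfg, (lat, en) in sorted(pareto, key=lambda p: p[1][0]):
    print(f"dsps={cfg[0]:2d}  sram={cfg[1]:4d} KB  f={cfg[2]} MHz  ->  "
          f"{lat:7.2f} ms, {en:7.2f} mJ")
```

The resulting Pareto set is the kind of output a DSE flow would hand to a designer for selecting a configuration under a given power or latency budget.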
File | Description | Size | Format
---|---|---|---
ASAP_2018_08445096.pdf (publisher's version, restricted access) | Published article | 2.36 MB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.