NERONE: The Fast Way to Efficiently Execute Your Deep Learning Algorithm At the Edge

Berzoini, Raffaele; D'Arnese, Eleonora; Conficconi, Davide; Santambrogio, Marco D.

doi:10.1109/JBHI.2023.3296142

: Semantic segmentation and classification are pivotal in many clinical applications, such as radiation dose quantification and surgery planning. While manually labeling images is highly time-consuming, the advent of Deep Learning (DL) has introduced a valuable alternative. Nowadays, DL models inference is run on Graphics Processing Units (GPUs), which are power-hungry devices, and, therefore, are not the most suited solution in constrained environments where Field Programmable Gate Arrays (FPGAs) become an appealing alternative given their remarkable performance per watt ratio. Unfortunately, FPGAs are hard to use for non-experts, and the creation of tools to open their employment to the computer vision community is still limited. For these reasons, we propose NERONE, which allows end users to seamlessly benefit from FPGA acceleration and energy efficiency without modifying their DL development flows. To prove the capability of NERONE to cover different network architectures, we have developed four models, one for each of the chosen datasets (three for segmentation and one for classification), and we deployed them, thanks to NERONE, on three different embedded FPGA-powered boards achieving top average energy efficiency improvements of 3.4× and 1.9× against a mobile and a datacenter GPU devices, respectively.