FPGA-based implementation of 2D Normalized Cross-Correlation for Large Scale Signals
Salaris M.; Damiani A.; Stornaiuolo L.
2021-01-01
Abstract
About every three years, high-end image resolution quadruples: what we called high resolution in 2018 is becoming standard now. Even simple embedded devices can shoot 4K video. This rush toward larger frames, combined with the booming era of Computer Vision (CV), constantly raises the performance requirements of the underlying processing. Template matching is one of the foundations of CV, as it enables the localization of objects inside images. It relies on similarity functions such as the 2D Cross-Correlation and its variants, Normalized Cross-Correlation (NCC) and Zero-mean NCC (ZNCC). However, these computations do not scale gracefully with resolution. We propose two novel FPGA-based implementations of the 2D NCC and ZNCC for large-scale images, both with low hardware resource consumption. We succeeded in fitting our accelerator on the 3CG class of Xilinx Zynq UltraScale+ ARM-based MPSoCs, among the smallest embedded-grade classes, which do not even include the dedicated CV hardware that adds a 14% cost overhead; this enables accelerated template matching for local preprocessing in IoT applications. We achieve this while attaining a 3.52× speedup over non-embedded systems and remaining 43.2× more power efficient for NCC. Finally, to fully exploit the heterogeneous nature of our target hardware, we provide a runtime hardware selection algorithm that automatically targets the proper hardware/software implementation for best performance.
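For readers unfamiliar with the similarity functions named in the abstract, the following pure-Python sketch computes the textbook brute-force NCC and ZNCC score maps over the "valid" region (positions where the template fits entirely inside the image). It is not the authors' FPGA design or necessarily their exact formulation: the function names `ncc_map` and `best_match`, the list-of-lists image representation, and the convention of scoring flat (zero-energy) patches as 0 are illustrative assumptions.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def _window(image: Matrix, r: int, c: int, h: int, w: int) -> List[float]:
    """Flatten the h x w patch of `image` whose top-left corner is (r, c)."""
    return [image[r + i][c + j] for i in range(h) for j in range(w)]


def ncc_map(image: Matrix, template: Matrix, zero_mean: bool = False) -> Matrix:
    """Brute-force 2D NCC (or ZNCC if zero_mean=True), 'valid' positions only.

    For each position, with T the template and I the covered image patch:
      NCC  = sum(T*I) / sqrt(sum(T^2) * sum(I^2))
      ZNCC = the same formula after subtracting each operand's own mean.
    Cost is O(H*W*h*w) multiply-accumulates.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    t = [v for row in template for v in row]
    if zero_mean:
        t_mean = sum(t) / len(t)
        t = [v - t_mean for v in t]
    t_energy = sum(v * v for v in t)

    out: Matrix = []
    for r in range(H - h + 1):
        row_scores = []
        for c in range(W - w + 1):
            p = _window(image, r, c, h, w)
            if zero_mean:
                p_mean = sum(p) / len(p)
                p = [v - p_mean for v in p]
            num = sum(a * b for a, b in zip(t, p))
            den = math.sqrt(t_energy * sum(v * v for v in p))
            # NCC is undefined for zero-energy patches; score them 0 (assumed convention).
            row_scores.append(num / den if den > 0 else 0.0)
        out.append(row_scores)
    return out


def best_match(scores: Matrix) -> Tuple[int, int]:
    """(row, col) of the highest score, i.e. the estimated template location."""
    positions = ((r, c) for r in range(len(scores)) for c in range(len(scores[0])))
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])


if __name__ == "__main__":
    image = [[0, 0, 0, 0, 0],
             [0, 1, 2, 0, 0],
             [0, 3, 4, 0, 0],
             [0, 0, 0, 0, 0]]
    template = [[1, 2],
                [3, 4]]
    print(best_match(ncc_map(image, template)))
    print(best_match(ncc_map(image, template, zero_mean=True)))
```

The nested loops make the scaling problem concrete: each of the (H−h+1)(W−w+1) positions needs h·w products for the numerator alone. For a 3840×2160 frame and a hypothetical 64×64 template, that is 3777 × 2097 ≈ 7.9 million positions × 4,096 ≈ 3.2×10^10 multiply-accumulates per frame, and quadrupling the resolution roughly quadruples the work. ZNCC's mean subtraction also makes the score invariant to affine brightness changes (I → a·I + b with a > 0), whereas plain NCC is only invariant to scaling.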
File: FPGA-based_implementation_of_2D_Normalized_Cross-Correlation_for_Large_Scale_Signals.pdf (publisher's version; restricted access; 795.27 kB; Adobe PDF)
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.