
FPGA-based implementation of 2D Normalized Cross-Correlation for Large Scale Signals

Salaris M.; Damiani A.; Stornaiuolo L.
2021-01-01

Abstract

About every three years, high-end image resolution quadruples: what we called high resolution in 2018 is now becoming standard. Even simple embedded devices can shoot video at 4K resolution. Combined with the blooming era of Computer Vision (CV), this rush to wider frames constantly raises the performance requirements of the underlying processing. Template matching is one of the foundations of CV, as it enables the localization of objects inside images. It relies on similarity functions such as the 2D Cross-Correlation and its variants, Normalized Cross-Correlation (NCC) and Zero-mean NCC (ZNCC). However, these computations do not scale gracefully with resolution. We propose two novel FPGA-based implementations of the 2D NCC and ZNCC for large-scale images, both with low hardware resource consumption. We succeeded in fitting our accelerator on the 3CG class of Xilinx Zynq UltraScale+ ARM-based MPSoCs, one of the smallest embedded-grade classes, which does not even include dedicated CV hardware (hardware that adds a 14% cost overhead), thus enabling accelerated template matching for local preprocessing in IoT applications. We achieve this while attaining a 3.52× speedup over non-embedded systems and remaining 43.2× more power efficient for NCC. Finally, to fully exploit the heterogeneous nature of our target hardware, we provide a runtime hardware selection algorithm that automatically targets the proper hardware/software implementation for best performance.
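For readers unfamiliar with the similarity functions named in the abstract, the following is a minimal software sketch of NCC and ZNCC template matching using NumPy and SciPy. It is only a reference formulation under the standard textbook definitions, not the FPGA design described in the paper; the function name `ncc_map`, its parameters, and the test data are hypothetical and chosen for illustration.

```python
import numpy as np
from scipy.signal import correlate2d


def ncc_map(image, template, zero_mean=False):
    """Return the NCC (or ZNCC) score for every valid template offset.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    image = np.asarray(image, dtype=np.float64)
    template = np.asarray(template, dtype=np.float64)
    ones = np.ones_like(template)
    n = template.size

    if zero_mean:
        # ZNCC: remove the template mean; the window mean is handled below.
        template = template - template.mean()

    # Numerator: plain 2D cross-correlation at each valid offset.
    num = correlate2d(image, template, mode="valid")

    # Energy of each image window (sum of squared pixels under the template).
    win_energy = correlate2d(image ** 2, ones, mode="valid")
    if zero_mean:
        # sum((I - mean(I))^2) = sum(I^2) - (sum(I))^2 / n
        win_sum = correlate2d(image, ones, mode="valid")
        win_energy = win_energy - win_sum ** 2 / n

    denom = np.sqrt(np.maximum(win_energy, 0.0) * np.sum(template ** 2))

    # Flat windows have zero energy; leave their score at 0.
    scores = np.zeros_like(num)
    np.divide(num, denom, out=scores, where=denom > 1e-12)
    return scores


# Usage: crop a template from a random image and locate it again.
rng = np.random.default_rng(0)
image = rng.random((32, 40))
template = image[10:18, 20:30].copy()

ncc_scores = ncc_map(image, template)
zncc_scores = ncc_map(image, template, zero_mean=True)
ncc_peak = np.unravel_index(np.argmax(ncc_scores), ncc_scores.shape)
zncc_peak = np.unravel_index(np.argmax(zncc_scores), zncc_scores.shape)
```

In this sketch the score map has one entry per valid offset, and the best match is the offset with the highest score. The direct `correlate2d` call costs on the order of image size times template size, which illustrates why these computations do not scale gracefully with resolution.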
Year: 2021
Published in: 6th International Forum on Research and Technology for Society and Industry, RTSI 2021 - Proceedings
ISBN: 978-1-6654-4135-3
Keywords: FPGA; Normalized Cross-Correlation; PYNQ; SciPy; SoC; Zero-mean Normalized Cross-Correlation; Zynq
Files in this item:
File (publisher's version): FPGA-based_implementation_of_2D_Normalized_Cross-Correlation_for_Large_Scale_Signals.pdf
Access: Restricted
Size: 795.27 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1203595