Cost-effective fixed-point hardware support for RISC-V embedded systems
D. Zoni; A. Galimberti
2022-01-01
Abstract
With the ever-increasing energy-efficiency requirements of computing platforms at the edge, precision-tuning techniques highlight the possibility of improving the efficiency of floating-point computations by selectively lowering the precision of intermediate operations without affecting the accuracy of the final result. Recent trends have also demonstrated that fixed-point computations can successfully replace floating-point ones to further optimize efficiency in the high-performance computing (HPC) domain. However, using the integer functional units to execute fixed-point operations in embedded platforms can severely degrade the energy-delay product (EDP). This work presents a cost-effective architecture that efficiently supports fixed-point computations in embedded systems, with two goals. On the one hand, it allows replacing floating-point computations, with meaningful area and EDP improvements. On the other hand, it can complement the FPU, providing the flexibility to select the best arithmetic for the target applications depending on their accuracy and performance requirements. Experimental results were collected from a representative set of floating-point-intensive applications executed on six variants of the baseline system-on-chip (SoC), comparing the efficiency of the different floating- and fixed-point architectures in terms of accuracy, area, and EDP. Our fixed-point solution achieves a normalized EDP of 0.651 with respect to a SoC featuring a binary32 FPU, with negligible accuracy loss, i.e., 0.0003% on average (0.004% peak) compared to binary32 execution. In contrast, a SoC employing the integer functional units for fixed-point execution reports a normalized EDP of 1.796 for the same accuracy loss.
Compared to the baseline SoC implementing only the integer units, the proposed architecture shows an area overhead limited to 4%, while the SoC featuring binary32 floating-point hardware support requires 32% more resources.
File | Description | Access | Size | Format
---|---|---|---|---
1-s2.0-S1383762122000595-main.pdf | Publisher's version (main document) | Restricted access | 1.09 MB | Adobe PDF
preprint.pdf | Pre-print (pre-refereeing) | Open access | 472.54 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.