Modern embedded systems are in charge of an increasing number of tasks that extensively employ floating-point (FP)computations. The ever-increasing efficiency requirement, coupled with the additional computational effort to perform FP computations, motivates several microarchitectural optimizations of the FPU. This manuscript presents a novel modular FPU microarchitecture, which targets modern embedded systems and considers heterogeneous workloads including both best-effort and accuracy-sensitive applications. The design optimizes the EDP-accuracy-area figure of merit by allowing, at design-time, to independently configure the precision of each FP operation, while the FP dynamic range is kept common to the entire FPU to deliver a simpler microarchitecture. To ensure the correct execution of accuracy-sensitive applications, a novel compiler pass allows to substitute each FP operation for which a low-precision hardware support is offered with the corresponding soft-float function call. The assessment considers seven FPU variants encompassing three different state-of-the-art designs. The results on several representative use cases show that thebinary32FPU implementation offers an EDP gain of 15%, while, in case the FPU implements a mix ofbinary32andbfloat16operations, the EDP gain is 19%, the reduction in the resource utilization is 21% and the average accuracy loss is less than 2.5%. Moreover, the resource utilization of our FPU variants is aligned with the one of the FPU employing state-of-the-art, highly specialized FP hardware accelerators. Starting from the assessment, a set of guidelines is drawn to steer the design of the FP hardware support in modern embedded systems
An FPU design template to optimize the accuracy-efficiency-area trade-off
Davide Zoni;Andrea Galimberti;William Fornaciari
2021-01-01
Abstract
Modern embedded systems are in charge of an increasing number of tasks that extensively employ floating-point (FP)computations. The ever-increasing efficiency requirement, coupled with the additional computational effort to perform FP computations, motivates several microarchitectural optimizations of the FPU. This manuscript presents a novel modular FPU microarchitecture, which targets modern embedded systems and considers heterogeneous workloads including both best-effort and accuracy-sensitive applications. The design optimizes the EDP-accuracy-area figure of merit by allowing, at design-time, to independently configure the precision of each FP operation, while the FP dynamic range is kept common to the entire FPU to deliver a simpler microarchitecture. To ensure the correct execution of accuracy-sensitive applications, a novel compiler pass allows to substitute each FP operation for which a low-precision hardware support is offered with the corresponding soft-float function call. The assessment considers seven FPU variants encompassing three different state-of-the-art designs. The results on several representative use cases show that thebinary32FPU implementation offers an EDP gain of 15%, while, in case the FPU implements a mix ofbinary32andbfloat16operations, the EDP gain is 19%, the reduction in the resource utilization is 21% and the average accuracy loss is less than 2.5%. Moreover, the resource utilization of our FPU variants is aligned with the one of the FPU employing state-of-the-art, highly specialized FP hardware accelerators. Starting from the assessment, a set of guidelines is drawn to steer the design of the FP hardware support in modern embedded systemsFile | Dimensione | Formato | |
---|---|---|---|
suscomFPUtest.pdf
Open Access dal 01/01/2021
Descrizione: camera ready
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
2.59 MB
Formato
Adobe PDF
|
2.59 MB | Adobe PDF | Visualizza/Apri |
1-s2.0-S2210537920301761-main (1).pdf
Accesso riservato
Descrizione: versione pubblicata
:
Publisher’s version
Dimensione
2.73 MB
Formato
Adobe PDF
|
2.73 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.