Dynamic Configuration of Application-Specific Implicit Instructions for Embedded Pipelined Processors

AGOSTA, GIOVANNI; SILVANO, CRISTINA; SYKORA, MARTINO
2008

Abstract

In this paper, we propose the dynamic configuration of application-specific implicit instructions for pipelined processors to better exploit the available instruction-level parallelism. Given the target application, the compiler selects a set of candidate instructions to be executed implicitly; that is, their execution is controlled through a data-driven model, which avoids explicit instruction fetch. Consequently, the clock cycles usually required to issue these instructions explicitly are saved, which improves performance and reduces code size. The compiler generates the reconfiguration operations needed to set up the data-path. The processor pipeline has been optimized to support the parallel execution of implicitly issued instructions with limited hardware overhead. The proposed technique has a negligible impact on the processor ISA, since only reconfiguration instructions are added. This also shortens compiler development time, because the optimization can be added almost seamlessly to an existing compilation tool-chain. The proposed approach has been applied to DSP and multimedia kernel loops, and its performance has been compared with that of two baseline architectures: a scalar MIPS processor and a 4-issue VLIW processor of the LX family provided by STMicroelectronics. Experimental results show a speedup ranging from 10% to 35% and an average code size reduction of 19%.
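The cycle-saving argument in the abstract can be illustrated with a toy issue-slot count. This is only a sketch under stated assumptions, not the authors' pipeline model: it assumes a single-issue machine where every explicit instruction costs one issue slot, an implicit instruction costs none, and the reconfiguration instructions are paid once before the loop. The names `Instr`, `explicit_issues`, and `issue_cycles`, the trigger rule, the loop body, and all numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dest: str
    srcs: tuple[str, ...]


def explicit_issues(body, implicit):
    """Return the instructions that still need an explicit fetch/issue."""
    written = set()
    explicit = []
    for ins in body:
        if ins in implicit:
            # Toy trigger rule (assumption): an implicit instruction fires when
            # an earlier instruction in the body writes one of its sources.
            if not written.intersection(ins.srcs):
                raise ValueError(f"{ins.op} has no in-loop producer to trigger it")
        else:
            explicit.append(ins)
        written.add(ins.dest)
    return explicit


def issue_cycles(body, implicit, iterations, reconfig_instrs):
    """Issue slots for the loop: one per explicit instruction per iteration,
    plus the one-time reconfiguration instructions."""
    return reconfig_instrs + iterations * len(explicit_issues(body, implicit))


# Hypothetical multiply-accumulate loop body.
body = [
    Instr("load", "r1", ("r10",)),
    Instr("load", "r2", ("r11",)),
    Instr("mul", "r3", ("r1", "r2")),
    Instr("add", "r4", ("r4", "r3")),
    Instr("addi", "r10", ("r10",)),
]
implicit = {body[3]}  # assume the accumulate is configured as implicit

baseline = issue_cycles(body, set(), iterations=100, reconfig_instrs=0)
configured = issue_cycles(body, implicit, iterations=100, reconfig_instrs=2)
print(baseline, configured)  # prints "500 402"
```

In this toy setting, the static loop body also shrinks from five instructions to four, which mirrors the code-size argument. The figures are illustrative only and say nothing about the speedups measured in the paper.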
Proceedings of the 2008 ACM Symposium on Applied Computing (SAC)
ISBN: 9781595937537
Use this identifier to cite or link to this document: http://hdl.handle.net/11311/545312