This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (Network-on-Chip), targeted at future mobile systems. We suggest the architecture of the memory controller optimized to minimize synchronization overhead. The proposed solution is based on the idea of performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks), locally in the memory. We introduce a HW module, which augments the memory controller, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 25% energy saving with respect to synchronization based on the caching of the synchronization variables and directory-based coherency protocol. Furthermore, we prove the scalability of the proposed approach when the number of processors increases.

An Efficient Synchronization Technique for Multiprocessor Systems on-Chip

MONCHIERO, MATTEO;PALERMO, GIANLUCA;SILVANO, CRISTINA;VILLA, ORESTE
2006-01-01

Abstract

This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (Network-on-Chip), targeted at future mobile systems. We suggest the architecture of the memory controller optimized to minimize synchronization overhead. The proposed solution is based on the idea of performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks), locally in the memory. We introduce a HW module, which augments the memory controller, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 25% energy saving with respect to synchronization based on the caching of the synchronization variables and directory-based coherency protocol. Furthermore, we prove the scalability of the proposed approach when the number of processors increases.
2006
Proceedings of the 2005 workshop on MEmory performance: DEaling with Applications , systems and architecture
File in questo prodotto:
File Dimensione Formato  
MEDEA06.pdf

Accesso riservato

: Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione 257.89 kB
Formato Adobe PDF
257.89 kB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/254361
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact