This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (Network-on-Chip), targeted at future mobile systems. We suggest the architecture of the memory controller optimized to minimize synchronization overhead. The proposed solution is based on the idea of performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks), locally in the memory. We introduce a HW module, which augments the memory controller, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 25% energy saving with respect to synchronization based on the caching of the synchronization variables and directory-based coherency protocol. Furthermore, we prove the scalability of the proposed approach when the number of processors increases.
An Efficient Synchronization Technique for Multiprocessor Systems on-Chip
MONCHIERO, MATTEO;PALERMO, GIANLUCA;SILVANO, CRISTINA;VILLA, ORESTE
2006-01-01
Abstract
This paper explores optimization techniques of the synchronization mechanisms for MPSoCs based on complex interconnect (Network-on-Chip), targeted at future mobile systems. We suggest the architecture of the memory controller optimized to minimize synchronization overhead. The proposed solution is based on the idea of performing synchronization operations which require the continuous polling of a shared variable, thus featuring large contention (e.g. spin locks), locally in the memory. We introduce a HW module, which augments the memory controller, the Synchronization-operation Buffer (SB), which queues and manages the requests issued by the processors. Experimental validation has been carried out by using GRAPES, a cycle-accurate performance/power simulation platform. For an 8-processor target architecture, we show that the proposed solution achieves up to 40% performance improvement and 25% energy saving with respect to synchronization based on the caching of the synchronization variables and directory-based coherency protocol. Furthermore, we prove the scalability of the proposed approach when the number of processors increases.File | Dimensione | Formato | |
---|---|---|---|
MEDEA06.pdf
Accesso riservato
:
Post-Print (DRAFT o Author’s Accepted Manuscript-AAM)
Dimensione
257.89 kB
Formato
Adobe PDF
|
257.89 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.