RE.PUBLIC@POLIMI pubblicazioni di ricerca del Politecnico di Milano

Designing and validating efficient cache-coherent memory subsystems is a critical yet complex task in the development of modern multi-core system-on-chip architectures. Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems. On the design side, Rhea generates synthesizable, highly configurable RTL supporting various architectural parameters. On the validation side, Rhea integrates Verilator's cycle-accurate RTL simulation with gem5's full-system simulation, allowing realistic workloads and operating systems to run alongside the actual RTL under test. We apply Rhea to design MSI-based RTL memory subsystems with one and two levels of private caches and scaling up to sixteen cores. Their evaluation with 22 applications from state-of-the-art benchmark suites shows intermediate performance relative to gem5 Ruby's MI and MOESI models. The hybrid gem5-Verilator co-simulation flow incurs a moderate simulation overhead, up to 2.7 times compared to gem5 MI, but achieves higher fidelity by simulating real RTL hardware. This overhead decreases with scale, down to 1.6 times in sixteen-core scenarios. These results demonstrate Rhea's effectiveness and scalability in enabling fast development of RTL cache-coherent memory subsystem designs.

Rhea: a Framework for Fast Design and Validation of RTL Cache-Coherent Memory Subsystems

Davide Zoni;Andrea Galimberti;Adriano Guarisco

2025-01-01

Abstract

Designing and validating efficient cache-coherent memory subsystems is a critical yet complex task in the development of modern multi-core system-on-chip architectures. Rhea is a unified framework that streamlines the design and system-level validation of RTL cache-coherent memory subsystems. On the design side, Rhea generates synthesizable, highly configurable RTL supporting various architectural parameters. On the validation side, Rhea integrates Verilator's cycle-accurate RTL simulation with gem5's full-system simulation, allowing realistic workloads and operating systems to run alongside the actual RTL under test. We apply Rhea to design MSI-based RTL memory subsystems with one and two levels of private caches and scaling up to sixteen cores. Their evaluation with 22 applications from state-of-the-art benchmark suites shows intermediate performance relative to gem5 Ruby's MI and MOESI models. The hybrid gem5-Verilator co-simulation flow incurs a moderate simulation overhead, up to 2.7 times compared to gem5 MI, but achieves higher fidelity by simulating real RTL hardware. This overhead decreases with scale, down to 1.6 times in sixteen-core scenarios. These results demonstrate Rhea's effectiveness and scalability in enabling fast development of RTL cache-coherent memory subsystem designs.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno di pubblicazione
	
				2025
			
	Titolo del libro
	
				Proceedings of the 44th International Conference on Computer-Aided Design
			
	Appare nelle tipologie:
	
				04.1 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
152_Final_Manuscript.pdf Accesso riservato : Post-Print (DRAFT o Author’s Accepted Manuscript-AAM) Dimensione 833.75 kB Formato Adobe PDF Visualizza/Apri	833.75 kB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1298053

Citazioni

ND

1

ND

social impact