Dynamic Reliability Management (DRM) is a common approach to mitigate aging and wear-out effects in multi- /many-core systems. State-of-the-art DRM approaches apply finegrained control on resource management to increase/balance the chip reliability while considering other system constraints, e.g., performance, and power budget. Such approaches, acting on various knobs such as workload mapping and scheduling, Dynamic Voltage/Frequency Scaling (DVFS) and Per-Core Power Gating (PCPG), demonstrated to work properly with the various aging mechanisms, such as electromigration, and Negative-Bias Temperature Instability (NBTI). However, we claim that they do not suffice for thermal cycling. Thus, we here propose a novel thermal-cycling-aware DRM approach for shared-memory many-core systems running multi-threaded applications. The approach applies a fine-grained control capable at reducing both temperature levels and variations. The experimental evaluations demonstrated that the proposed approach is able to achieve 39% longer lifetime than past approaches.

Thermal-Cycling-aware Dynamic Reliability Management in Many-Core System-on-Chip

Miele A.;
2020-01-01

Abstract

Dynamic Reliability Management (DRM) is a common approach to mitigate aging and wear-out effects in multi- /many-core systems. State-of-the-art DRM approaches apply finegrained control on resource management to increase/balance the chip reliability while considering other system constraints, e.g., performance, and power budget. Such approaches, acting on various knobs such as workload mapping and scheduling, Dynamic Voltage/Frequency Scaling (DVFS) and Per-Core Power Gating (PCPG), demonstrated to work properly with the various aging mechanisms, such as electromigration, and Negative-Bias Temperature Instability (NBTI). However, we claim that they do not suffice for thermal cycling. Thus, we here propose a novel thermal-cycling-aware DRM approach for shared-memory many-core systems running multi-threaded applications. The approach applies a fine-grained control capable at reducing both temperature levels and variations. The experimental evaluations demonstrated that the proposed approach is able to achieve 39% longer lifetime than past approaches.
2020
Proceedings of the 2020 Design, Automation and Test in Europe Conference and Exhibition, DATE 2020
978-3-9819263-4-7
Lifetime reliability
resource management
Thermal cycling
File in questo prodotto:
File Dimensione Formato  
thermal_cycling_in_resource_management.pdf

accesso aperto

: Pre-Print (o Pre-Refereeing)
Dimensione 902.78 kB
Formato Adobe PDF
902.78 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1150421
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 6
social impact