Framework for scheduling and resource management in time-constrained HPC application

MASSARI, GIUSEPPE; FORNACIARI, WILLIAM
2015-01-01

Abstract

Silicon technology continues to scale down in line with Moore's law. Device variability increases as controllability is lost during silicon chip fabrication. Current methodologies based on error detection and thread re-execution (rollback) may no longer suffice once the number of errors rises past a threshold. This dynamic scenario can be very harmful when running programs on HPC systems, where a correct, accurate, and time-constrained solution is expected. The objective of the paper is to show preliminary results of the Barbeque OpenSource Project (BOSP) and its potential use in HPC systems.
2015
AIP Conference Proceedings
9780735412873
computing resource management; disaster management; HPC systems; multi and many-cores systems; parallel systems; reliability models; Physics and Astronomy (all)

Use this identifier to cite or link to this document: https://hdl.handle.net/11311/1027611