In the era of Big Data, the digital industry faces massive growth in data size and in the development of data-intensive software, and more and more companies are adopting new frameworks and paradigms capable of handling data at scale. The MapReduce (MR) paradigm and its implementation framework, Hadoop, are among the most widely referenced, and they form the basis for later and more advanced frameworks such as Tez and Spark. Accurate prediction of the execution time of a Big Data application helps improve design-time decisions, reduces over-allocation charges, and assists budget management. In this regard, we propose analytical models based on Stochastic Activity Networks (SANs) to accurately model the execution of MR, Tez, and Spark applications in Hadoop environments governed by the YARN Capacity scheduler. We evaluate the accuracy of the proposed models on the TPC-DS industry benchmark across different configurations. Results obtained by numerically solving the analytical SAN models show an average error of 6% in estimating the execution time of an application compared with the data gathered from experiments; moreover, the model evaluation time is lower than the simulation time of state-of-the-art solutions.
|Title:||Analytical composite performance models for Big Data applications|
|Publication date:||2019|
|Appears in types:||01.1 Journal Article|