Predictive performance of multi-model ensemble forecasts of COVID-19 across European nations

Sherratt, Katharine; Gruson, Hugo; Grah, Rok; Johnson, Helen; Niehus, Rene; Prasse, Bastian; Sandmann, Frank; Deuschel, Jannik; Wolffram, Daniel; Abbott, Sam; Ullrich, Alexander; Gibson, Graham; Ray, Evan L; Reich, Nicholas G; Sheldon, Daniel; Wang, Yijin; Wattanachit, Nutcha; Wang, Lijing; Trnka, Jan; Obozinski, Guillaume; Sun, Tao; Thanou, Dorina; Pottier, Loic; Krymova, Ekaterina; Meinke, Jan H; Barbarossa, Maria Vittoria; Leithäuser, Neele; Mohring, Jan; Schneider, Johanna; Włazło, Jaroslaw; Fuhrmann, Jan; Lange, Berit; Rodiah, Isti; Baccam, Prasith; Gurung, Heidi; Stage, Steven; Suchoski, Bradley; Budzinski, Jozef; Walraven, Robert; Villanueva, Inmaculada; Tucek, Vit; Smid, Martin; Zajíček, Milan; Pérez Álvarez, Cesar; Reina, Borja; Bosse, Nikos I; Meakin, Sophie R; Castro, Lauren; Fairchild, Geoffrey; Michaud, Isaac; Osthus, Dave; Alaimo Di Loro, Pierfrancesco; Maruotti, Antonello; Eclerová, Veronika; Kraus, Andrea; Kraus, David; Pribylova, Lenka; Bertsimas, Dimitris; Li, Michael Lingzhi; Soni, Saksham; Dehning, Jonas; Mohr, Sebastian; Priesemann, Viola; Redlarski, Grzegorz; Bejar, Benjamin; Ardenghi, Giovanni; Parolini, Nicola; Ziarelli, Giovanni; Bock, Wolfgang; Heyder, Stefan; Hotz, Thomas; Singh, David E; Guzman-Merino, Miguel; Aznarte, Jose L; Moriña, David; Alonso, Sergio; Álvarez, Enric; López, Daniel; Prats, Clara; Burgard, Jan Pablo; Rodloff, Arne; Zimmermann, Tom; Kuhlmann, Alexander; Zibert, Janez; Pennoni, Fulvia; Divino, Fabio; Català, Marti; Lovison, Gianfranco; Giudici, Paolo; Tarantino, Barbara; Bartolucci, Francesco; Jona Lasinio, Giovanna; Mingione, Marco; Farcomeni, Alessio; Srivastava, Ajitesh; Montero-Manso, Pablo; Adiga, Aniruddha; Hurt, Benjamin; Lewis, Bryan; Marathe, Madhav; Porebski, Przemyslaw; Venkatramanan, Srinivasan; Bartczuk, Rafal P; Dreger, Filip; Gambin, Anna; Gogolewski, Krzysztof; Gruziel-Słomka, Magdalena; Krupa, Bartosz; Moszyński, Antoni; Niedzielewski, Karol; Nowosielski, Jedrzej; Radwan, Maciej; Rakowski, Franciszek; Semeniuk, Marcin; Szczurek, Ewa; Zieliński, Jakub; Kisielewski, Jan; Pabjan, Barbara; Kirsten, Holger; Kheifetz, Yuri; Scholz, Markus; Biecek, Przemyslaw; Bodych, Marcin; Filinski, Maciej; Idzikowski, Radoslaw; Krueger, Tyll; Ozanski, Tomasz; Bracher, Johannes; Funk, Sebastian

doi:10.7554/eLife.81916

Background: Short-term forecasts of infectious disease contribute to situational awareness and capacity planning. Based on best practice in other fields and recent insights in infectious disease epidemiology, one can maximise the predictive performance of forecasts by combining independent models into an ensemble. Here we report the performance of ensemble predictions of COVID-19 cases and deaths across Europe between March 2021 and March 2022. Methods: We created the European COVID-19 Forecast Hub, an online open-access platform where modellers upload weekly forecasts for 32 countries, with results publicly visualised and evaluated. We created a weekly ensemble forecast from the equally-weighted average across individual models' predictive quantiles. We measured forecast accuracy relative to a baseline model using the relative Weighted Interval Score (rWIS). We retrospectively explored alternative ensemble methods, including weighting by past performance. Results: We collected weekly forecasts from 48 models, of which we evaluated 29 models alongside the ensemble model. The ensemble performed consistently well across countries over time, scoring better on rWIS than 91% of forecasts for deaths (N=763 predictions from 20 models) and 83% of forecasts for cases (N=886 predictions from 23 models). Performance remained stable over a 4-week horizon for death forecasts but declined at longer horizons for cases. Among ensemble methods, the most influential choice was using a median average instead of the mean, regardless of how component models were weighted. Conclusions: Our results support combining independent models into an ensemble forecast to improve epidemiological predictions, and suggest that median averages yield better performance than methods based on means. We highlight that forecast consumers should place more weight on incident death forecasts than on case forecasts at horizons greater than two weeks.
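The quantile-averaging ensemble and the interval-based scoring summarised in the abstract can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the Hub's implementation: the quantile levels, the toy forecasts, and the function names (`quantile_ensemble`, `weighted_interval_score`, `relative_wis`) are hypothetical; the WIS follows the standard interval-score formulation with weight 1/2 on the median and alpha/2 on each central interval; and `relative_wis` is simplified to a ratio of mean scores against a baseline, whereas the published evaluation may compute it differently (for example via pairwise comparisons between models). Performance-based weighting is not implemented.

```python
from statistics import mean, median


def quantile_ensemble(forecasts, method="mean"):
    """Equally weighted quantile average: combine models level by level.

    forecasts: one dict per model, mapping quantile level -> predicted value.
    Assumes (hypothetically) that all models report the same quantile levels.
    """
    combine = {"mean": mean, "median": median}[method]
    levels = sorted(forecasts[0])
    return {q: combine([f[q] for f in forecasts]) for q in levels}


def interval_score(lower, upper, y, alpha):
    """Interval score of a central (1 - alpha) prediction interval [lower, upper]."""
    score = upper - lower  # width (sharpness) penalty
    if y < lower:
        score += (2 / alpha) * (lower - y)  # penalty when the observation falls below
    elif y > upper:
        score += (2 / alpha) * (y - upper)  # penalty when the observation falls above
    return score


def weighted_interval_score(quantiles, y):
    """WIS from quantile levels symmetric around, and including, the median."""
    levels = sorted(quantiles)
    K = len(levels) // 2
    if len(levels) != 2 * K + 1 or levels[K] != 0.5:
        raise ValueError("expected symmetric quantile levels including 0.5")
    total = 0.5 * abs(y - quantiles[0.5])  # median term, weight 1/2
    for i in range(K):
        lo, hi = levels[i], levels[-1 - i]  # pair e.g. 0.05 with 0.95
        alpha = 2 * lo  # [lo, hi] is the central (1 - alpha) interval
        total += (alpha / 2) * interval_score(quantiles[lo], quantiles[hi], y, alpha)
    return total / (K + 0.5)


def relative_wis(model_scores, baseline_scores):
    """Simplified relative WIS on the same targets; values below 1 beat the baseline."""
    return mean(model_scores) / mean(baseline_scores)


# Hypothetical toy forecasts for one target; model_c is an outlier.
model_a = {0.05: 10, 0.25: 20, 0.5: 30, 0.75: 40, 0.95: 50}
model_b = {0.05: 12, 0.25: 22, 0.5: 32, 0.75: 42, 0.95: 52}
model_c = {0.05: 100, 0.25: 110, 0.5: 120, 0.75: 130, 0.95: 140}
observed = 35

mean_ens = quantile_ensemble([model_a, model_b, model_c], method="mean")
median_ens = quantile_ensemble([model_a, model_b, model_c], method="median")
print(weighted_interval_score(mean_ens, observed))    # ~16.47
print(weighted_interval_score(median_ens, observed))  # ~3.4
```

In this toy case the median ensemble stays close to the two agreeing models, while the mean ensemble is pulled toward the outlier and receives a much higher (worse) WIS. This only illustrates why median averaging is robust to a single divergent model; it is not evidence for the paper's empirical result.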
Funding: European Commission, Ministerio de Ciencia, Innovación y Universidades, FEDER; Agència de Qualitat i Avaluació Sanitàries de Catalunya; Netzwerk Universitätsmedizin; Health Protection Research Unit; Wellcome Trust; European Centre for Disease Prevention and Control; Ministry of Science and Higher Education of Poland; Federal Ministry of Education and Research; Los Alamos National Laboratory; German Free State of Saxony; NCBiR; FISR 2020 Covid-19 I Fase; Spanish Ministry of Health / REACT-UE (FEDER); National Institutes of General Medical Sciences; Ministerio de Sanidad/ISCIII; PERISCOPE European H2020; PERISCOPE European H2021; InPresa; National Institutes of Health, NSF, US Centers for Disease Control and Prevention, Google, University of Virginia, Defense Threat Reduction Agency.