Since its emergence in late 2019, the diffusion of SARS-CoV-2 is associated with the evolution of its viral genome. The co-occurrence of specific amino acid changes, collectively named ‘virus variant’, requires scrutiny (as variants may hugely impact the agent’s transmission, pathogenesis, or antigenicity); variant evolution is studied using phylogenetics. Yet, never has this problem been tackled by digging into data with ad hoc analysis techniques. Here we show that the emergence of variants can in fact be traced through data-driven methods, further capitalizing on the value of large collections of SARS-CoV-2 sequences. For all countries with sufficient data, we compute weekly counts of amino acid changes, unveil time-varying clusters of changes with similar—rapidly growing—dynamics, and then follow their evolution. Our method succeeds in timely associating clusters to variants of interest/concern, provided their change composition is well characterized. This allows us to detect variants’ emergence, rise, peak, and eventual decline under competitive pressure of another variant. Our early warning system, exclusively relying on deposited sequences, shows the power of big data in this context, and concurs to calling for the wide spreading of public SARS-CoV-2 genome sequencing for improved surveillance and control of the COVID-19 pandemic.

Data-driven analysis of amino acid change dynamics timely reveals SARS-CoV-2 variant emergence

Bernasconi A.;Mari L.;Casagrandi R.;Ceri S.
2021-01-01

Abstract

Since its emergence in late 2019, the diffusion of SARS-CoV-2 is associated with the evolution of its viral genome. The co-occurrence of specific amino acid changes, collectively named ‘virus variant’, requires scrutiny (as variants may hugely impact the agent’s transmission, pathogenesis, or antigenicity); variant evolution is studied using phylogenetics. Yet, never has this problem been tackled by digging into data with ad hoc analysis techniques. Here we show that the emergence of variants can in fact be traced through data-driven methods, further capitalizing on the value of large collections of SARS-CoV-2 sequences. For all countries with sufficient data, we compute weekly counts of amino acid changes, unveil time-varying clusters of changes with similar—rapidly growing—dynamics, and then follow their evolution. Our method succeeds in timely associating clusters to variants of interest/concern, provided their change composition is well characterized. This allows us to detect variants’ emergence, rise, peak, and eventual decline under competitive pressure of another variant. Our early warning system, exclusively relying on deposited sequences, shows the power of big data in this context, and concurs to calling for the wide spreading of public SARS-CoV-2 genome sequencing for improved surveillance and control of the COVID-19 pandemic.
2021
virus genomics
virus evolution
SARS-CoV-2 variants
File in questo prodotto:
File Dimensione Formato  
Datadriven-analysis-of-amino-acid-change-dynamics-timely-reveals-SARSCoV2-variant-emergenceScientific-Reports.pdf

accesso aperto

: Publisher’s version
Dimensione 3.66 MB
Formato Adobe PDF
3.66 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1189402
Citazioni
  • ???jsp.display-item.citation.pmc??? 14
  • Scopus 17
  • ???jsp.display-item.citation.isi??? 11
social impact