In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence only. Since the emergence of SARS-CoV-2 in late 2019, variants have been selected that are more transmissible and capable of partial immune escape. Currently, models for projecting the evolution of SARS-CoV-2 use dominant genome sequences to forecast whether a known mutation will be prevalent in the future. However, novel variants of SARS-CoV-2 (and other viruses) arise through evolutionary pressure acting on minor variant genomes, which then become dominant and form a potential next wave of infection. In this study, sequencing data from 96 209 patients, sampled over a 3-year period, were used to analyse patterns of minor variant genomes. These data were used to develop unsupervised machine learning clusters that identify amino acids in the Spike protein with a greater potential for mutation than others. Being able to identify amino acids that may be present in future variants would better inform the design of longer-lived medical countermeasures and allow a risk-based evaluation of viral properties, including assessment of transmissibility and immune escape, thus providing candidates for early warning signals for when a new variant of SARS-CoV-2 emerges.
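
To make the distinction between dominant and minor variants concrete, the following is a minimal sketch and not the study's pipeline. It assumes that, for each Spike position in one sample, the amino acids observed across sequencing reads are available as a string. The position names, the toy read data and the two-cluster 1-D k-means are all hypothetical stand-ins, chosen only to illustrate grouping positions by how much minor variation they carry.

```python
from collections import Counter


def minor_variant_frequency(residues):
    """Return (dominant residue, fraction of reads that do NOT carry it)."""
    counts = Counter(residues)
    dominant, dominant_count = counts.most_common(1)[0]
    return dominant, 1 - dominant_count / len(residues)


def two_means_1d(values, iterations=20):
    """Tiny 1-D k-means with k=2 (an illustrative stand-in, not the
    clustering used in the study). Returns one 0/1 label per value,
    where 1 marks the cluster with the higher centre."""
    low, high = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer of the two centres.
        labels = [int(abs(v - high) < abs(v - low)) for v in values]
        low_members = [v for v, lab in zip(values, labels) if lab == 0]
        high_members = [v for v, lab in zip(values, labels) if lab == 1]
        # Move each centre to the mean of its members.
        if low_members:
            low = sum(low_members) / len(low_members)
        if high_members:
            high = sum(high_members) / len(high_members)
    return labels


# Hypothetical toy data: residues seen in reads covering three positions.
reads_by_position = {
    "site_1": "NNNNNNNNNN",  # no minor variation
    "site_2": "NNNNNNNYYY",  # 30% of reads carry a minor variant
    "site_3": "EEEEEEEEEK",  # 10% of reads carry a minor variant
}

frequencies = {
    site: minor_variant_frequency(residues)[1]
    for site, residues in reads_by_position.items()
}
labels = dict(zip(frequencies, two_means_1d(list(frequencies.values()))))
```

In this toy example, only `site_2` falls into the higher-variation cluster. A real analysis would work from per-sample read alignments and a much richer feature set than a single frequency per position.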

Original publication

DOI

10.1093/nar/gkaf077

Type

Journal

Nucleic Acids Res

Publication Date

08/02/2025

Volume

53

Keywords

SARS-CoV-2; Humans; Genome, Viral; COVID-19; Spike Glycoprotein, Coronavirus; Mutation; Machine Learning