Assignment of epidemiological lineages in an emerging pandemic using the pangolin tool.
O'Toole Á., Scher E., Underwood A., Jackson B., Hill V., McCrone JT., Colquhoun R., Ruis C., Abu-Dahab K., Taylor B., Yeats C., du Plessis L., Maloney D., Medd N., Attwood SW., Aanensen DM., Holmes EC., Pybus OG., Rambaut A.
The response of the global virus genomics community to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been unprecedented, with significant advances made towards the 'real-time' generation and sharing of SARS-CoV-2 genomic data. The rapid growth in virus genome data production has necessitated the development of new analytical methods that can deal with orders of magnitude more genomes than were previously available. Here, we present and describe Phylogenetic Assignment of Named Global Outbreak Lineages (pangolin), a computational tool developed to assign the most likely lineage to a given SARS-CoV-2 genome sequence according to the Pango dynamic lineage nomenclature scheme. To date, nearly two million virus genomes have been submitted to the web-application implementation of pangolin, which has facilitated SARS-CoV-2 genomic epidemiology and provided researchers with access to actionable information about the pandemic's transmission lineages.