Estimating epidemic dynamics with genomic and time series data.
Zarebski AE., Zwaans A., Gutierrez B., du Plessis L., Pybus OG.
Accurately estimating the prevalence and transmissibility of an infectious disease is an important task in genetic infectious disease epidemiology. However, generating accurate estimates of these quantities that make use of both epidemic time series and pathogen genome sequence data is a challenging problem. Phylogenetic birth-death processes are a popular choice for modelling the transmission of infectious diseases, but it is difficult to estimate the prevalence of infection with them. Here, we extend our approximate likelihood approach, which combines phylogenetic information from sampled pathogen genomes and epidemiological information from a time series of case counts, to estimate historical prevalence in addition to the effective reproduction number. We implement this new method in a BEAST2 package called Timtam. In a simulation study, our approximation is well calibrated and recovers the parameters of the simulated data. To demonstrate how Timtam can be applied to real datasets, we carry out empirical analyses of data from two infectious disease outbreaks: the outbreak of SARS-CoV-2 onboard the Diamond Princess cruise ship in early 2020 and the outbreak of poliomyelitis in Tajikistan in 2010. In both cases, we recover estimates consistent with previous analyses.