Automated extraction of typing information for bacterial pathogens from whole genome sequence data: Neisseria meningitidis as an exemplar.
Jolley KA, Maiden MC.
Whole genome sequence (WGS) data are increasingly used to characterise bacterial pathogens. These data provide detailed information on the genotypes and likely phenotypes of aetiological agents, enabling the relationships among samples from potential disease outbreaks to be established precisely. However, generating ever larger quantities of sequence data does not, in itself, resolve the problems that microbiological typing methods have addressed over the last 100 years or so; indeed, large volumes of unstructured data can obscure rather than resolve these issues. Here we review the nascent field of WGS data storage for clinical application and show how curated, web-hosted sequence-based typing schemes have generated an infrastructure that can exploit WGS efficiently for bacterial typing. We review the tools implemented within the PubMLST website to extract clinically useful strain-characterisation information that can be provided to physicians and public health professionals in a timely, concise and understandable way. This information can inform medical decisions such as how to treat a patient, whether to instigate public health action, and what action might be appropriate. It is compatible both with previous sequence-based typing data and with data obtained in the absence of WGS, providing a flexible infrastructure for WGS-based clinical microbiology.