Integrating standardized whole genome sequence analysis with a global Mycobacterium tuberculosis antibiotic resistance knowledgebase.
Ezewudo M., Borens A., Chiner-Oms Á., Miotto P., Chindelevitch L., Starks AM., Hanna D., Liwski R., Zignol M., Gilpin C., Niemann S., Kohl TA., Warren RM., Crook D., Gagneux S., Hoffner S., Rodrigues C., Comas I., Engelthaler DM., Alland D., Rigouts L., Lange C., Dheda K., Hasan R., McNerney R., Cirillo DM., Schito M., Rodwell TC., Posey J.
Drug-resistant tuberculosis poses a persistent public health threat. The ReSeqTB platform is a collaborative, curated knowledgebase designed to standardize and aggregate global Mycobacterium tuberculosis complex (MTBC) variant data from whole genome sequencing (WGS) together with phenotypic drug susceptibility testing (DST) and clinical data. We developed a unified analysis variant pipeline (UVP) (https://github.com/CPTR-ReSeqTB/UVP) to identify variants and assign lineage from MTBC sequence data. Stringent thresholds and quality control measures were incorporated into this open source tool. The pipeline was validated using a well-characterized dataset of 90 diverse MTBC isolates with conventional DST and DNA Sanger sequencing data. The UVP exhibited 98.9% agreement with the variants identified using Sanger sequencing and was 100% concordant with conventional methods of assigning lineage. We analyzed 4636 publicly available MTBC isolates in the ReSeqTB platform, representing all seven major MTBC lineages. The detected variants predicted drug resistance with greater than 94% accuracy when compared against the accompanying DST results in the platform. The aggregation of variants in the platform over time will establish confidence-graded mutations statistically associated with phenotypic drug resistance. These tools serve as critical reference standards for future molecular diagnostic assay developers, researchers, public health agencies and clinicians working towards the control of drug-resistant tuberculosis.
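The accuracy figure above compares genotype-based predictions with phenotypic DST results. The abstract does not spell out how that accuracy was computed, so the following is only a minimal sketch of one common way to score such a comparison, assuming each isolate contributes one (predicted resistant, DST resistant) pair for a given drug. The function name `dst_agreement` and the toy data are hypothetical and are not part of the UVP or the ReSeqTB platform.

```python
from typing import Dict, Iterable, Optional, Tuple


def dst_agreement(
    pairs: Iterable[Tuple[bool, bool]],
) -> Dict[str, Optional[float]]:
    """Score genotype predictions against phenotypic DST for one drug.

    Each pair is (predicted_resistant, dst_resistant). This is an
    illustrative helper, not the authors' implementation.
    """
    tp = fp = tn = fn = 0
    for predicted, observed in pairs:
        if predicted and observed:
            tp += 1  # resistance-associated variant found, DST resistant
        elif predicted and not observed:
            fp += 1  # variant found, DST susceptible
        elif not predicted and observed:
            fn += 1  # no variant found, DST resistant
        else:
            tn += 1  # no variant found, DST susceptible

    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no isolates to score")

    return {
        "accuracy": (tp + tn) / total,
        # Undefined when there are no resistant (or no susceptible) isolates.
        "sensitivity": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }


# Toy data (invented for illustration): 4 TP, 4 TN, 1 FP, 1 FN.
toy_pairs = (
    [(True, True)] * 4
    + [(False, False)] * 4
    + [(True, False)]
    + [(False, True)]
)
result = dst_agreement(toy_pairs)
print(result)
```

Reporting sensitivity and specificity alongside accuracy is a design choice made here because accuracy alone can look high when most isolates are susceptible to the drug in question.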