Scalable pathogen pipeline platform (SP3): Enabling unified genomic data analysis with elastic cloud computing
Yang-Turner F., Volk D., Fowler PW., Swann J., Bull M., Hoosdally S., Connor T., Peto T., Crook D.
© 2019 IEEE. Pathogen genomic data analysis can be extremely bespoke and diverse. This paper presents our plan and progress towards creating a Scalable Pathogen Pipeline Platform (SP3), which provides an efficient and unified process for collecting, analysing and comparing genomic data, with the benefit of elastic cloud computing. SP3 enables container-centric bioinformatic workflows to run on personal computers, high-performance computing (HPC) clusters and cloud platforms. We have deployed and tested SP3 on a local HPC cluster, Google Cloud Platform (GCP), Microsoft Azure and OpenStack platforms. SP3 allows users to fetch genomic sequencing data from the European Nucleotide Archive (ENA) and analyse it with open-source bioinformatic pipelines. We believe SP3 will promote common standards for pathogen genomic data quality, data processing and data analysis, helping to address the challenge of tool divergence and to leverage public genomic data repositories and cloud resources.
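The abstract does not say how SP3 retrieves data from ENA. As a minimal sketch, assuming the public ENA Portal API `filereport` endpoint is used, the Python below builds a query that lists the read runs under an accession and turns the tab-separated response into per-run FASTQ download URLs. The helper names `filereport_url` and `parse_fastq_links` are hypothetical and are not part of SP3. Prepending `ftp://` to the returned paths is also an assumption.

```python
import csv
import io
from urllib.parse import urlencode

# ENA Portal API "filereport" endpoint (public; no authentication needed).
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a query listing the FASTQ files of every read run under `accession`.

    Hypothetical helper for illustration; not taken from SP3.
    """
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return f"{ENA_FILEREPORT}?{urlencode(params)}"


def parse_fastq_links(tsv_text: str) -> dict[str, list[str]]:
    """Map each run accession to its FASTQ download URLs.

    ENA reports fastq_ftp as semicolon-separated host paths without a scheme,
    so "ftp://" is prepended here (assumption: plain FTP download is acceptable).
    Runs that have no FASTQ files map to an empty list.
    """
    runs: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        runs[row["run_accession"]] = ["ftp://" + p for p in paths]
    return runs
```

In practice the response text would come from something like `urllib.request.urlopen(filereport_url(accession)).read().decode()`. Keeping the network call separate from the parsing means the parser can be checked offline, and the resulting URL list can be handed to whichever containerised pipeline performs the analysis.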