Development of an integrated genome informatics, data management and workflow infrastructure: a toolbox for the study of complex disease genetics.
Burren OS., Healy BC., Lam AC., Schuilenburg H., Dolman GE., Everett VH., Laneri D., Nutland S., Rance HE., Payne F., Smyth D., Lowe C., Barratt BJ., Twells RCJ., Rainbow DB., Wicker LS., Todd JA., Walker NM., Smink LJ.
The genetic dissection of complex disease remains a significant challenge. Sample-tracking and the recording, processing and storage of high-throughput laboratory data with public domain data, require integration of databases, genome informatics and genetic analyses in an easily updated and scaleable format. To find genes involved in multifactorial diseases such as type 1 diabetes (T1D), chromosome regions are defined based on functional candidate gene content, linkage information from humans and animal model mapping information. For each region, genomic information is extracted from Ensembl, converted and loaded into ACeDB for manual gene annotation. Homology information is examined using ACeDB tools and the gene structure verified. Manually curated genes are extracted from ACeDB and read into the feature database, which holds relevant local genomic feature data and an audit trail of laboratory investigations. Public domain information, manually curated genes, polymorphisms, primers, linkage and association analyses, with links to our genotyping database, are shown in Gbrowse. This system scales to include genetic, statistical, quality control (QC) and biological data such as expression analyses of RNA or protein, all linked from a genomics integrative display. Our system is applicable to any genetic study of complex disease, of either large or small scale.