Addressing the antibody germline bias and its effect on language models for improved antibody design.
Olsen TH, Moal IH, Deane CM.
MOTIVATION: The versatile binding properties of antibodies have made them an extremely important class of biotherapeutics. However, therapeutic antibody development is a complex, expensive, and time-consuming task: the final antibody must not only bind its target strongly and specifically but also be minimally affected by developability issues. The success of transformer-based language models in protein sequence space, together with the availability of vast numbers of antibody sequences, has led to the development of many antibody-specific language models to help guide antibody design. Antibody diversity arises primarily from V(D)J recombination, mutations within the CDRs, and a few nongermline mutations outside the CDRs. Consequently, a significant portion of the variable domain of any natural antibody sequence remains germline. This affects the pre-training of antibody-specific language models, where this facet of the sequence data introduces a prevailing bias toward germline residues. This poses a challenge, as mutations away from the germline are often vital for generating specific and potent binding to a target, meaning that language models need to be able to suggest key mutations away from the germline.

RESULTS: In this study, we explore the implications of the germline bias, examining its impact on both general-protein and antibody-specific language models. We develop and train a series of new antibody-specific language models optimized for predicting nongermline residues. We then compare our final model, AbLang-2, with current models and show how it suggests a diverse set of valid mutations with high cumulative probability.

AVAILABILITY AND IMPLEMENTATION: AbLang-2 is trained on both unpaired and paired data, and is freely available at https://github.com/oxpig/AbLang2.git.
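To make the "diverse set of valid mutations with high cumulative probability" idea concrete, below is a minimal, self-contained sketch of how a masked-language-model probability distribution over amino acids at one position could be turned into ranked nongermline mutation suggestions. This is an illustration only, not the authors' evaluation code: the function name `suggest_nongermline_mutations`, the choice to renormalize probability mass over the 19 non-germline residues, and the `mass_cutoff` threshold are all assumptions introduced here, and the input distribution is a toy placeholder standing in for a model's output.

```python
import numpy as np

# Standard 20 amino acids in a fixed order (an assumed convention for this sketch).
AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")

def suggest_nongermline_mutations(probs, germline_residue, mass_cutoff=0.9):
    """Rank non-germline residues at one position by model probability.

    Returns the smallest set of non-germline residues whose cumulative
    probability (renormalized over the 19 non-germline residues) reaches
    `mass_cutoff`, most probable first. A germline-biased model concentrates
    its mass on the germline residue, leaving a flat, uninformative
    distribution over alternatives; a model tuned for nongermline
    prediction yields a sharper, more useful ranking here.
    """
    probs = np.asarray(probs, dtype=float)
    assert probs.shape == (len(AMINO_ACIDS),), "one probability per amino acid"
    # Drop the germline residue and renormalize over the remaining 19.
    keep = [i for i, aa in enumerate(AMINO_ACIDS) if aa != germline_residue]
    nongermline = probs[keep] / probs[keep].sum()
    order = np.argsort(nongermline)[::-1]  # most to least probable
    suggestions, cumulative = [], 0.0
    for idx in order:
        suggestions.append((AMINO_ACIDS[keep[idx]], float(nongermline[idx])))
        cumulative += nongermline[idx]
        if cumulative >= mass_cutoff:
            break
    return suggestions

# Toy distribution standing in for a model's per-position output.
rng = np.random.default_rng(0)
toy_probs = rng.dirichlet(np.ones(len(AMINO_ACIDS)))
print(suggest_nongermline_mutations(toy_probs, germline_residue="S"))
```

In practice, the per-position probabilities would come from AbLang-2 itself (see the repository linked above for the actual interface); the sketch only shows the post-processing step of collecting high-cumulative-probability nongermline candidates.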