p-IgGen: a paired antibody generative language model.
Turnbull OM., Oglic D., Croasdale-Wood R., Deane CM.
SUMMARY: A key challenge in antibody drug discovery is designing novel sequences that are free from developability issues such as aggregation, polyspecificity, poor expression, or low solubility. Here, we present p-IgGen, a protein language model for paired heavy-light chain antibody generation. The model generates diverse, antibody-like sequences with pairing properties found in natural antibodies. We also create a fine-tuned version of p-IgGen that biases the model towards generating antibodies whose 3D biophysical properties fall within the distributions seen in clinical-stage therapeutic antibodies.
AVAILABILITY AND IMPLEMENTATION: The model and inference code are freely available at www.github.com/oxpig/p-IgGen. Cleaned training data are deposited at doi.org/10.5281/zenodo.13880874.