Learning to utilize internal protein 3D nanoenvironment descriptors in predicting CRISPR-Cas9 off-target activity.
Mak JK., Bendandi A., Salim JA., Mazoni I., de Moraes FR., Borro L., Störtz F., Rocchia W., Neshich G., Minary P.
Despite advances in determining the factors that influence the cleavage activity of a CRISPR-Cas9 single guide RNA (sgRNA) at an (off-)target DNA sequence, a comprehensive assessment of the pertinent physico-chemical and structural descriptors is still missing. In particular, studies have not yet directly exploited the information-rich internal protein 3D nanoenvironment of the sgRNA-(off-)target strand DNA pair, which we obtain by harvesting 634 980 residue-level features for CRISPR-Cas9 complexes. As a proof-of-concept study, we simulated the internal protein 3D nanoenvironment for all experimentally available single-base protospacer-adjacent motif (PAM)-distal mutations of a given sgRNA-target strand pair. By identifying the residue-level features most relevant to CRISPR-Cas9 off-target cleavage activity, we developed STING_CRISPR, a machine learning model that accurately predicts off-target cleavage activity for the type of single-base mutations considered in this study. By interpreting STING_CRISPR, we identified four important spatial hotspots of Cas9 residues, together with the associated structural and physico-chemical descriptor classes, that influence CRISPR-Cas9 (off-)target cleavage activity for the sgRNA-target strand pairs covered in this study.