  


POSTER

Using automated clustering of the geometry of bulk MHC class I crystal structures to probe the plasticity of the antigen binding cleft and the flexibility of peptide conformers

Christopher J. Thorpe - Currently without organisational affiliation, returning from a career break.

British Society for Immunology 2021 Poster [P-054]


Introduction

MHC Class I molecules bind a variety of peptides, conforming to a motif, in an antigen binding cleft and present them to T cells. The three-dimensional fold of these molecules is highly conserved across alleles and species, yet the variability of amino acids and the underlying plasticity of the binding groove together determine a wide range of peptide motifs.

This variability and plasticity make these molecules challenging for high fidelity 3D structure prediction. Even recent exciting developments, AlphaFold2 and RoseTTAFold, struggle to define accurate atomic-scale models of MHC class I:peptide complexes. This is largely due to insufficient fidelity in template selection and alteration, which does not take into account the concerted side-chain movements within the cleft that rearrange to adapt to bound peptide and allelic variations.

Aims

To understand the variability in the cleft and bound peptide, bulk analysis of known 3D structures of MHC class I:peptide complexes has been performed. Dictionaries of side-chain angles for the cleft and peptide backbone have been created for over 700 MHC class I structures. These dictionaries have then been automatically clustered and analyzed. Further analysis of these clusters, their intersections and their relationship to each other and the underlying sequences of the MHC:peptide complexes will hopefully lead to better fragment selection for homology modelling and a set of restraints for molecular simulations.

Results

Dissimilar peptides bound to dissimilar alleles take similar paths through the peptide binding groove

Clusters were formed from a collection of Class I nonamer peptides using the DBSCAN algorithm (see Google Colab notebook), with the backbone phi/psi angles of P4, P5, P6, P7 and P8 as input. Picking the optimal value of epsilon yielded a set of 5 clusters and 27 outliers. Restricting the input data to the angles with the greatest variability reduced the number of outliers. A similar analysis of octamers showed less variability.
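As an illustration only, the following minimal sketch shows how a table of backbone angles could be passed to DBSCAN in scikit-learn. The data are synthetic stand-ins, and the values of eps and min_samples are assumptions chosen for this toy example, not the parameters used for the poster.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)

# Synthetic stand-in for real data: one row per peptide, one column per
# backbone angle in degrees (phi and psi for P4..P8 gives 10 columns).
centre_a = np.array([250, 140, 290, 120, 260, 150, 240, 130, 280, 140], dtype=float)
centre_b = np.array([290, 320, 250, 150, 70, 30, 260, 140, 250, 130], dtype=float)
group_a = centre_a + rng.normal(0, 5, size=(30, 10))
group_b = centre_b + rng.normal(0, 5, size=(30, 10))
outliers = rng.uniform(0, 360, size=(3, 10))
angles = np.vstack([group_a, group_b, outliers])

# eps and min_samples are illustrative assumptions for this toy data set.
labels = DBSCAN(eps=40.0, min_samples=5).fit_predict(angles)

# DBSCAN labels points that belong to no cluster as -1.
n_clusters = len(set(labels) - {-1})
n_outliers = int(np.sum(labels == -1))
print(n_clusters, "clusters,", n_outliers, "outliers")
```

In this sketch the angles are treated as plain numbers in the range 0-359, as described in the Methods. Varying eps changes both the number of clusters and the number of points labelled as outliers.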

Three of the 5 clusters have the P5 residue pointing downwards into the floor of the cleft or towards the alpha1 helix of the Class I molecule, using the C-pocket as the primary or secondary anchor. The other 2 clusters display similar behaviour, but with the P6 residue mostly sequestered in the C-pocket.

Each cluster has a distinctive phi/psi profile, displayed as a plot and heatmap in the clusters below. Many alleles have promiscuous modes of binding, yet some are more restricted. It is not yet known whether this is due to significantly more structures being available for some alleles, e.g. HLA-A*02:01. Homology modelling and molecular dynamics simulations will be performed to assess the viability of modes of binding for alleles with a small number of structures.

Discrete clusters can be formed using sidechain chi1/chi2 angles of specific residues lining the antigen binding cleft

An analysis was performed of all amino acid residues in the MHC Class I molecule which contact the peptide. The most common of these amino acid positions were collated and a dataset of the chi1/chi2 angles of these positions created. This dataset was clustered using the Birch algorithm (see Google Colab notebook) and a set of 20 clusters was selected as the optimum starting place to analyse the dataset.
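A minimal sketch of Birch clustering on a chi1/chi2 table, using scikit-learn, is given here for illustration. The data are synthetic, the number of positions (n_positions) is hypothetical, and the toy example asks for 4 clusters rather than the 20 selected for the real dataset.

```python
import numpy as np
from sklearn.cluster import Birch

rng = np.random.default_rng(1)

# Hypothetical layout: one row per structure, two columns (chi1, chi2)
# for each selected cleft position.
n_positions = 6
centres = rng.uniform(0, 360, size=(4, n_positions * 2))
chi = np.vstack(
    [centre + rng.normal(0, 8, size=(25, n_positions * 2)) for centre in centres]
)

# n_clusters sets the number of final clusters; 4 is an assumption that
# suits this toy data, not the value used for the poster.
model = Birch(n_clusters=4)
labels = model.fit_predict(chi)
print(sorted(set(labels)))
```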

The dataset is significantly more complex to analyse than the clusters of the peptides and further work will be needed to gain more understanding of the clusters. One thing which is emerging on initial analysis is that concomitant rearrangements that form part of the plasticity of the groove are visible between clusters. For example, the rearrangement of the F-pocket, which sequesters the last amino acid of the peptide, is clearly distinguishable in individual clusters, as is the rearrangement of the groove to form a distinct ridge in H2-Db.

A sample set of clusters and rationale for the differences between complexes of different clusters is displayed.

Further work

Some of the peptide clusters are well formed but have anchor sidechains directed at different regions of the binding groove. This suggests that although main chain torsion angles are in themselves useful as a mechanism of clustering, they are not entirely sufficient. Including a measure of the direction of the sidechain should yield sub-clusters or improved clusters.

More analysis is needed to optimise the antigen binding cleft clustering in order to discern “critical changes” that result in the reorganisation of the peptide binding cleft and the selection of a specific peptide conformation. The residues selected for clustering need refining to more tightly match the residues on the side walls and floor of the cleft. Clustering of individual features (such as pockets) will be investigated.

Investigations will be performed on the intersections of the peptide and cleft clusters to understand the effect of the peptide on concomitant rearrangements within the binding groove and the effect of the groove architecture on peptide conformation.

The effect of amino acid substitutions will be investigated through contact maps and side chain torsion angles in closely related MHC Class I alleles and peptides, so that predictions can be made for which clusters are likely to represent the structures of as yet unsolved alleles and peptides.

Discussions, suggestions and collaborations would be welcomed.

Methods

PDB format structures were obtained from the RCSB. Structures were categorized using a mix of automated tools and manual curation for outliers. One molecule per structure was used, unless significant changes in peptide backbone conformation were present.

Single molecules were aligned three-dimensionally to a canonical structure (3HLA) with BioPython. Side chain angles within the binding cleft and peptide backbone phi/psi torsion angles were measured with BioPython. Angles were rounded to the nearest integer and negative angles converted into their positive equivalent.
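The rounding and sign conversion can be sketched as below. This assumes that the "positive equivalent" of a negative angle is obtained by adding 360 degrees, which is consistent with the 0-359 range mentioned later in the Methods; the function name normalise_angle is hypothetical.

```python
def normalise_angle(angle_deg: float) -> int:
    """Round a torsion angle to the nearest integer and map it into 0..359.

    Assumption: a negative angle is converted by adding 360 degrees,
    so -65 becomes 295. The modulo also folds a rounded 360 back to 0.
    """
    return round(angle_deg) % 360


for raw in (-65.3, 120.4, 359.6, -180.0):
    print(raw, "->", normalise_angle(raw))
```

Note that Python's built-in round() rounds exact halves to the nearest even integer, which only matters for values ending in exactly .5.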

Test clusters were generated using a range of target cluster sizes and a variety of different clustering methods to identify optimal algorithms. KMeans, Birch, Mini Batch KMeans, Agglomerative clustering, DBSCAN and Bayesian Gaussian Mixture were tested.

Clustering was optimised by selecting the parameters which performed best with both the Silhouette score and the Calinski-Harabasz score. Further work is needed to test functions which relate to optimisation of structural features, in addition to statistical optimisation.
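One way such a parameter scan could look is sketched below with scikit-learn. The data are synthetic, KMeans is used only as an example algorithm, and the range of cluster counts is an assumption; the sketch simply reports the cluster count that each score prefers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import calinski_harabasz_score, silhouette_score

rng = np.random.default_rng(2)

# Synthetic stand-in: three well separated groups of angle pairs (degrees).
centres = np.array([[60.0, 60.0], [180.0, 300.0], [300.0, 90.0]])
data = np.vstack([c + rng.normal(0, 6, size=(40, 2)) for c in centres])

# Score each candidate number of clusters with both metrics.
results = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    results[k] = (
        silhouette_score(data, labels),
        calinski_harabasz_score(data, labels),
    )

# Higher is better for both scores.
best_by_silhouette = max(results, key=lambda k: results[k][0])
best_by_ch = max(results, key=lambda k: results[k][1])
print(best_by_silhouette, best_by_ch)
```

When the two scores disagree on real data, some rule for combining them is needed; the sketch does not prescribe one.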

For clustering of clefts, only the side chain chi1 and chi2 angles were used. Where angles were missing, such as with Ala/Gly residues, side chain angles were replaced with pseudorandom numbers in the range 0-359 to remove any influence of the underlying amino acid residue at that position. Further experiments will be performed using the mode for chi1/chi2 angles.
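The replacement of missing angles might be sketched with NumPy as follows. The use of np.nan as the marker for a missing angle and the example values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical table: one row per residue, columns are chi1 and chi2 in degrees.
# np.nan marks an angle that does not exist for that residue.
chi = np.array(
    [
        [295.0, 180.0],    # residue with both chi1 and chi2
        [np.nan, np.nan],  # e.g. Gly or Ala: no chi angles
        [62.0, np.nan],    # e.g. Ser: chi1 only
    ]
)

missing = np.isnan(chi)
filled = chi.copy()
# rng.integers excludes the upper bound, so values fall in 0..359.
filled[missing] = rng.integers(0, 360, size=int(missing.sum()))
print(filled)
```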

Notebooks

Software used

Abstract

MHC Class I molecules bind a variety of peptides, conforming to a motif, in an antigen binding cleft and present them to T cells. The three-dimensional fold of these molecules is highly conserved across alleles and species, yet the variability of amino acids and the underlying plasticity of the binding groove together determine a wide range of peptide motifs.

This variability and plasticity make these molecules challenging for high fidelity 3D structure prediction. Even recent exciting developments, AlphaFold2 and RoseTTAFold, struggle to define accurate atomic-scale models of MHC class I:peptide complexes. This is largely due to insufficient fidelity in template selection and alteration, which does not take into account the concerted side-chain movements within the cleft that rearrange to adapt to bound peptide and allelic variations.

To understand the variability in the cleft and bound peptide, bulk analysis of known 3D structures of MHC class I:peptide complexes has been performed. Dictionaries of side-chain angles for the cleft and peptide backbone have been created for over 700 MHC class I structures. These dictionaries have then been automatically clustered and analyzed.

The clustering of peptide backbone angles demonstrates that for octamers/nonamers there are a limited number of "routes" for the peptide through the cleft. The degrees of freedom in the peptide backbone are limited by the steric, hydrophobic, and electrostatic shape of the cleft. These routes are shared by many disparate alleles.

Clustering of the side chain geometry within the cleft, without knowledge of sequence, results in clusters that relate to existing MHC class I subgroups, even when noise is introduced for missing angles from Glycine or Alanine amino acids.

These clusters, and the interplay between them, can help in template selection and alteration and can be used as restraints and local energy minima to help refine predicted models and explore the effects of substitutions.