Why is this resource needed?

Data analyses and artificial intelligence applications, such as supervised machine learning, need clean, consistent and well-labelled datasets.

Since the earliest structures of MHC molecules were published, now over 30 years ago, there have been changes in nomenclature that make it hard to group structures using search queries.

In addition, the structures themselves use a variety of strategies for assigning individual chains and numbering individual residues.

There are also now structures of MHC molecules from a variety of species. These contain insertions and deletions relative to the human and mouse structures, which predominate and provide the canonical numbering used in this resource.

Together, these issues make it hard to create truly consistent multi-allele, multi-species datasets. For visual analyses, the structures are also in a variety of orientations, so a consistent method of alignment is needed.

This work is licensed under a Creative Commons Attribution 4.0 International License.