When using the CD ComputaBio computing platform, many users ask how to choose a protein crystal structure. This article shares some of our experience on this topic. Every selection criterion has its own scope of application; here we discuss only the principles and methods for selecting protein crystal structures for molecular docking.
In experiments, researchers often work with animal models (such as mice) and their proteins instead of the human proteins they are ultimately interested in. There are many reasons for this, for example:
1) The human protein cannot be obtained (purified and isolated);
2) The protein's function has to be studied in vivo, but clinical trials in humans cannot be conducted directly;
3) Animal proteins are more convenient and cheaper to work with;
4) Other limiting factors.
Computational simulation is far less constrained. If the real object of study is the human body, a human protein should generally be used. However, if the docking results are meant to guide experiments or explain experimental observations, or if follow-up experiments (such as site-directed mutagenesis) will be used to verify them, then in principle the protein species used in the calculation should match the one used in the experiment; otherwise the amino acid sequences, and therefore the residue numbering, may not correspond. For example, searching the UniProtKB database (https://www.uniprot.org/) for the gene name IDH1 returns entries from several species; we then pick the entry for the species we have decided on.
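The species-restricted UniProtKB search can also be done programmatically. The sketch below only builds the query URL, assuming the UniProt REST endpoint https://rest.uniprot.org/uniprotkb/search and its gene: and organism_id: query fields; check these against the current UniProt API documentation before relying on them. The function name uniprot_search_url is hypothetical.

```python
from urllib.parse import urlencode

# Assumed UniProt REST search endpoint; verify against the current UniProt API docs.
UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def uniprot_search_url(gene: str, taxon_id: int) -> str:
    """Build a UniProtKB query for one gene restricted to one species (NCBI taxonomy ID)."""
    params = {
        "query": f"gene:{gene} AND organism_id:{taxon_id}",
        "fields": "accession,id,organism_name,length",  # id = entry name, e.g. IDHC_HUMAN
        "format": "tsv",
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


# 9606 = Homo sapiens, 10090 = Mus musculus
url = uniprot_search_url("IDH1", 9606)
print(url)
```

Fetching the URL (for example with urllib.request.urlopen) would return one tab-separated row per matching entry, so the same gene can be compared across species by changing only the taxonomy ID.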
Suppose we want to study the human protein; we can then search the RCSB Protein Data Bank (PDB) for its entry name (IDHC_HUMAN). Conversely, the PDB also records the source organism of each crystal structure, so the species can be checked from that side as well.
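When a search returns many PDB entries, the source organism can be used to filter them. This is a minimal sketch, assuming polymer-entity records shaped like those of the RCSB Data API (a list under rcsb_entity_source_organism whose items carry ncbi_taxonomy_id); the field names are an assumption to verify, and the records and IDs in the demo are hypothetical.

```python
from typing import Any


def entity_taxon_ids(entity: dict[str, Any]) -> set[int]:
    """Collect NCBI taxonomy IDs from one polymer-entity record (assumed RCSB Data API layout)."""
    organisms = entity.get("rcsb_entity_source_organism") or []
    return {o["ncbi_taxonomy_id"] for o in organisms if "ncbi_taxonomy_id" in o}


def keep_species(entities: dict[str, dict[str, Any]], taxon_id: int) -> list[str]:
    """Return the IDs of entities whose source organism matches taxon_id, sorted."""
    return sorted(eid for eid, rec in entities.items() if taxon_id in entity_taxon_ids(rec))


# Hypothetical records; real ones might be fetched from
# https://data.rcsb.org/rest/v1/core/polymer_entity/{entry_id}/{entity_id}
records = {
    "ENTRY_A_1": {"rcsb_entity_source_organism": [{"ncbi_taxonomy_id": 9606}]},
    "ENTRY_B_1": {"rcsb_entity_source_organism": [{"ncbi_taxonomy_id": 10090}]},
}
print(keep_species(records, 9606))  # ['ENTRY_A_1']
```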
Any research requires a thorough understanding of the research object. UniProtKB integrates what is known about a protein and lets us look up key facts: what the protein does, how long its sequence is, where its binding sites are, and which structures are available for it.
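The facts listed above can also be pulled out of a downloaded UniProt entry. The sketch below assumes the JSON layout of the UniProt REST API (primaryAccession, sequence.length, and a features list whose binding sites have type "Binding site" with location.start.value and location.end.value); treat that layout as an assumption. The entry in the demo is a hypothetical, heavily trimmed example, not real data.

```python
from typing import Any


def summarize_entry(entry: dict[str, Any]) -> dict[str, Any]:
    """Extract accession, sequence length, and binding-site ranges from a UniProt JSON entry."""
    sites = []
    for feat in entry.get("features", []):
        if feat.get("type") == "Binding site":
            loc = feat["location"]
            sites.append((loc["start"]["value"], loc["end"]["value"]))
    return {
        "accession": entry.get("primaryAccession"),
        "length": entry.get("sequence", {}).get("length"),
        "binding_sites": sorted(sites),  # (start, end) residue positions, sequence order
    }


# Hypothetical entry; a real one might come from https://rest.uniprot.org/uniprotkb/{accession}.json
entry = {
    "primaryAccession": "P00000",
    "sequence": {"length": 400},
    "features": [
        {"type": "Binding site", "location": {"start": {"value": 120}, "end": {"value": 120}}},
        {"type": "Binding site", "location": {"start": {"value": 75}, "end": {"value": 77}}},
        {"type": "Helix", "location": {"start": {"value": 10}, "end": {"value": 20}}},
    ],
}
print(summarize_entry(entry))
```

The binding-site positions obtained this way are a natural starting point for defining the pocket residues checked in the next step.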
Some proteins have many crystal structures in the RCSB PDB. In that case, choose a structure whose binding pocket is complete. For example, a search for the IDH1 protein returns many crystal structures. Comparing the three-dimensional structures of 4UMX and 4UMY shows that 4UMY has more missing residues; most importantly, a long stretch of the residues that form the pocket is missing, which changes the shape of the pocket relative to 4UMX. Since 4UMX is more complete, it, rather than 4UMY, should be chosen as the candidate structure.
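Besides visual inspection, pocket completeness can be checked by listing which pocket residues have no coordinates in a PDB-format file (PDB files also list unmodelled residues under REMARK 465). This minimal sketch reads the fixed columns of ATOM records (chain ID in column 22, residue number in columns 23-26) and ignores insertion codes. The pocket residue list and the demo records are hypothetical; atom_line only builds toy input.

```python
def observed_residues(pdb_text: str, chain: str) -> set[int]:
    """Residue numbers in `chain` that have at least one ATOM record (PDB fixed-column format)."""
    seen = set()
    for line in pdb_text.splitlines():
        # 0-based index 21 = chain ID, 22:26 = residue sequence number.
        if line.startswith("ATOM") and len(line) >= 26 and line[21] == chain:
            seen.add(int(line[22:26]))
    return seen


def missing_pocket_residues(pdb_text: str, chain: str, pocket: list[int]) -> list[int]:
    """Pocket residues without coordinates, i.e. not resolved in the crystal structure."""
    present = observed_residues(pdb_text, chain)
    return [r for r in pocket if r not in present]


def atom_line(serial: int, res: str, chain: str, resseq: int) -> str:
    """Build a minimal CA ATOM record for the demo (coordinates omitted)."""
    return f"ATOM  {serial:5d}  CA  {res:>3} {chain}{resseq:4d}"


pdb_text = "\n".join([
    atom_line(1, "ARG", "A", 100),
    atom_line(2, "SER", "A", 101),
    atom_line(3, "TYR", "A", 103),
    atom_line(4, "GLY", "B", 102),  # residue 102 exists only in chain B
])
pocket = [100, 101, 102, 103]  # hypothetical pocket definition for chain A
print(missing_pocket_residues(pdb_text, "A", pocket))  # [102]
```

Comparing the count of missing pocket residues across candidate entries gives a simple, reproducible version of the 4UMX versus 4UMY comparison.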
In many cases a crystal structure contains more than the target protein: it may also include nucleic acids, peptides, coenzymes, small-molecule compounds, and even other proteins. These components are recorded in detail on each entry's page in the PDB. We need to know what each component is, what role it plays, and which one is the co-crystallized ligand.
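A quick inventory of the non-polymer components can be taken from the HETATM records of a PDB-format file. This is a minimal sketch: it counts distinct (residue name, chain, residue number) groups and skips water by default. Note that modified residues inside a chain (for example selenomethionine, MSE) are also written as HETATM, so the output still needs to be interpreted with the entry page. The residue names COF and LIG and the demo records are hypothetical.

```python
from collections import Counter


def hetero_groups(pdb_text: str, skip: frozenset = frozenset({"HOH"})) -> Counter:
    """Count copies of each HETATM component (distinct resName/chain/resSeq), skipping water."""
    groups = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM") and len(line) >= 26:
            name = line[17:20].strip()  # residue name, columns 18-20
            if name not in skip:
                groups.add((name, line[21], int(line[22:26])))
    return Counter(name for name, _chain, _resseq in groups)


def hetatm_line(serial: int, res: str, chain: str, resseq: int) -> str:
    """Build a minimal HETATM record for the demo (atom name fixed, coordinates omitted)."""
    return f"HETATM{serial:5d}  C1  {res:>3} {chain}{resseq:4d}"


sample = "\n".join([
    hetatm_line(1, "COF", "A", 501),  # hypothetical cofactor, chain A
    hetatm_line(2, "COF", "A", 501),  # second atom of the same group
    hetatm_line(3, "COF", "B", 501),  # same cofactor bound to chain B
    hetatm_line(4, "LIG", "A", 502),  # hypothetical co-crystallized ligand
    hetatm_line(5, "HOH", "A", 601),  # water, skipped
])
print(sorted(hetero_groups(sample).items()))  # [('COF', 2), ('LIG', 1)]
```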
When several crystal structures are available and many of them contain co-crystallized ligands, we can choose the one whose ligand is structurally similar to the compound to be docked. A pocket that has already accommodated a similar ligand is more likely to have a shape that suits the new compound, and a suitable pocket shape makes docking easier.
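Structural similarity between ligands is commonly quantified with the Tanimoto coefficient on molecular fingerprints. The sketch below ranks candidate structures by the similarity of their co-crystallized ligand to the query compound, taking fingerprints as sets of "on" bit indices. In practice those bits would come from a cheminformatics toolkit (for example Morgan fingerprints in RDKit); the bit sets and entry IDs here are hypothetical.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_by_ligand_similarity(query: set[int], ligands: dict[str, set[int]]) -> list[tuple[str, float]]:
    """Rank structures by similarity of their co-crystallized ligand to the query, best first."""
    scores = [(entry_id, tanimoto(query, fp)) for entry_id, fp in ligands.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


query_fp = {1, 4, 9, 16, 25}  # hypothetical fingerprint of the compound to dock
candidates = {
    "ENTRY_A": {1, 4, 9, 16, 30},
    "ENTRY_B": {2, 3, 5},
    "ENTRY_C": {1, 4, 9, 16, 25, 36},
}
ranking = rank_by_ligand_similarity(query_fp, candidates)
print([entry_id for entry_id, _score in ranking])  # ['ENTRY_C', 'ENTRY_A', 'ENTRY_B']
```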
One quality indicator of a protein crystal structure is its resolution, which reflects the level of detail in the diffraction data and therefore the uncertainty in the atomic positions of the model. When many crystal structures are available, choose the one with the highest resolution, that is, the smallest resolution value. Generally speaking, a resolution value below 2 Å is good enough.
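Because a smaller number means higher resolution, the rule above amounts to filtering by a cutoff and sorting in ascending order. A minimal sketch with hypothetical entries and values; structures without a crystallographic resolution (for example NMR models) are dropped here, which is an assumption of this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    pdb_id: str
    resolution: Optional[float]  # Å; None when no crystallographic resolution is reported


def by_resolution(cands: list[Candidate], cutoff: float = 2.0) -> list[Candidate]:
    """Keep structures with resolution below the cutoff, best (smallest value) first."""
    good = [c for c in cands if c.resolution is not None and c.resolution < cutoff]
    return sorted(good, key=lambda c: c.resolution)


cands = [
    Candidate("ENTRY_A", 2.4),
    Candidate("ENTRY_B", 1.6),
    Candidate("ENTRY_C", 1.9),
    Candidate("ENTRY_D", None),
]
print([c.pdb_id for c in by_resolution(cands)])  # ['ENTRY_B', 'ENTRY_C']
```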
In practice, choosing a protein crystal structure requires weighing all of these factors together and selecting the structure best suited to the current study. Although the discussion above focuses on molecular docking, the same principles also apply to other calculations and simulations.
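One way to make the overall judgment explicit is to combine the criteria into a single sort. This sketch assumes one particular priority order (matching species as a hard filter, then fewest missing pocket residues, then highest ligand similarity, then best resolution); that ordering is an illustrative choice, not a fixed rule, and the entries and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    pdb_id: str
    species_ok: bool          # source organism matches the experiment
    missing_pocket: int       # number of unresolved pocket residues
    ligand_similarity: float  # Tanimoto similarity of co-crystallized ligand to the query (0 if none)
    resolution: float         # Å, smaller is better


def pick_structure(items: list[Summary]) -> list[Summary]:
    """Drop species mismatches, then order by pocket completeness, ligand similarity, resolution."""
    eligible = [s for s in items if s.species_ok]
    return sorted(eligible, key=lambda s: (s.missing_pocket, -s.ligand_similarity, s.resolution))


items = [
    Summary("ENTRY_A", True, 0, 0.5, 2.1),
    Summary("ENTRY_B", True, 5, 0.9, 1.5),
    Summary("ENTRY_C", False, 0, 0.9, 1.2),
    Summary("ENTRY_D", True, 0, 0.7, 2.3),
]
print([s.pdb_id for s in pick_structure(items)])  # ['ENTRY_D', 'ENTRY_A', 'ENTRY_B']
```

A lexicographic key keeps each criterion interpretable; a weighted score is an alternative when no single criterion should always dominate.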