Protein Loop Conformation Prediction

1. Calculation principle

Rosetta provides two loop-closure methods for modeling and predicting loop conformations: CCD (cyclic coordinate descent) and KIC (kinematic closure). KIC was introduced in 2009. It was inspired by techniques from robotics and brought those ideas to protein modeling and design. KIC greatly improved the accuracy of loop conformation prediction in Rosetta. Compared with the earlier CCD algorithm, it samples many more near-native loop conformations (Cα RMSD to the crystal structure below 1.0 Å). Two mature loop-prediction methods were later derived from it: NGK and FragmentKIC.

The basic KIC loop-closure algorithm is simple, and it can be used in both the low-resolution (centroid) and high-resolution (full-atom) stages:

  1. First, choose three pivot residues in the loop and "cut" the loop at a cutpoint in the middle.
  2. Then randomly perturb the dihedral angles, bond angles, and bond lengths of all non-pivot residues to generate a new backbone conformation.
  3. Finally, compute and adjust the dihedral angles of the three pivot residues so that the chain break at the cutpoint is closed again.
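To make the three steps concrete, here is a planar toy analogue in Python; it is not Rosetta's implementation. In 2D each pivot has one rotatable angle, and fixing the end of the chain in position and orientation imposes three constraints, so three pivots can be solved exactly (here by intersecting two circles). Real KIC works in 3D with the φ/ψ angles of three pivot residues (six torsions for six constraints) and solves a polynomial system instead. The helper names, link lengths, pivot indices, and perturbation size are illustrative assumptions.

```python
import cmath
import math
import random


def wrap(angle):
    """Wrap an angle in radians into [-pi, pi]."""
    return math.remainder(angle, 2 * math.pi)


def build_chain(lengths, turns, h0=0.0, origin=0j):
    """Toy planar chain (hypothetical model): link i has length lengths[i]; turns[i] is
    the turning angle at joint i, between link i-1 and link i (turns[0] is unused).
    Returns the joint positions as complex numbers and the heading of the last link."""
    points, heading = [origin], h0
    for i, length in enumerate(lengths):
        if i > 0:
            heading += turns[i]
        points.append(points[-1] + length * cmath.exp(1j * heading))
    return points, heading


def close_loop(lengths, turns, pivots, target_end, target_heading, h0=0.0, origin=0j):
    """Solve the turns at three pivot joints a < b < c so the chain ends at target_end
    with heading target_heading; all other turns are kept. Returns 0 or 2 solutions."""
    a, b, c = pivots
    n = len(lengths)
    if not 1 <= a < b < c < n:
        raise ValueError("pivots must satisfy 1 <= a < b < c < len(lengths)")
    # Orientation constraint: final heading = h0 + sum of all turns.
    s = target_heading - h0 - sum(t for i, t in enumerate(turns) if i > 0 and i not in pivots)
    # Pivot a: position and incoming heading depend only on the joints before it.
    p_a, heading = origin, h0
    for i in range(a):
        if i > 0:
            heading += turns[i]
        p_a += lengths[i] * cmath.exp(1j * heading)
    h_in_a = heading
    # Pivot c: walk back from the fixed end; links after c have known headings.
    p_c, heading = target_end, target_heading
    for j in range(n - 1, c - 1, -1):
        p_c -= lengths[j] * cmath.exp(1j * heading)
        if j > c:
            heading -= turns[j]

    def rigid(start, stop):
        # Links start..stop-1 in the frame of link `start`, plus their total internal bend.
        vec, bend = 0j, 0.0
        for j in range(start, stop):
            if j > start:
                bend += turns[j]
            vec += lengths[j] * cmath.exp(1j * bend)
        return vec, bend

    v1, bend1 = rigid(a, b)   # rigid body between pivots a and b
    v2, _ = rigid(b, c)       # rigid body between pivots b and c
    r1, r2 = abs(v1), abs(v2)
    d_vec = p_c - p_a
    d = abs(d_vec)
    if d == 0 or d > r1 + r2 or d < abs(r1 - r2):
        return []             # this perturbation cannot be closed
    # Pivot b lies on both circles |p - p_a| = r1 and |p - p_c| = r2.
    x = (d * d + r1 * r1 - r2 * r2) / (2 * d)
    y = math.sqrt(max(r1 * r1 - x * x, 0.0))
    unit = d_vec / d
    solutions = []
    for sign in (1, -1):
        p_b = p_a + unit * complex(x, sign * y)
        phi1 = cmath.phase(p_b - p_a) - cmath.phase(v1)   # heading of link a
        phi2 = cmath.phase(p_c - p_b) - cmath.phase(v2)   # heading of link b
        new = list(turns)
        new[a] = wrap(phi1 - h_in_a)
        new[b] = wrap(phi2 - phi1 - bend1)
        new[c] = wrap(s - new[a] - new[b])
        solutions.append(new)
    return solutions


if __name__ == "__main__":
    rng = random.Random(0)
    lengths = [3.8] * 9                                   # toy links, roughly Cα-Cα spacing
    turns = [0.0] + [rng.uniform(-1.0, 1.0) for _ in range(8)]
    pivots = (1, 4, 8)                                    # step 1: three pivots
    points, heading = build_chain(lengths, turns)
    # Step 2: randomly perturb every non-pivot angle, which opens the chain.
    perturbed = [t + rng.gauss(0.0, 0.2) if i > 0 and i not in pivots else t
                 for i, t in enumerate(turns)]
    # Step 3: solve the pivots so the chain reaches the original end point and heading.
    for sol in close_loop(lengths, perturbed, pivots, points[-1], heading):
        end, h = build_chain(lengths, sol)
        print("end error:", abs(end[-1] - points[-1]), "heading error:", wrap(h - heading))
```

If a perturbation cannot be closed, `close_loop` returns an empty list, and a sampler would simply draw a new perturbation.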

Figure 1. Loop reconstruction with KIC. (Mandell D J, et al. 2009)

KIC still samples near-native loop conformations at a low rate. To address this, Amelie Stein proposed NGK (next-generation kinematic loop modeling) in 2013, which builds on KIC to further enhance loop backbone sampling.


NGK combines five sampling strategies: Taboo, Omega, Rama2b, Ramp repulsive, and Ramp rama sampling. In detail:

  • Rama2b sampling accounts for the implicit effect of side-chain steric interactions on the backbone. It draws the backbone dihedral angles of non-pivot residues from database statistics that are conditioned on the types of the neighboring residues. This makes sampling more efficient and brings the loop backbone closer to its native state.
  • Omega sampling independently samples the peptide-bond dihedral angle ω of each residue (179.1° ± 6.3°), so that ω stays within the range observed in real structures.
  • Ramp repulsive and Ramp rama gradually ramp up the weights of the repulsive (rep) and Ramachandran (rama) terms of the score function during simulated annealing in the high-resolution stage. Keeping these terms down-weighted early on smooths the all-atom energy landscape, which makes it easier for sampling to escape local energy minima and explore more of conformational space.
  • Taboo sampling is used only in the low-resolution stage. A taboo map keeps track of the dihedral-angle regions already sampled for non-pivot residues and steers sampling toward unexplored ones. This increases conformational diversity at low resolution, but may also slow convergence.
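The Omega, Taboo, and Ramp ideas can each be sketched in a few lines of Python. These are simplified illustrations, not Rosetta code: treating ω as normally distributed with the quoted mean and spread, the bin labels, the reset rule of the taboo map, and the linear ramp starting at 10% of the final weight are all assumptions of this sketch.

```python
import random


def sample_omega(rng, mean=179.1, sd=6.3):
    """Omega sampling (sketch): draw a peptide-bond omega angle in degrees, assuming a
    normal distribution with the quoted mean and spread, wrapped into [-180, 180)."""
    return (rng.gauss(mean, sd) + 180.0) % 360.0 - 180.0


class TabooSampler:
    """Taboo sampling (simplified sketch): hand out torsion bins for one residue while
    avoiding bins already used; reset the taboo map once every bin has been visited."""

    def __init__(self, bins, rng):
        self.bins = list(bins)
        self.rng = rng
        self.used = set()

    def next_bin(self):
        remaining = [b for b in self.bins if b not in self.used]
        if not remaining:                 # all bins visited: clear the taboo map
            self.used.clear()
            remaining = list(self.bins)
        choice = self.rng.choice(remaining)
        self.used.add(choice)
        return choice


def ramped_weight(final_weight, step, n_steps, start_fraction=0.1):
    """Ramp repulsive / Ramp rama (sketch): scale a score-term weight linearly from
    start_fraction * final_weight at step 0 up to final_weight at the last step."""
    if n_steps <= 1:
        return final_weight
    return final_weight * (start_fraction + (1.0 - start_fraction) * step / (n_steps - 1))


if __name__ == "__main__":
    rng = random.Random(42)
    print([round(sample_omega(rng), 1) for _ in range(5)])
    taboo = TabooSampler("ABEG", rng)              # hypothetical bin labels
    print([taboo.next_bin() for _ in range(8)])    # each bin once per pass of four
    print([round(ramped_weight(1.0, s, 5), 3) for s in range(5)])
```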

When four or all five of these strategies are combined with the KIC algorithm, they reinforce each other: NGK substantially increases the fraction of high-accuracy loop conformations among all predicted models.

2. Tutorial: predicting loop conformations with NGK

2.1 Prepare the loop file

The loop file defines each loop region using pose numbering, i.e. residues numbered consecutively starting from 1. Here we use the PBX1 protein as an example (PDB ID: 1B72; the PDB file must first be renumbered so that its residue numbers match pose numbers). We want to rebuild and predict the loop spanning residues 12-20. The loop file is written as follows:
# 1b72 loopfile
LOOP 12 20 0 0 1

The first column is the LOOP keyword. The second and third columns are the starting and ending pose numbers of the loop, and the fourth column is the cutpoint, i.e. the pose number of the residue at which the chain is cut. Set it as appropriate for your system.

The fifth column is the skip rate; the default is 0.

The sixth column controls the starting conformation: 1 = discard the existing loop conformation, 0 = keep it. Use 1 when predicting a loop and 0 when refining an existing loop conformation.
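The two preparation steps, renumbering the PDB file and writing the loop file, can be sanity-checked with a small Python sketch. `renumber_pdb` and `parse_loop_file` are hypothetical helpers, not Rosetta tools. The renumbering assumes the file has already been reduced to the residues you want to model, because pose numbers count every loaded residue in order, starting from 1. The parser accepts loop lines with or without the leading LOOP keyword used by Rosetta loop files, and treating a cutpoint of 0 as "not specified" is an assumption of this sketch.

```python
from dataclasses import dataclass


def renumber_pdb(lines, start=1):
    """Hypothetical helper: renumber residues consecutively, in order of appearance, in
    ATOM/HETATM records. A residue is keyed by chain ID (column 22), residue number
    (columns 23-26) and insertion code (column 27); the insertion code is cleared.
    Other records (TER, END, ...) are copied unchanged."""
    mapping = {}
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 27:
            key = (line[21], line[22:26], line[26])
            if key not in mapping:
                mapping[key] = start + len(mapping)
            line = f"{line[:22]}{mapping[key]:>4} {line[27:]}"
        out.append(line)
    return out


@dataclass
class LoopEntry:
    start: int        # first loop residue (pose number)
    end: int          # last loop residue (pose number)
    cutpoint: int     # residue at which the chain is cut (0 = not specified, assumed)
    skip_rate: float
    extend: bool      # True: discard the existing loop conformation


def parse_loop_file(text):
    """Hypothetical helper: parse lines of the form '[LOOP] start end cutpoint skip_rate
    extend'. Blank lines and lines starting with '#' are ignored."""
    loops = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if fields[0] == "LOOP":
            fields = fields[1:]
        if len(fields) != 5:
            raise ValueError(f"expected 5 loop fields, got {len(fields)}: {raw!r}")
        start, end, cut = (int(f) for f in fields[:3])
        if not 1 <= start < end:
            raise ValueError(f"bad loop range {start}-{end}")
        if cut != 0 and not start <= cut <= end:
            raise ValueError(f"cutpoint {cut} lies outside loop {start}-{end}")
        loops.append(LoopEntry(start, end, cut, float(fields[3]), fields[4] == "1"))
    return loops


if __name__ == "__main__":
    print(parse_loop_file("# 1b72 loopfile\n12 20 0 0 1"))
    # [LoopEntry(start=12, end=20, cutpoint=0, skip_rate=0.0, extend=True)]
```

To renumber a structure, pass `renumber_pdb` the lines of the PDB file (for example from `readlines()`) and write the returned lines to a new file.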

2.2 Run the NGK prediction

2.3 Result analysis

After the run finishes, open the score.sc file and check the chainbreak value (smaller is better; it should normally be below 1.0) and the total energy (lower is better). You can also examine the packstat value to judge how well the loop packs against the rest of the protein. If enough conformations have been sampled, select the top 400 (for example, ranked by total energy) for cluster analysis, and choose representative conformations from the clusters as the final model.
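The selection and clustering steps can be sketched in Python as follows. The sketch assumes score.sc has columns named `total_score`, `chainbreak`, and `description`, and that the loop Cα coordinates of each model have already been extracted (not shown). All function names are hypothetical, and the greedy clustering with a 1.0 Å radius is a simple illustrative choice, not the algorithm of any particular Rosetta clustering tool.

```python
import math


def parse_score_lines(lines):
    """Read a Rosetta-style score file: the first 'SCORE:' line holds the column names,
    later 'SCORE:' lines hold one model each. Other lines are ignored."""
    header, rows = None, []
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "SCORE:":
            continue
        if header is None:
            header = fields[1:]
        elif fields[1:] != header:        # skip repeated header lines
            rows.append(dict(zip(header, fields[1:])))
    return rows


def select_models(rows, max_chainbreak=1.0, top_n=400):
    """Keep closed loops (chainbreak below the cutoff), rank by total_score (lower is
    better) and return the descriptions of the top_n models. Column names are assumed."""
    closed = [r for r in rows if float(r["chainbreak"]) < max_chainbreak]
    closed.sort(key=lambda r: float(r["total_score"]))
    return [r["description"] for r in closed[:top_n]]


def loop_rmsd(a, b):
    """RMSD between two models' loop coordinates (equal-length lists of (x, y, z)).
    No superposition: the rest of the backbone is assumed to stay fixed."""
    sq = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(sq / len(a))


def greedy_cluster(coords, radius=1.0):
    """coords: {model name: loop coordinates}. Repeatedly take the model with the most
    neighbours within `radius` (Å) as a cluster centre, then remove the centre and its
    neighbours. Returns [(centre, members), ...] with cluster sizes non-increasing."""
    remaining = set(coords)
    clusters = []
    while remaining:
        neighbours = {n: {m for m in remaining if loop_rmsd(coords[n], coords[m]) <= radius}
                      for n in remaining}
        centre = max(sorted(remaining), key=lambda n: len(neighbours[n]))
        clusters.append((centre, sorted(neighbours[centre])))
        remaining -= neighbours[centre]
    return clusters


# Hypothetical usage:
# with open("score.sc") as f:
#     best = select_models(parse_score_lines(f))
```

The greedy scheme compares every pair of remaining models in each round, which is fine for a few hundred models but would need a precomputed distance matrix for larger sets.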

Reference:

  1. Mandell D J, Coutsias E A, Kortemme T. Sub-angstrom accuracy in protein loop reconstruction by robotics-inspired conformational sampling. Nature Methods, 2009, 6(8):551-552.