There are several approaches to peptide structure prediction, including ab initio prediction, homology modelling, molecular dynamics (MD) simulation, and deep learning-based methods. PEP-FOLD3 is a peptide-specific ab initio folding method that can be used to simulate peptides between 5 and 50 amino acids. APPTEST is a peptide-specific protocol that combines a neural network architecture and MD to simulate peptides between 5 and 40 amino acids. AlphaFold2 (AF2) is a deep learning-based protein prediction method that uses multiple sequence alignments (MSAs) to predict protein structures based on co-evolved residues. RoseTTAFold follows a similar logic but uses a different deep learning architecture. OmegaFold is a deep learning-based method that uses only sequences, not MSAs, and makes its predictions with an approach based on natural language modelling. Considering that most natural peptides may only have sequence information available, sequence-based methods have advantages over structure-based methods. Understanding the weaknesses of current peptide prediction methods will guide the development of future methods.
The authors selected 588 peptides that had, as determined by NMR structural experiments, well-defined secondary structure elements and disordered regions. These peptides were grouped into the following benchmark sets: α-helical membrane-associated peptides (AH MP), α-helical soluble peptides (AH SL), mixed-secondary-structure membrane-associated peptides (MIX MP), mixed-secondary-structure soluble peptides (MIX SL), β-hairpin peptides (BHPIN), and disulfide bond-rich peptides (DSRP). For each peptide, the NMR structure as a whole was compared pairwise with all five AF2 structures, and the distributions of all pairwise Cα RMSD values were plotted to identify outliers and examine poorly predicted structures (Figure 1B).
Figure 1. Workflow for benchmarking AlphaFold2 on peptide structure prediction.
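The pairwise comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes the Cα coordinates have already been extracted into equal-length NumPy arrays, superposes each pair with the Kabsch algorithm, and assumes that the normalisation divides the RMSD by the number of residues, as the Å-per-residue unit suggests. The function names are hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Cα RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centring both coordinate sets.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)

    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)
    # Guard against an improper rotation (reflection).
    d = -1.0 if np.linalg.det(Vt.T @ U.T) < 0 else 1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    P_rot = P0 @ R.T
    return float(np.sqrt(((P_rot - Q0) ** 2).sum() / len(P)))


def pairwise_normalised_rmsd(nmr_models, af2_models):
    """Length-normalised Cα RMSD (Å per residue) for every NMR/AF2 model pair.

    Assumption: normalisation means dividing the RMSD by the residue count.
    """
    values = []
    for nmr in nmr_models:
        for af2 in af2_models:
            values.append(kabsch_rmsd(nmr, af2) / len(nmr))
    return values
```

With an NMR ensemble of 20 models and five AF2 models, `pairwise_normalised_rmsd` would return 100 values for that peptide, which can then be pooled into the per-set histograms.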
α-helical membrane-associated peptides were predicted with considerable accuracy and with very few outliers. These peptides are defined as peptides that fold into predominantly α-helical structures in the presence of a membrane environment. This group includes transmembrane helices, amphiphilic helices, structures with helix-turn-helix motifs, and monomeric helices that partially span the membrane. The histogram of the normalised Cα RMSD showed a single-peaked Gaussian distribution with a mean value of 0.098 Å per residue (Fig. 2A). The authors examined individual outliers, ranked by the number of standard deviations (σ) above the mean, for structural shortcomings in the AF2 predictions. In some cases, AF2 failed to predict the helical ends and the helix-turn-helix motifs of α-helical peptides (Fig. 2B).
Figure 2. α-helical peptide predictions perform better for membrane-associated peptides.
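Ranking outliers by the number of standard deviations above the mean can be sketched as follows. This is only an illustration under stated assumptions: one normalised Cα RMSD value per peptide, the population standard deviation, and a hypothetical threshold parameter. The source does not state which cut-off the authors used.

```python
from statistics import mean, pstdev


def sigma_above_mean(rmsd_by_id):
    """Map each peptide ID to its distance from the mean in units of σ."""
    values = list(rmsd_by_id.values())
    mu = mean(values)
    sd = pstdev(values)  # assumption: population standard deviation
    if sd == 0:
        return {pid: 0.0 for pid in rmsd_by_id}
    return {pid: (v - mu) / sd for pid, v in rmsd_by_id.items()}


def flag_outliers(rmsd_by_id, threshold=2.0):
    """Return {peptide ID: σ} for peptides at least `threshold` σ above the mean.

    `threshold` is a hypothetical parameter chosen for illustration.
    """
    scores = sigma_above_mean(rmsd_by_id)
    return {pid: z for pid, z in scores.items() if z >= threshold}
```

Only peptides above the mean are flagged, since a low RMSD indicates a good prediction rather than a failure.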
Mixed secondary structure soluble peptides were predicted with moderate accuracy. This group was defined as peptides with the same secondary structure properties as their membrane counterparts, but whose structures were not determined in a membrane environment. The normalised Cα RMSD histogram showed a moderately multi-peaked Gaussian distribution with peaks located at 1σ, 2σ and 3σ above the mean value, which was 0.107 Å per residue (Fig. 3C). The outliers indicate that AF2 fails to predict the orientation of secondary structure elements at the boundaries with unstructured regions (Fig. 3D). For example, AF2 predicts 2BBL as a completely unstructured peptide, although the NMR models show a well-defined compact structure throughout the ensemble (Fig. 3D).
Figure 3. AlphaFold2 poorly predicts mixed secondary structure membrane-associated peptides and mixed secondary structure soluble peptides.
The structures of disulfide bond-rich peptides were predicted with high accuracy, but with variability in the disulfide bond patterns. For the purposes of this work, a disulfide bond-rich peptide (DSRP) was defined as any peptide with two or more disulfide bonds. The DSRP benchmark set was the largest group, containing a total of 266 peptides. The DSRPs showed a tight, slightly bimodal Gaussian histogram, with a second peak two standard deviations above the mean of 0.068 Å per residue (Figure 4C). In the outliers, AF2 failed to predict the correct disulfide bonding pattern. For 3BBG, it correctly predicted one disulfide bond and placed most of the remaining cysteine residues in close proximity; for 2MSF, it misplaced two disulfide bonds and did not predict the remaining one; and for 7L7A, it failed to predict any disulfide bonds (Fig. 4D).
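One way to compare disulfide bonding patterns between a reference model and a prediction is sketched below. It is not taken from the paper: it assumes the cysteine SG atom coordinates are available as a mapping from residue number to an (x, y, z) tuple, and it infers a bond whenever two SG atoms lie within a distance cut-off. The 2.5 Å default is an assumed, tolerant threshold (a typical S-S bond is about 2.05 Å), and the function names are hypothetical.

```python
from itertools import combinations
from math import dist


def disulfide_pairs(sg_coords, cutoff=2.5):
    """Infer disulfide bonds as residue pairs whose SG atoms are within `cutoff` Å.

    sg_coords: {residue_number: (x, y, z)} for cysteine SG atoms.
    `cutoff` is an assumed threshold, not a value from the paper.
    """
    pairs = set()
    for i, j in combinations(sorted(sg_coords), 2):
        if dist(sg_coords[i], sg_coords[j]) <= cutoff:
            pairs.add((i, j))
    return pairs


def compare_patterns(reference_sg, predicted_sg, cutoff=2.5):
    """Classify bonds as matched, missing from the prediction, or extra in it."""
    ref = disulfide_pairs(reference_sg, cutoff)
    pred = disulfide_pairs(predicted_sg, cutoff)
    return {
        "matched": ref & pred,
        "missing": ref - pred,
        "extra": pred - ref,
    }
```

Under this scheme, a prediction with a misplaced bond appears as one entry in `missing` and one in `extra`, while a bond that was not formed at all appears only in `missing`. For an NMR ensemble, the reference pattern would need to be taken from a chosen model or from the deposited bond annotations.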
Figure 4. AlphaFold2 predicts peptide structures better than the alternative computational methods PEP-FOLD3 (PF), OmegaFold (OF), RoseTTAFold (RF), and APPTEST (AT).
The authors sought to understand whether AF2 has an advantage over other deep learning and ab initio protein/peptide prediction methods in predicting the experimental structures of peptides. They generated predictions for all 588 peptides in the benchmark set using PEP-FOLD3, OmegaFold, RoseTTAFold, and APPTEST. PEP-FOLD3 and APPTEST were designed for peptide structure prediction, while OmegaFold and RoseTTAFold were designed for general protein structure prediction. AF2 performed better than all the alternatives in predicting peptide structure as measured by length-normalised Cα RMSD. Interestingly, AF2 outperformed APPTEST only on mixed secondary structure soluble peptides (Figure 4E). Finally, AF2 outperformed PEP-FOLD3, RoseTTAFold and APPTEST, but performed as well as OmegaFold on mixed secondary structure soluble peptides (Figure 4F).
Reference