Information in a protein flows from sequence to structure to function, with each step determined by the one before it. Protein design inverts this process: specify a desired function, design a structure capable of performing that function, and then find a sequence that folds into that structure. This inversion of the "central dogma" underlies nearly all de novo protein design efforts. Our ability to carry out these steps depends on our understanding of protein folding and function, and on how well that understanding is captured in computational methods. In recent years, deep learning-based methods, by providing efficient and accurate structural modeling and enriching for successful designs, have enabled the field to move beyond designing protein structures toward designing functional proteins.
Designing a functional protein from scratch begins with identifying the features required to perform the desired function. Once a functional motif has been defined, designing a protein structure that satisfies its constraints is among the most challenging steps in protein design. Traditional backbone-centered approaches remain the most interpretable way to model protein structures; for example, designs incorporating key structural insights have improved our ability to control β-barrel formation (Fig. 3a), which is important for enzyme and membrane-protein applications. Deep learning has transformed protein design by dramatically expanding our ability to shape protein structure in response to functional constraints. In a design strategy analogous to the original energy-landscape approach, learned statistical potentials can replace physics-based potentials to guide structural search, giving a similar ability to generate new structures and topologies to those found in nature. The arrival of highly accurate protein structure prediction with the AlphaFold system, alongside the development of trRosetta and RoseTTAFold, opened up new ways to generate proteins.
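As an illustration of this idea, the sketch below shows a hallucination-style Monte Carlo search in which a learned score stands in for a physics-based potential. The `predict_confidence` function is a hypothetical placeholder, mocked here with a toy objective so the loop runs end to end; in practice it would query a trRosetta- or RoseTTAFold-style network and return a confidence score for the sequence's predicted structure.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def predict_confidence(seq: str) -> float:
    """Placeholder for a learned potential, e.g. the confidence of a
    trRosetta/RoseTTAFold-style predictor on this sequence. Mocked with
    a toy objective (fraction of hydrophobic residues) for runnability."""
    return sum(seq.count(a) for a in "AILV") / len(seq)

def hallucinate(length: int = 100, steps: int = 2000, temp: float = 0.02) -> str:
    """Monte Carlo search: mutate one position at a time and accept moves
    with a Metropolis criterion on the learned score."""
    seq = "".join(random.choice(AMINO_ACIDS) for _ in range(length))
    score = predict_confidence(seq)
    for _ in range(steps):
        pos = random.randrange(length)
        mutant = seq[:pos] + random.choice(AMINO_ACIDS) + seq[pos + 1:]
        new_score = predict_confidence(mutant)
        # Accept improvements; occasionally accept worse moves to escape
        # local optima in the learned landscape.
        if new_score >= score or random.random() < math.exp((new_score - score) / temp):
            seq, score = mutant, new_score
    return seq

print(predict_confidence(hallucinate()))
```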
In protein design, the rise of diffusion-based generative models (Figure 4) marks an important advance: these models offer more stable training and better sample diversity than other classes of generative models while maintaining high sample quality. Rather than attempting to synthesize a complete atomic structure in one shot, they start from random noise and denoise coarse features first, filling in details later. This inductive bias matches the hierarchical nature of protein structure well, decomposing structure generation into global tertiary organization first, then local secondary structure, and finally chemical detail.
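The toy sketch below illustrates this reverse (denoising) process under stated assumptions: the `denoiser` function is a stub standing in for a trained network, the coordinates are simplified pseudo-CA points rather than full residue frames, and the linear noise schedule is purely illustrative rather than that of any particular model.

```python
import numpy as np

def denoiser(x: np.ndarray, t: int, n_steps: int) -> np.ndarray:
    """Placeholder for a trained denoising network that predicts a cleaner
    structure from noisy coordinates at step t. Stubbed as a mild shrink
    toward the centroid so the loop is runnable."""
    return x - 0.1 * (x - x.mean(axis=0))

def sample_backbone(n_residues: int = 100, n_steps: int = 50) -> np.ndarray:
    rng = np.random.default_rng(0)
    # Start from pure Gaussian noise; early (high-noise) steps fix coarse,
    # global features, later steps fill in local detail.
    x = rng.normal(size=(n_residues, 3))
    for t in reversed(range(1, n_steps + 1)):
        x = denoiser(x, t, n_steps)
        sigma = t / n_steps  # illustrative linear noise schedule
        # Re-inject noise that shrinks as t decreases.
        x = x + 0.1 * sigma * rng.normal(size=x.shape)
    return x

coords = sample_backbone()
print(coords.shape)  # (100, 3): one pseudo-CA coordinate per residue
```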
With the advent of accurate structure prediction methods such as AlphaFold, it has become possible to compare the predicted fold of a designed sequence against the original design structure. Because these predictions are relatively fast to compute, a designed sequence's predicted fold can be obtained along with confidence measures (e.g., pLDDT or pAE). A sequence predicted to fold back into the design structure with high confidence ("self-consistent" or "designable") is expected to be more likely to fold correctly in the wet lab. Such in silico evaluation substantially accelerates method development, since models and designed sequences can be assessed computationally without waiting for slower, more laborious wet-lab validation feedback (Figure 5).
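A minimal sketch of such a self-consistency filter, under stated assumptions, is shown below. The helpers `predict_structure` and `ca_rmsd` are hypothetical placeholders for a structure predictor (e.g., AlphaFold or ESMFold) and a superposition routine; the 2 Å scRMSD and mean-pLDDT 80 cutoffs are common choices in the literature, not values prescribed by any single method.

```python
import numpy as np

def predict_structure(sequence: str):
    """Placeholder for a structure predictor (e.g., AlphaFold or ESMFold)
    returning predicted CA coordinates and per-residue pLDDT. Stubbed here
    so the filter is runnable."""
    n = len(sequence)
    return np.zeros((n, 3)), np.full(n, 90.0)

def ca_rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Placeholder RMSD; a real implementation would first superpose the
    two structures (e.g., with the Kabsch algorithm)."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))

def is_self_consistent(sequence: str, design_ca: np.ndarray,
                       rmsd_cutoff: float = 2.0,
                       plddt_cutoff: float = 80.0) -> bool:
    """A design passes if the predicted fold matches the designed backbone
    (scRMSD below cutoff) and the predictor is confident in its own
    prediction (mean pLDDT above cutoff)."""
    pred_ca, plddt = predict_structure(sequence)
    return ca_rmsd(pred_ca, design_ca) < rmsd_cutoff and plddt.mean() > plddt_cutoff
```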