ProteinMPNN

CD ComputaBio provides cutting-edge software-based virtual services to empower researchers, but we do not offer free software packages.

What is ProteinMPNN?

ProteinMPNN is a deep-learning model designed for protein sequence design given a backbone structure. It belongs to a class of "inverse folding" or "sequence design" tools.

Key Features

It takes a 3D backbone (atomic coordinates of the protein scaffold) and predicts amino acid sequences that are likely to fold into that structure.
It uses a message-passing neural network (MPNN) architecture to model interactions among residues.
It designs sequences much faster than traditional physics-based methods (e.g. Rosetta) and typically achieves higher sequence recovery on native backbones (52.4% versus 32.9% for Rosetta in the original ProteinMPNN benchmark).
The tool has been validated experimentally across multiple systems: monomers, cyclic homo-oligomers, nanoparticles, binding proteins, etc.
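
Sequence recovery, the metric mentioned above, is simply the fraction of positions at which a designed sequence matches the native one. The following is a minimal sketch of that calculation; the function name and the example sequences are illustrative assumptions, not part of ProteinMPNN itself.

```python
def sequence_recovery(native: str, designed: str) -> float:
    """Return the fraction of positions where designed matches native.

    Both sequences are assumed to be aligned position by position,
    which holds when the design was made on the native backbone.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


# Made-up 10-residue example: positions 3 and 6 differ.
native_seq = "MKTAYIAKQR"
designed_seq = "MKSAYLAKQR"
print(f"{sequence_recovery(native_seq, designed_seq):.2f}")  # prints 0.80
```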

How ProteinMPNN Works

Step	Description
Backbone input	A protein backbone structure, either experimentally determined or predicted.
Encoding	The MPNN encodes backbone geometry and the spatial relations between residues (distances and angles).
Decoding / Sequence Prediction	The model outputs a sequence that is likely to fold into that backbone. Some residues may be fixed (e.g., active site or binding site residues) to preserve function.
Filtering / Scoring / Validation	Designed sequences are evaluated with structure prediction tools (e.g. AlphaFold2) and ranked by metrics such as folding confidence (e.g. pLDDT), RMSD to the target backbone, and solubility. Only the top candidates are taken forward.
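
The filtering step in the last row can be sketched as a simple threshold-and-rank routine. This is a hedged illustration only: the `Candidate` class, the cutoff values (pLDDT of at least 85, RMSD of at most 2.0 Å), and the scores below are all placeholder assumptions, and in practice the metrics would come from a structure prediction run on each designed sequence.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    sequence: str
    mean_plddt: float  # predicted folding confidence, 0 to 100
    ca_rmsd: float     # C-alpha RMSD to the design backbone, in angstroms


def select_candidates(candidates, min_plddt=85.0, max_rmsd=2.0, top_n=5):
    """Keep designs that pass both cutoffs, best pLDDT first.

    The default thresholds are illustrative assumptions, not recommended values.
    """
    passing = [
        c for c in candidates
        if c.mean_plddt >= min_plddt and c.ca_rmsd <= max_rmsd
    ]
    # Rank by confidence (descending), break ties by lower RMSD.
    passing.sort(key=lambda c: (-c.mean_plddt, c.ca_rmsd))
    return passing[:top_n]


# Placeholder scores for four hypothetical designs.
designs = [
    Candidate("design_A", "MKSAYLAKQR", 92.1, 0.9),
    Candidate("design_B", "MKTGYIAKER", 71.4, 1.2),  # fails the pLDDT cutoff
    Candidate("design_C", "MRTAYIVKQR", 88.0, 1.6),
    Candidate("design_D", "MKTAWIAKQK", 90.5, 3.4),  # fails the RMSD cutoff
]
selected = select_candidates(designs)
print([c.name for c in selected])  # prints ['design_A', 'design_C']
```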

Pharmaceutical / Biotech Applications

ProteinMPNN has been applied in a variety of practical contexts relevant to pharmaceutical R&D.

Application	Examples / Benefits
Stability / Expression Optimization	Redesigning native proteins (e.g. TEV protease, myoglobin) with ProteinMPNN to improve thermal stability, expression yield, and solubility while retaining functional activity.
Rescuing Failed Designs	Designs that failed with older methods (e.g. Rosetta) were "rescued" using ProteinMPNN, yielding proteins that fold correctly and show the intended binding.
Nanoparticle / Multimeric Assembly Design	Designing two-component protein nanomaterials (assemblies) more efficiently than with Rosetta, using fewer computational resources and achieving high success rates.
Functional Modulators / Binding Variants	For example, redesigning ubiquitin variants (UbVs) to modulate the Rsp5 E3 ligase, in this case enhancing its activity.
Peptide PROTAC Design	Designing binding peptides for AR and VHL as part of peptide PROTACs (targeted protein degradation agents) with downstream experimental validation.
Synthetic Binding Proteins	Expanding the sequence space of synthetic binding proteins (SBPs) to improve solubility, stability, and binding energy relative to traditional engineering approaches.

Advantages

Much faster design cycles; can generate many candidate sequences quickly.
High sequence recovery and folding confidence, reducing wasted effort in screening non-folding or unstable proteins.
Good at preserving functional regions when needed (fixing active site residues).
Broad applicability: suitable for designing binders, stabilizing proteins, and improving expression, which covers many pharma R&D needs.
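
When functional residues are held fixed, it is worth confirming afterwards that every designed sequence actually preserves them. The sketch below is a post-hoc sanity check under stated assumptions: the function name, the 1-based position convention, and the example sequences are illustrative and do not reflect ProteinMPNN's own input format.

```python
def changed_fixed_positions(native: str, designed: str, fixed_positions):
    """Return the fixed positions (1-based) whose residue differs from native.

    An empty list means every fixed residue was preserved.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    changed = []
    for pos in sorted(set(fixed_positions)):
        if not 1 <= pos <= len(native):
            raise ValueError(f"position {pos} is outside the sequence")
        if native[pos - 1] != designed[pos - 1]:
            changed.append(pos)
    return changed


# Made-up example: positions 3 and 5 were meant to stay fixed,
# but the design mutated position 3 (T to S).
native_seq = "MKTAYIAKQR"
designed_seq = "MKSAYLAKQR"
print(changed_fixed_positions(native_seq, designed_seq, [3, 5]))  # prints [3]
```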

Limitations

Input structure quality matters: errors in backbone or missing segments can reduce design quality significantly.
Functional constraints (active/binding site behavior) may require fixing residues, which limits the redesign flexibility.
Computational metrics (structure confidence, sequence recovery) are only predictive; experimental validation is still needed to confirm activity, binding, and stability.
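Because sequences are designed against a single fixed backbone, the method is not always optimal for proteins that undergo large conformational changes or have dynamic binding sites.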
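
Since missing segments in the input backbone can degrade design quality, a quick pre-check for chain breaks can be useful. The sketch below flags consecutive C-alpha atoms that are too far apart to be bonded neighbors. Adjacent C-alpha atoms normally sit about 3.8 Å apart; the 4.2 Å cutoff, the function name, and the coordinates are assumptions made for illustration.

```python
import math


def find_chain_breaks(ca_coords, max_ca_distance=4.2):
    """Return (index, next_index, distance) for each suspicious gap.

    ca_coords is a list of (x, y, z) C-alpha coordinates in chain order.
    The default cutoff is an illustrative assumption.
    """
    breaks = []
    for i in range(len(ca_coords) - 1):
        distance = math.dist(ca_coords[i], ca_coords[i + 1])
        if distance > max_ca_distance:
            breaks.append((i, i + 1, distance))
    return breaks


# Made-up coordinates: three evenly spaced residues, then a 10 angstrom jump.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (17.6, 0.0, 0.0)]
for start, end, distance in find_chain_breaks(coords):
    print(f"possible missing segment between residues {start} and {end} "
          f"({distance:.1f} A apart)")
```

A backbone that reports breaks may need loop modeling or trimming before it is used as a design template.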

Related Services

Structure Modeling Service
Antibody-Antigen Interaction Modeling Service
Reverse Docking Service
Rigid Docking Service
Peptide Folding Simulation Service

* For Research Use Only.
