Hannes Stark, Bowen Jing, Tomas Geffner, Jason Yim, Tommi Jaakkola, Arash Vahdat & Karsten Kreis
Modern generative models have shown impressive ability to hallucinate new protein structures (novel folds or designs not seen in nature). However, a major limitation has been the lack of control over the generated protein’s high-level structure or function. In image generation, users can guide generation through sketches, bounding boxes, or text prompts – by contrast, protein designers until now had little ability to specify where a model should place certain structural elements, or the overall shape of the protein. ProtComposer addresses this by introducing a compositional generation approach for proteins, where the user (or an algorithm) provides a rough 3D layout for the protein, and the model then fills in a protein structure that matches this outline. The layout is specified in terms of simple geometric primitives: 3D ellipsoids that encode the position, orientation, size, and even intended secondary structure of protein substructures. In essence, ProtComposer lets one sketch a protein in 3D using ellipsoidal blobs – each blob might represent, say, “an alpha-helix of ~10 residues oriented this way” or “a beta-sheet domain roughly here” – and then generates a full atomic protein backbone and sequence that realizes that sketch.
This innovation is significant in the context of protein design. Natural proteins are often modular, consisting of distinct domains or motifs that come together to form larger complexes. Designing such multi-domain proteins or enforcing specific spatial arrangements (for example, placing two functional domains at a certain distance) is very challenging for unconditional models – they might produce something plausible, but not necessarily the desired arrangement. ProtComposer’s ellipsoid-based conditioning provides a new level of controllability: one can dictate high-level features like overall shape (globular vs elongated), composition of secondary structures (helix vs sheet content), and relative domain positioning. This approach parallels ideas from other domains (the authors compare it to “bounding box” or “blob” based conditioning in image generation), transferring those concepts to protein structures.
The motivation also stems from an observation: prior protein generative models tended to produce relatively simple and often repetitive structures, such as helix bundles (lots of alpha-helices packed together). These are common because they are easy for models to learn and stable to form, but they are “conceptually simple” and less diverse than what nature shows. By forcing the model to condition on a variety of ellipsoid layouts – especially ones that include beta-sheets or mixed structure types – ProtComposer aims to break the bias towards overly helical proteins and expand the diversity of generated folds. Indeed, one of the paper’s key results is that by using random synthetic ellipsoid layouts as input, they obtain proteins with a helix fraction matching that of natural proteins, whereas earlier methods oversampled helices.
In summary, ProtComposer is motivated by the need for controllable and diverse protein structure generation. It provides a way to specify design intent (through coarse 3D descriptors) and then uses a generative model to produce a detailed protein backbone and amino acid sequence that fits that intent.
At its core, ProtComposer builds on a state-of-the-art generative model for proteins known as Multiflow, a joint sequence–structure flow-matching model (developed by some of the same authors) that generates protein backbones and sequences simultaneously. Multiflow is an SE(3)-equivariant flow-matching model for proteins – it represents a protein as a set of residue “frames” in 3D (each residue has a position, an orientation, and an amino acid type) and learns to gradually transform a simple initial distribution (random noise) into the distribution of real protein structures and sequences. It uses techniques like Invariant Point Attention (from AlphaFold) to handle 3D geometry and includes components to generate the discrete sequence alongside the continuous coordinates.
ProtComposer augments this generative framework with ellipsoid-based conditioning. An ellipsoid for the model is defined by parameters capturing its center (3D coordinates), its orientation (which can be encoded by principal axes or a rotation matrix), its radii (defining the ellipsoid shape – slender for a rod-like helix, flat for a sheet, etc.), and annotations like the number of residues it should contain and the dominant secondary structure type (α-helix, β-sheet, or coil). These ellipsoids are essentially high-level placeholders for chunks of protein structure. For example, a long thin ellipsoid annotated as “helix, 20 residues” suggests the model should place an alpha-helix of about 20 amino acids running roughly along that ellipsoid’s length. Multiple ellipsoids together might describe a desired topology: e.g., two ellipsoids side by side (one helical, one sheet-like) could indicate a two-domain protein with one helical domain and one β-sheet domain.
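To make the parameterization concrete, here is a minimal sketch of how one such ellipsoid could be represented in code. The class and field names are illustrative assumptions, not the paper's actual data format:

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np


class SecondaryStructure(Enum):
    HELIX = "H"
    SHEET = "E"
    COIL = "C"


@dataclass
class Ellipsoid:
    """One ellipsoid token of a layout (field names are illustrative)."""
    center: np.ndarray           # (3,) position of the ellipsoid's centroid
    rotation: np.ndarray         # (3, 3) rotation matrix giving the principal axes
    radii: np.ndarray            # (3,) semi-axis lengths; e.g. (12, 3, 3) for a rod-like helix
    num_residues: int            # how many residues the ellipsoid should contain
    ss_type: SecondaryStructure  # dominant secondary structure annotation

    def contains(self, point: np.ndarray) -> bool:
        """Check whether a 3D point lies inside the ellipsoid."""
        local = self.rotation.T @ (point - self.center)  # express point in ellipsoid frame
        return float(np.sum((local / self.radii) ** 2)) <= 1.0
```

The `contains` check is reused in a toy layout-adherence sketch further below.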
Incorporating ellipsoids into the generative model is done via a mechanism the authors call Invariant Cross Attention (ICA). In practice, they introduce a set of ellipsoid tokens alongside the protein’s own residue tokens in the model’s architecture. During generation, the model’s layers perform cross-attention between the evolving protein representation and the fixed ellipsoid representations in an SE(3)-invariant way. This means each residue can attend to the ellipsoids, taking into account the relative 3D geometry between a residue and an ellipsoid (similar to how AlphaFold’s attention uses relative positional encodings of residues). The invariance ensures that rotating or translating the entire setup changes nothing – only relative geometry matters. Essentially, the model learns to “pull” the growing protein toward the ellipsoids: residues organize and align to fill those ellipsoids because the attention provides a directional signal. The authors fine-tuned the pretrained Multiflow model with this new cross-attention mechanism rather than training from scratch, which they note is akin to how image models are fine-tuned for conditioning with minimal perturbation of the original model’s behavior. If no ellipsoids are provided, the model should revert to the unconditional generator; they therefore initialize the new conditioning layers so that an empty ellipsoid set leaves the pretrained model’s behavior unchanged.
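The following is a minimal, self-contained PyTorch sketch of the idea: residue queries attend over ellipsoid tokens using relative geometry expressed in each residue's local frame, so a global rotation or translation leaves the attention unchanged. It illustrates the invariance principle rather than reproducing the paper's exact architecture; the layer sizes and feature choices are assumptions:

```python
import torch
import torch.nn as nn


class InvariantCrossAttention(nn.Module):
    """Sketch of SE(3)-invariant cross attention from residues to ellipsoids."""

    def __init__(self, d_res: int, d_ell: int, d_attn: int = 64):
        super().__init__()
        self.q = nn.Linear(d_res, d_attn)
        # Keys/values see the ellipsoid's own features plus 12 relative-geometry
        # numbers: the center in the residue frame (3) and the relative rotation (9).
        self.kv = nn.Linear(d_ell + 12, 2 * d_attn)
        self.out = nn.Linear(d_attn, d_res)

    def forward(self, res_feat, res_rot, res_pos, ell_feat, ell_rot, ell_pos):
        # res_feat: (N, d_res), res_rot: (N, 3, 3), res_pos: (N, 3)
        # ell_feat: (M, d_ell), ell_rot: (M, 3, 3), ell_pos: (M, 3)
        N, M = res_feat.shape[0], ell_feat.shape[0]

        # Ellipsoid center expressed in each residue's local frame: (N, M, 3).
        rel_pos = torch.einsum("nij,nmi->nmj", res_rot, ell_pos[None] - res_pos[:, None])
        # Relative orientation R_i^T R_j, flattened: (N, M, 9).
        rel_rot = torch.einsum("nij,mik->nmjk", res_rot, ell_rot).reshape(N, M, 9)

        geom = torch.cat([rel_pos, rel_rot], dim=-1)                    # (N, M, 12)
        kv = self.kv(torch.cat([ell_feat[None].expand(N, M, -1), geom], dim=-1))
        k, v = kv.chunk(2, dim=-1)                                      # each (N, M, d_attn)

        q = self.q(res_feat)                                            # (N, d_attn)
        logits = torch.einsum("nd,nmd->nm", q, k) / k.shape[-1] ** 0.5
        attn = logits.softmax(dim=-1)                                   # attend over ellipsoids
        return res_feat + self.out(torch.einsum("nm,nmd->nd", attn, v))
```

Because only frame-relative quantities enter the keys, applying the same rigid transform to all residue frames and all ellipsoids leaves `rel_pos` and `rel_rot`, and hence the output, unchanged.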
Under the hood, the generative process is based on flow matching, which is an approach to generative modeling where one learns a continuous vector field that morphs a distribution of noise into the data distribution. ProtComposer’s flow operates over three coupled spaces: the 3D coordinates of residues, the orientations (rotations) of residue frames, and the discrete amino acid identities. The paper describes using separate flow components for handling translations, rotations, and discrete sequence respectively, which are iteratively applied. For example, they use a linear flow on $\mathbb{R}^3$ for coordinate shifts, a Riemannian flow on SO(3) for orientations, and a discrete flow matching method for the amino acid types. All these are conditioned on the ellipsoids via the invariant cross attention in the network that computes the instantaneous vector field. The training objective ensures that the presence of ellipsoid conditioning steers the flow to produce structures consistent with those ellipsoids.
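For the continuous components, the standard conditional flow-matching construction gives explicit interpolants and target velocities; the following is a generic formulation consistent with the paper's description, not necessarily its exact parameterization. For residue positions, with $x_0$ drawn from noise and $x_1$ a data point,

$$x_t = (1-t)\,x_0 + t\,x_1, \qquad u_t(x_t \mid x_0, x_1) = x_1 - x_0,$$

and for residue orientations on SO(3), the geodesic interpolant between a random rotation $R_0$ and the data rotation $R_1$ is

$$R_t = R_0 \exp\!\big(t \log\big(R_0^{\top} R_1\big)\big).$$

The discrete amino-acid component is handled analogously by a discrete flow that interpolates between a simple prior over the 20 residue types and the data distribution.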
To sample a new protein given a set of ellipsoids, ProtComposer starts from random noise (e.g., a random set of points and random sequence) and then integrates the learned vector field (essentially denoising) until a structured protein is obtained. The process can be guided with a strength parameter – they implement a form of classifier-free guidance that allows interpolating between unconditional generation and fully-conditioned generation. With guidance, one can trade off how strictly the model adheres to the ellipsoid layout versus how much it improvises. In experiments, they varied a guidance factor $\lambda$ to adjust this trade-off, observing that higher conditioning strength yields better alignment with ellipsoids at some cost to sample diversity (typical of guidance).
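Concretely, classifier-free guidance for flows typically combines the two predicted vector fields as (a generic form consistent with the description above; the paper's exact formula may differ in detail)

$$v_\lambda(x_t) = (1-\lambda)\, v_\theta(x_t) + \lambda\, v_\theta(x_t \mid \text{ellipsoids}),$$

so that $\lambda = 0$ recovers the unconditional model, $\lambda = 1$ the fully conditioned one, and $\lambda > 1$ extrapolates to emphasize the conditioning even more strongly.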
ProtComposer was evaluated on a variety of tasks to demonstrate: (1) its ability to faithfully adhere to specified layouts (control accuracy), (2) the increase in diversity and novelty of generated structures when using novel layouts, and (3) the generation of structurally complex compositions that prior models struggle with. Key findings include:
Strong Adherence to Layout Constraints: The authors define metrics to quantify how well the generated protein matches the input ellipsoids – for instance, how closely each ellipsoid’s position and shape are filled by the corresponding protein region, and whether the secondary structure content (helix vs sheet) in that region matches the annotation. ProtComposer achieves high consistency on these metrics. Even for layouts that are quite different from anything in the training data (e.g., novel arrangements of helices and sheets), the model can realize them with remarkable fidelity. In some examples, they show that if you specify alternating helix and sheet ellipsoids in a ring, the model will produce a protein that indeed has alternating helix and sheet segments arranged in a ring-like fashion – something an unconditional model would rarely do. Importantly, this consistency holds beyond the training distribution: they test some wild, hand-drawn ellipsoid configurations and find the model still places structured protein elements accordingly. This demonstrates that ProtComposer’s conditioning is robust and not simply memorizing training pairs.
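The paper defines its own consistency metrics; as a rough illustration of what such a check could look like, here is a toy adherence score built on the hypothetical `Ellipsoid` class sketched earlier, assuming the residue-to-ellipsoid assignment is given:

```python
import numpy as np


def layout_adherence(ca_positions, ss_labels, assignment, ellipsoids):
    """Illustrative adherence check, not the paper's actual metrics.

    ca_positions: (N, 3) array of C-alpha coordinates
    ss_labels:    length-N list of per-residue secondary structure ("H"/"E"/"C")
    assignment:   length-N list mapping each residue to an ellipsoid index
    ellipsoids:   list of Ellipsoid objects (see sketch above)
    """
    inside, ss_match = 0, 0
    for pos, ss, idx in zip(ca_positions, ss_labels, assignment):
        ell = ellipsoids[idx]
        inside += ell.contains(pos)             # residue falls within its ellipsoid
        ss_match += (ss == ell.ss_type.value)   # residue has the annotated SS type
    n = len(ca_positions)
    return {"occupancy": inside / n, "ss_consistency": ss_match / n}
```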
Improved Diversity and Novelty of Generated Proteins: By sampling random ellipsoid layouts from a simple generative process (the authors built a heuristic model that creates random sets of ellipsoids with random sizes/positions, to serve as a source of many plausible but novel protein “outlines”), they can drive the generator to explore more of the protein structure space. The results show a significant increase in both novelty (measured by how different the generated folds are from any known protein, using metrics like TM-score or fragment frequencies) and diversity (variability among generated samples) when using these synthetic layouts. Essentially, the model is no longer confined to producing the few common motifs it might favor unconditionally; the layouts serve to pull it into new territory. There is a slight trade-off: some extremely novel or complex layouts may yield proteins that are a bit less designable, where designability refers to how likely the sequence is to actually fold into the structure (they often assess this by seeing if AlphaFold or similar can confidently recapitulate the structure from the sequence). The paper notes a cost to designability when pushing for more diversity. However, they illustrate that by tuning the guidance strength, one can navigate a Pareto frontier between novelty/diversity and designability. In other words, ProtComposer allows the user to decide how adventurous to be: with strong conditioning, you get very novel shapes at some risk of lower foldability; with weaker conditioning, you stay closer to known protein space but with higher confidence in realistic folding. This tunable balance is a valuable feature for practical design – early exploratory phases might prioritize novelty, while later stages tune for higher designability.
Generation of Compositional and Complex Structures: ProtComposer demonstrates the ability to generate proteins with multiple distinct substructures (e.g., a protein with a helical bundle on one end and a beta-sheet sandwich on the other, connected by a loop) – essentially chimeric architectures that combine different fold motifs. Previous generative models often default to a single motif repeated (like several helices or a simple up-down β-sheet). The paper argues that highly desirable proteins often exhibit such modular complexity – akin to having “different spatial parts with different properties” working together – but models without spatial control rarely produce them, because spontaneously forming two distinct domains is a low-probability event. By explicitly conditioning on a layout that encodes different parts, ProtComposer can routinely generate these multi-domain compositions. They even introduce a metric for “compositionality”, which checks whether a generated protein contains a mix of secondary structure elements in separate spatial regions (rather than all helices or all strands clumped together). ProtComposer improves this compositionality metric, indicating it can make proteins that are not just novel in shape but also rich in structural heterogeneity, better mimicking natural proteins, which often have complex domain organizations. In one example, the authors note that an unconditional model might make a protein that is essentially a single large alpha-helix bundle (low compositional complexity), whereas ProtComposer can create one that has an $\alpha$-$\beta$ mixed domain connected to an all-$\alpha$ domain, thereby achieving higher functional potential.
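To give a flavor of such a metric, here is a toy compositionality check: partition residues into spatial regions with k-means and ask whether the regions differ in their dominant secondary structure. The paper's actual metric differs in detail:

```python
from collections import Counter

import numpy as np
from sklearn.cluster import KMeans


def is_compositional(ca_positions, ss_labels, n_regions: int = 2) -> bool:
    """Toy compositionality check (illustrative, not the paper's metric):
    cluster residues spatially and test whether the resulting regions have
    different dominant secondary structure types.
    """
    regions = KMeans(n_clusters=n_regions, n_init=10).fit_predict(np.asarray(ca_positions))
    dominant = set()
    for r in range(n_regions):
        labels = [ss for ss, reg in zip(ss_labels, regions) if reg == r]
        dominant.add(Counter(labels).most_common(1)[0][0])  # dominant SS in this region
    return len(dominant) > 1  # True if regions differ, e.g. one helical, one sheet
```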
Editing and Re-mixing Existing Proteins: Another neat capability is using ellipsoids extracted from real proteins. One can take a known protein, abstract its shape into a few ellipsoids (perhaps one per domain), and then provide those to ProtComposer. The model can generate new protein structures that fit the same overall layout but with different connectivity or features. For instance, if a known protein has two domains connected in a particular orientation, ProtComposer can redesign the interface or the structure of each domain while keeping the overall arrangement. This is useful for protein redesign and scaffold hopping – you maintain the high-level architecture (perhaps necessary for function) but explore novel ways to implement it. The paper demonstrates cases where the connectivity between substructures is altered (loops re-routed, for example) yet the ellipsoid placement is satisfied. Essentially, ProtComposer can recompose proteins, offering a controlled way to invent variants of existing molecular machines or to create chimeras.
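A plausible way to extract such ellipsoids from an existing structure (illustrative; the authors' exact extraction procedure may differ) is to fit one ellipsoid per domain from the mean and principal axes of the domain's C-alpha coordinates:

```python
import numpy as np


def fit_ellipsoid(ca_positions, scale: float = 2.0):
    """Fit an ellipsoid to a set of C-alpha coordinates (one domain).

    Returns center, rotation (columns = principal axes), semi-axis radii,
    and the residue count annotation. The `scale` factor for converting
    standard deviations to radii is a heuristic assumption.
    """
    X = np.asarray(ca_positions)
    center = X.mean(axis=0)
    cov = np.cov((X - center).T)             # (3, 3) coordinate covariance
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    radii = scale * np.sqrt(eigvals)         # semi-axes ~ scaled std devs
    return center, eigvecs, radii, len(X)
```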
Given these results, ProtComposer was recognized as state-of-the-art in controllable protein generation – it was an ICLR 2025 Oral paper (top ~2% of submissions). Its ability to span the space between the training data and novel configurations, while maintaining realism, sets a new benchmark. The code has been made available, so the community can begin applying the tool to real design tasks.
Despite its powerful capabilities, ProtComposer has several limitations and assumptions to be aware of:
Reliance on Input Specification: The quality of generated proteins is tied to the quality of the ellipsoid layout provided. Designing a good set of ellipsoids for a desired function can be non-trivial. In practice, a user might not know exactly what ellipsoid arrangement yields, say, a binding pocket or an enzyme active site – more intuitive or higher-level control (like “make a pocket here”) is still an open challenge. The ellipsoid abstraction, while flexible, assumes the user can break down the protein shape in terms of coarse volumes and secondary structure content. This might be easy for some objectives (like “two domains connected by a linker”), but harder for others. Future interfaces might integrate automatic suggestion of ellipsoid configurations given some constraints, or interactive design tools to help users iteratively refine the layout.
Physical Realism and Designability: While ProtComposer improves designability relative to unconstrained generative models when guided by plausible layouts, not every generated protein is guaranteed to fold or be biophysically sound. The flow-matching model largely keeps outputs within the space of protein-like structures it has learned, but pushing toward very novel shapes can yield odd structures. The authors note a trade-off in which very novel layouts reduced the confidence that the designed sequences would fold into them. This suggests that the model, like others, may sometimes produce “out-of-distribution” structures that a real polypeptide would struggle to realize. In future work, one could incorporate a folding verification step (e.g., running AlphaFold on generated sequences as a filter, as sketched below) or explicitly train the model with a foldability constraint. Additionally, aspects like side-chain packing, solvation, and disulfide bonds are not explicitly handled by the model (it outputs only backbone and sequence). After generation, structures likely need relaxation or evaluation in a physics-based force field to ensure all atoms can be placed without clashes. Integrating such considerations (perhaps via a scoring network or an energy term during generation) could further improve the physical validity of designs.
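For concreteness, a self-consistency filter of the kind suggested above might look like the following sketch, where `fold_structure` and `tm_score` are hypothetical user-supplied callables (e.g., a wrapper around AlphaFold/ESMFold and a TM-score tool); the 0.5 threshold is a common convention for shared folds, not a value from the paper:

```python
def designability_filter(designs, fold_structure, tm_score, threshold: float = 0.5):
    """Keep only designs whose sequence refolds into the generated backbone.

    designs:        iterable of (sequence, backbone) pairs from the generator
    fold_structure: hypothetical callable, sequence -> predicted structure
    tm_score:       hypothetical callable, (structure, structure) -> float in [0, 1]
    """
    kept = []
    for sequence, backbone in designs:
        predicted = fold_structure(sequence)            # refold the designed sequence
        if tm_score(predicted, backbone) >= threshold:  # structure recapitulated?
            kept.append((sequence, backbone))
    return kept
```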
Scope of Ellipsoid Abstraction: The ellipsoid approach currently covers spatial layout and secondary structure, but not all aspects of protein design. For example, if one wanted to enforce a specific sequence motif or a binding site geometry (beyond secondary structure), that’s not captured by ellipsoids. We might imagine extending the conditioning to include functional constraints: e.g., “this ellipsoid region should contain a specific pattern or be able to bind X”. That would require augmenting the model with additional conditioning modalities (like specifying certain residues or distances explicitly). ProtComposer’s framework is a step towards multi-modal control, and one can foresee combining it with other conditioning like motif scaffolding (as seen in other work) by placing an ellipsoid around a known active site configuration that must be included. Currently, it excels at structural layout control, which is a foundational piece, but further work is needed to incorporate biochemical/functional constraints for full-fledged protein design for function.
Computational Cost and Complexity: Flow-matching models, especially with SE(3) equivariance and cross attention, are non-trivial in size and training complexity. ProtComposer’s training involved fine-tuning a large model on substantial GPU compute (several of the authors are at NVIDIA), and sampling, while faster than some older methods, still requires numerically integrating an ODE over many steps rather than a single forward pass. The method is thus currently in the realm of research rather than plug-and-play like some simpler models. Over time, optimized implementations or distillation could improve this. The authors demonstrated feasibility by producing many samples for analysis, but a practitioner wanting to design thousands of proteins would need access to significant computing resources.
Despite these limitations, the potential of ProtComposer is vast. It essentially opens a new design paradigm: “sketch-to-protein” generation. This could revolutionize how protein engineers approach a problem – instead of tweaking sequences and hoping for structures, they can now think in shapes: draw the rough shape of the protein they need for a task, and let the model propose sequences that realize it. As the technology matures, we might see user-friendly tools where one drags ellipsoids in a 3D canvas and the AI outputs a protein model in seconds. This could accelerate the design of novel enzymes, protein cages, therapeutic proteins, and more, by allowing intuitive creativity combined with AI’s knowledge of protein physics.
ProtComposer’s introduction is part of a broader trend of bringing controllability and compositionality into generative models for biology. Early protein generation efforts were mostly unconditional – they generated random folds, which, while novel, had limited use if you needed something specific. Recently, methods like diffusion-based motif scaffolding have added conditioning on fixed functional substructures; ProtComposer complements these by conditioning on coarse, user-specifiable spatial layouts, giving designers control over a protein’s overall architecture rather than the exact coordinates of a motif.
From a geometric deep learning perspective, ProtComposer showcases the power of combining equivariant networks with attention mechanisms for complex conditional generation. The use of Invariant Cross Attention (ICA) to tie together two sets of geometric objects (ellipsoids and residue frames) is a novel architectural contribution. This idea could be reused in other domains, such as conditioning drug molecule generation on a binding pocket surface (analogous problem in 3D). It also demonstrates the flexibility of the flow-matching framework for handling mixed continuous/discrete data (protein backbone coordinates + sequences) – a relatively cutting-edge approach compared to more common diffusion models. The success of ProtComposer thus contributes to the evolving toolkit of generative modeling: it indicates that flow models with proper conditioning can achieve results on par with or exceeding diffusion models on a challenging task, which might inspire more research into flow-based generative models (which have certain advantages in training stability and invertibility).
In conclusion, ProtComposer represents a significant leap in protein design capability. It empowers researchers to compose proteins in ways previously not possible, by marrying high-level human design intuition (shapes and domains) with low-level sequence generation by AI. Over time, such tools could lead to the creation of proteins with tailor-made architectures for tasks like multienzyme complexes (where you want enzymes held in specific relative orientations), novel vaccines (scaffolding multiple antigens in one protein), or biomaterials (protein cages or fibers of defined shape). The work underscores a key message: introducing the right level of compositional bias or control (in this case, ellipsoidal sketches) can dramatically enhance both the usability and performance of generative models in biology, paving the way for AI-assisted design of biomolecules with unprecedented complexity and precision.