AI810 Blog Post (20255250) [ProtComposer - Compositional Protein Structure Generation with 3D Ellipsoids]

Hannes Stark, Bowen Jing, Tomas Geffner, Jason Yim, Tommi Jaakkola, Arash Vahdat & Karsten Kreis

Motivation and Idea

Modern generative models have shown impressive ability to hallucinate new protein structures (novel folds or designs not seen in nature). However, a major limitation has been the lack of control over the generated protein’s high-level structure or function. In image generation, users can guide generation through sketches, bounding boxes, or text prompts – by contrast, protein designers until now had little ability to specify where a model should place certain structural elements, or the overall shape of the protein. ProtComposer addresses this by introducing a compositional generation approach for proteins, where the user (or an algorithm) provides a rough 3D layout for the protein, and the model then fills in a protein structure that matches this outline. The layout is specified in terms of simple geometric primitives: 3D ellipsoids that encode the position, orientation, size, and even intended secondary structure of protein substructures. In essence, ProtComposer lets one sketch a protein in 3D using ellipsoidal blobs – each blob might represent, say, “an alpha-helix of ~10 residues oriented this way” or “a beta-sheet domain roughly here” – and then generates a full atomic protein backbone and sequence that realizes that sketch.

This innovation is significant in the context of protein design. Natural proteins are often modular, consisting of distinct domains or motifs that come together to form larger complexes. Designing such multi-domain proteins or enforcing specific spatial arrangements (for example, placing two functional domains at a certain distance) is very challenging for unconditional models – they might produce something plausible, but not necessarily the desired arrangement. ProtComposer’s ellipsoid-based conditioning provides a new level of controllability: one can dictate high-level features like overall shape (globular vs elongated), composition of secondary structures (helix vs sheet content), and relative domain positioning. This approach parallels ideas from other domains (the authors compare it to “bounding box” or “blob” based conditioning in image generation), transferring those concepts to protein structures.

The motivation also stems from an observation: prior protein generative models tended to produce relatively simple and often repetitive structures, such as helix bundles (lots of alpha-helices packed together). These are common because they are easy for models to learn and stable to form, but they are “conceptually simple” and less diverse than what nature shows. By forcing the model to condition on a variety of ellipsoid layouts – especially ones that include beta-sheets or mixed structure types – ProtComposer aims to break the bias towards overly helical proteins and expand the diversity of generated folds. Indeed, one of the paper’s key results is that by using random synthetic ellipsoid layouts as input, they obtain proteins with a helix fraction matching that of natural proteins, whereas earlier methods oversampled helices.

In summary, ProtComposer is motivated by the need for controllable and diverse protein structure generation. It provides a way to specify design intent (through coarse 3D descriptors) and then uses a generative model to produce a detailed protein backbone and amino acid sequence that fits that intent.


Method and Architecture

At its core, ProtComposer builds on a state-of-the-art generative model for proteins known as Multiflow, a joint sequence–structure flow-matching model (developed by some of the same authors) that generates protein backbones and sequences simultaneously. Multiflow is SE(3)-equivariant: it represents a protein as a set of residue “frames” in 3D (each residue has a position, an orientation, and an amino acid type) and learns to gradually transform a simple initial distribution (random noise) into the distribution of real protein structures and sequences. It uses techniques like Invariant Point Attention (from AlphaFold2) to handle 3D geometry and includes components to generate the discrete sequence along with the continuous coordinates.
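To make this representation concrete, here is a minimal sketch of residue frames as a data structure (the field names are illustrative, not Multiflow’s actual code):

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class ResidueFrames:
    """A protein as N residue frames plus a discrete sequence."""
    trans: np.ndarray   # (N, 3)    frame positions (e.g., C-alpha coordinates)
    rots: np.ndarray    # (N, 3, 3) frame orientations as rotation matrices
    aatype: np.ndarray  # (N,)      integer-encoded amino acid identities
```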

ProtComposer augments this generative framework with ellipsoid-based conditioning. An ellipsoid for the model is defined by parameters capturing its center (3D coordinates), its orientation (which can be encoded by principal axes or a rotation matrix), its radii (defining the ellipsoid shape – slender for a rod-like helix, flat for a sheet, etc.), and annotations like the number of residues it should contain and the dominant secondary structure type (α-helix, β-sheet, or coil). These ellipsoids are essentially high-level placeholders for chunks of protein structure. For example, a long thin ellipsoid annotated as “helix, 20 residues” suggests the model should place an alpha-helix of about 20 amino acids running roughly along that ellipsoid’s length. Multiple ellipsoids together might describe a desired topology: e.g., two ellipsoids side by side (one helical, one sheet-like) could indicate a two-domain protein with one helical domain and one β-sheet domain.
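A minimal sketch of how such an ellipsoid specification might look in code (field names are illustrative, not the paper’s actual data structures):

```python
from dataclasses import dataclass
from enum import Enum

import numpy as np


class SecondaryStructure(Enum):
    HELIX = "helix"
    SHEET = "sheet"
    COIL = "coil"


@dataclass
class Ellipsoid:
    """Coarse placeholder for a chunk of protein structure."""
    center: np.ndarray           # (3,) position of the ellipsoid center
    rotation: np.ndarray         # (3, 3) rotation matrix giving the principal axes
    radii: np.ndarray            # (3,) semi-axis lengths: slender for a helix, flat for a sheet
    num_residues: int            # how many residues should occupy this region
    ss_type: SecondaryStructure  # dominant secondary structure annotation


# e.g., a long thin ellipsoid annotated as a ~20-residue alpha-helix
helix_blob = Ellipsoid(
    center=np.zeros(3),
    rotation=np.eye(3),
    radii=np.array([15.0, 3.0, 3.0]),  # elongated along the first principal axis
    num_residues=20,
    ss_type=SecondaryStructure.HELIX,
)
```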

Incorporating ellipsoids into the generative model is done via a mechanism the authors call Invariant Cross Attention (ICA). In practice, they introduce a set of ellipsoid tokens alongside the protein’s own residue tokens in the model’s architecture. During generation, the model’s layers perform cross-attention between the evolving protein representation and the fixed ellipsoid representations in an SE(3)-invariant way. This means each residue can attend to the ellipsoids, taking into account the relative 3D position between that residue and each ellipsoid (similar to how AlphaFold’s attention uses relative positional encodings of residues). The invariance ensures that rotating or translating the entire setup changes nothing – only relative geometry matters. Essentially, the model learns to “pull” the growing protein toward the ellipsoids: residues organize and align to fill those ellipsoids because the attention provides a directional signal. The authors fine-tuned the pretrained Multiflow model with this new cross-attention mechanism rather than training from scratch, which they note is akin to how image models are fine-tuned for conditioning with minimal perturbation of the original model’s behavior. They also initialize the new conditioning pathway so that an empty ellipsoid set leaves the pretrained model’s behavior unchanged, meaning the model reverts to the unconditional generator when no ellipsoids are provided.
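The following is a minimal sketch of the idea behind such SE(3)-invariant cross-attention, not the paper’s exact ICA implementation; tensor shapes and layer choices are assumptions for illustration:

```python
import torch
import torch.nn as nn


class InvariantCrossAttention(nn.Module):
    """Sketch: residues attend to ellipsoid tokens using only relative geometry."""

    def __init__(self, dim: int):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim + 3, dim)  # ellipsoid features + relative position
        self.v = nn.Linear(dim + 3, dim)

    def forward(self, res_feats, res_rots, res_trans, ell_feats, ell_centers):
        # res_feats:   (N, dim)   residue features
        # res_rots:    (N, 3, 3)  residue frame orientations
        # res_trans:   (N, 3)     residue frame positions
        # ell_feats:   (M, dim)   ellipsoid token features
        # ell_centers: (M, 3)     ellipsoid centers in global coordinates

        # Relative position of each ellipsoid in each residue's local frame:
        # rel[i, j] = R_i^T (c_j - t_i), an SE(3)-invariant quantity.
        rel = torch.einsum(
            "nab,nmb->nma",
            res_rots.transpose(-1, -2),
            ell_centers[None, :, :] - res_trans[:, None, :],
        )  # (N, M, 3)

        q = self.q(res_feats)  # (N, dim)
        kv_in = torch.cat(
            [ell_feats[None].expand(rel.shape[0], -1, -1), rel], dim=-1
        )  # (N, M, dim + 3)
        k, v = self.k(kv_in), self.v(kv_in)  # each (N, M, dim)

        # Standard scaled dot-product attention from residues to ellipsoids.
        attn = torch.softmax(
            torch.einsum("nd,nmd->nm", q, k) / q.shape[-1] ** 0.5, dim=-1
        )
        return torch.einsum("nm,nmd->nd", attn, v)  # (N, dim) update per residue
```

The key design choice is that the only geometric input to the attention is $R_i^\top (c_j - t_i)$, which is unchanged when the same global rotation and translation are applied to both the residues and the ellipsoids.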

Under the hood, the generative process is based on flow matching, an approach to generative modeling in which one learns a continuous vector field that morphs a distribution of noise into the data distribution. ProtComposer’s flow operates over three coupled spaces: the 3D coordinates of residues, the orientations (rotations) of residue frames, and the discrete amino acid identities. The paper describes separate flow components for translations, rotations, and the discrete sequence, which are integrated jointly at each sampling step: a linear flow on $\mathbb{R}^3$ for coordinate shifts, a Riemannian flow on $SO(3)$ for orientations, and a discrete flow-matching method for the amino acid types. All of these are conditioned on the ellipsoids via the invariant cross attention in the network that computes the instantaneous vector field. The training objective ensures that the presence of ellipsoid conditioning steers the flow to produce structures consistent with those ellipsoids.
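To make the flow-matching recipe concrete, here is the standard conditional construction for the translation component (the usual linear path; the paper’s exact schedules and parameterizations may differ). Given a noise sample $x_0$ and a data sample $x_1$, one defines the interpolant and its target velocity as

$$x_t = (1 - t)\,x_0 + t\,x_1, \qquad u_t = x_1 - x_0,$$

and trains the network $v_\theta$ to regress that velocity under the ellipsoid conditioning:

$$\mathcal{L}_{\text{trans}} = \mathbb{E}_{t,\,x_0,\,x_1}\Big[\big\| v_\theta(x_t, t \mid \text{ellipsoids}) - (x_1 - x_0) \big\|^2\Big].$$

The rotation component plays the same game along geodesics of $SO(3)$, e.g. $r_t = \exp_{r_0}\!\big(t \log_{r_0}(r_1)\big)$, and the sequence component uses a discrete analogue over amino acid types.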

To sample a new protein given a set of ellipsoids, ProtComposer starts from random noise (e.g., a random set of points and random sequence) and then integrates the learned vector field (essentially denoising) until a structured protein is obtained. The process can be guided with a strength parameter – they implement a form of classifier-free guidance that allows interpolating between unconditional generation and fully-conditioned generation. With guidance, one can trade off how strictly the model adheres to the ellipsoid layout versus how much it improvises. In experiments, they varied a guidance factor $\lambda$ to adjust this trade-off, observing that higher conditioning strength yields better alignment with ellipsoids at some cost to sample diversity (typical of guidance).
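A minimal sketch of guided sampling under these assumptions: the `model(x, t, ellipsoids=...)` callable and its signature are hypothetical, and the state is treated abstractly as a single tensor, whereas in reality translations, rotations, and sequence each have their own update rule:

```python
import torch


def guided_velocity(model, x_t, t, ellipsoids, lam: float):
    """Classifier-free-guidance-style interpolation of the learned vector field.

    lam = 0 recovers the unconditional model, lam = 1 is fully conditional,
    and lam > 1 extrapolates toward stricter adherence to the layout.
    """
    v_uncond = model(x_t, t, ellipsoids=None)        # empty ellipsoid set
    v_cond = model(x_t, t, ellipsoids=ellipsoids)    # conditioned on the layout
    return v_uncond + lam * (v_cond - v_uncond)


def sample(model, x_noise, ellipsoids, lam: float, n_steps: int = 100):
    """Euler integration of the guided flow from noise toward a structure."""
    x = x_noise
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = torch.tensor(i * dt)
        x = x + dt * guided_velocity(model, x, t, ellipsoids, lam)
    return x
```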


Capabilities and Experimental Results

ProtComposer was evaluated on a variety of tasks to demonstrate: (1) its ability to faithfully adhere to specified layouts (control accuracy), (2) the increase in diversity and novelty of generated structures when using novel layouts, and (3) the generation of structurally complex compositions that prior models struggle with. Across these evaluations, conditioning on ellipsoid layouts extracted from natural proteins yielded realistic structures that faithfully realized the layouts, while conditioning on random synthetic layouts produced proteins whose helix fraction matches that of natural proteins (as discussed above), rather than the helix-heavy outputs typical of earlier unconditional models.

Given these results, ProtComposer was recognized as state of the art in controllable protein generation – it was accepted as an ICLR 2025 Oral paper (top ~2% of submissions). Its ability to span the space between its training data and novel configurations, while maintaining realism, sets a new benchmark. The code was made publicly available, so the community can begin applying the tool to real design tasks.


Limitations and Future Directions

Despite its powerful capabilities, ProtComposer has limitations and assumptions to be aware of. It still depends on the user (or an auxiliary algorithm) to supply a sensible ellipsoid layout, so the quality of its outputs rests on the quality of the sketch. And because the ellipsoids encode only coarse geometry and secondary-structure content, finer-grained functional requirements – such as the precise placement of an active site – cannot be expressed directly in this conditioning language.

Despite these limitations, the potential of ProtComposer is vast. It essentially opens a new design paradigm: “sketch-to-protein” generation. This could revolutionize how protein engineers approach a problem – instead of tweaking sequences and hoping for structures, they can now think in shapes: draw the rough shape of the protein they need for a task, and let the model propose sequences that realize it. As the technology matures, we might see user-friendly tools where one drags ellipsoids in a 3D canvas and the AI outputs a protein model in seconds. This could accelerate the design of novel enzymes, protein cages, therapeutic proteins, and more, by allowing intuitive creativity combined with AI’s knowledge of protein physics.


Conclusion and Impact in Structural Biology

ProtComposer’s introduction is part of a broader trend of bringing controllability and compositionality into generative models for biology. Early protein generation efforts were mostly unconditional – they generated random folds, which, while novel, had limited use if you needed something specific. More recently, diffusion-based methods and related approaches have allowed partial control (e.g., fixing part of a structure or guiding via a distance map). ProtComposer pushes this further by offering a more global yet intuitive control mechanism. It can be seen as analogous to how CAD software is used in mechanical design: one specifies a blueprint, and then the details are filled in. Here, the ellipsoids are the blueprint, and the AI fills in the molecular details.

From a geometric deep learning perspective, ProtComposer showcases the power of combining equivariant networks with attention mechanisms for complex conditional generation. The use of Invariant Cross Attention (ICA) to tie together two sets of geometric objects (ellipsoids and residue frames) is a novel architectural contribution. This idea could be reused in other domains, such as conditioning drug molecule generation on a binding pocket surface (analogous problem in 3D). It also demonstrates the flexibility of the flow-matching framework for handling mixed continuous/discrete data (protein backbone coordinates + sequences) – a relatively cutting-edge approach compared to more common diffusion models. The success of ProtComposer thus contributes to the evolving toolkit of generative modeling: it indicates that flow models with proper conditioning can achieve results on par with or exceeding diffusion models on a challenging task, which might inspire more research into flow-based generative models (which have certain advantages in training stability and invertibility).

In conclusion, ProtComposer represents a significant leap in protein design capability. It empowers researchers to compose proteins in ways previously not possible, by marrying high-level human design intuition (shapes and domains) with low-level sequence generation by AI. Over time, such tools could lead to the creation of proteins with tailor-made architectures for tasks like multienzyme complexes (where you want enzymes held in specific relative orientations), novel vaccines (scaffolding multiple antigens in one protein), or biomaterials (protein cages or fibers of defined shape). The work underscores a key message: introducing the right level of compositional bias or control (in this case, ellipsoidal sketches) can dramatically enhance both the usability and performance of generative models in biology, paving the way for AI-assisted design of biomolecules with unprecedented complexity and precision.