On-demand design, generation, and evolution of macromolecules for desired functions.

Engineering Biology


Current State-of-the-Art

Currently, mapping the structure and function of a macromolecule from its primary sequence is the critical challenge towards achieving on-demand design, generation, and evolution. Computational DNA, RNA, and protein design has advanced to the point where defined structures, binding interactions, and enzymatic activities can be constructed, especially for proteins. Still, substantial improvements are needed in 1) expanding the range and effectiveness of macromolecular functions that can be designed and 2) increasing design success rates. De novo computational protein design, PDB-informed protein design strategies, origami-based nucleic acid structure design, physics-based design of RNA switches, machine-learning strategies that deduce molecular contacts for protein and nucleic acid folding from multiple sequence alignments, and hybrid approaches have enjoyed considerable success and hold significant promise.

Evolutionary and semi-rational approaches have advanced to the point where substantial improvements can be gained through a wealth of directed evolution, continuous evolution, and library-based approaches, often coupled with computation and modeling, but only when suitable starting macromolecules (i.e., those that already possess some activity along the axis of the desired function) have been identified. Critical challenges remain: creating effective and scalable diversification systems; building effective selection and screening systems; achieving de novo evolution of function; and expanding the scope and throughput of selection and screening systems so that they can directly select or screen for the exact function desired.

Breakthrough Capabilities

De novo prediction of RNA structure, protein structure, and complexes of DNAs/RNAs and proteins (from primary sequence) and the ability to make accurate predictions of mutability and effect of mutations from structure.

Reliably predict (with a greater than 50% success rate) the structure of 300-amino-acid proteins and 200-nucleotide RNA domains to within 5 Ångstroms from primary sequence.
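Accuracy thresholds like this one are commonly scored as the root-mean-square deviation (r.m.s.d.) between predicted and reference atom positions, and a success rate is then the fraction of targets under the cutoff. The following is a minimal sketch, assuming the two coordinate lists are already matched atom-for-atom (e.g., Cα or C4′ atoms) and already optimally superimposed; the superposition step (e.g., the Kabsch algorithm) is deliberately omitted, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """R.m.s.d. (Å) between matched coordinates that are ALREADY superimposed.

    Assumption: optimal rigid-body superposition has been applied beforehand;
    this sketch does not perform it.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(predicted, reference)
    )
    return math.sqrt(squared / len(predicted))


def success_rate(rmsds: Sequence[float], cutoff: float = 5.0) -> float:
    """Fraction of predictions within `cutoff` Å of their reference structures."""
    return sum(1 for r in rmsds if r <= cutoff) / len(rmsds)


# Toy check: shifting every atom 3 Å along x yields an r.m.s.d. of exactly 3 Å.
ref = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
pred = [(x + 3.0, y, z) for x, y, z in ref]
print(rmsd(pred, ref))                      # 3.0
print(success_rate([3.0, 4.9, 7.2, 12.0]))  # 0.5
```

Note that a success rate of exactly 0.5 would not yet meet a "greater than 50%" target.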

Bottleneck/Challenge: Existing methods for both RNA and protein structure design rely heavily on macromolecules of known structure.

Potential Solution: Machine learning with coevolutionary models on large multiple sequence alignments of homologous RNAs and proteins to extract structure from sequence alone.
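The coevolutionary idea can be illustrated with plain mutual information (MI) between alignment columns: positions in contact tend to mutate together. This is a minimal sketch, not a production method; practical coevolutionary models (e.g., direct coupling analysis) also apply sequence reweighting, pseudocounts, and corrections such as the average product correction. The function names and the toy alignment are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List, Tuple


def column_mi(msa: List[str], i: int, j: int) -> float:
    """Mutual information (nats) between alignment columns i and j."""
    n = len(msa)
    p_i = Counter(seq[i] for seq in msa)
    p_j = Counter(seq[j] for seq in msa)
    p_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), count in p_ij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((p_i[a] / n) * (p_j[b] / n)))
    return mi


def rank_coupled_pairs(msa: List[str], min_separation: int = 3) -> List[Tuple[int, int, float]]:
    """Rank column pairs by MI, skipping pairs that are close in sequence."""
    length = len(msa[0])
    scores = [
        (i, j, column_mi(msa, i, j))
        for i, j in combinations(range(length), 2)
        if j - i >= min_separation
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)


# Toy RNA alignment: columns 0 and 4 co-vary as G-C / C-G base pairs.
msa = ["GAAAC", "CAAAG", "GUAUC", "CUAUG"]
top = rank_coupled_pairs(msa)
print(top[0][:2], round(top[0][2], 4))  # (0, 4) 0.6931
```

Columns 1 and 3 also co-vary perfectly here, but they are excluded by the sequence-separation filter, which mimics how local correlations are usually set aside when predicting tertiary contacts.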

Potential Solution: Better understand structures in the sequence space of non-biological RNA and proteins such as de novo designed structures.

Bottleneck/Challenge: There are no methods capable of predicting RNA-protein complexes at even modest resolution from primary sequence.

Potential Solution: Use of high-throughput technologies for mapping RNA-protein crosslinks, nucleotide-resolution chemical mapping of RNA components, and rapid cryo-EM of RNA-protein complexes to guide and test computational modeling.

Improve force-field and backbone-sampling algorithms and include capabilities to capture force-fields of post-transcriptionally- and post-translationally-modified nucleosides and amino acids.

Bottleneck/Challenge: The modeling of conformational dynamics for protein design needs to be improved (especially of hydrogen-bonding and electrostatic interactions between protein residues) to capture more accurately the interactions responsible for protein structure, stability, and function; similarly, algorithms that sample the protein conformations required for function need to be made faster and more effective.

Potential Solution: Gather large datasets of mutants or designed protein sequences and their experimentally characterized activity and use machine learning/data science techniques to develop improved molecular mechanics force-fields.
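One simple way to use such datasets is to refit the relative weights of existing score terms so that weighted scores best match measured values. The following is a minimal least-squares sketch, assuming a linear model and two hypothetical score terms (a hydrogen-bond term and an electrostatic term); real force-field refinement fits many more terms and often uses regularization or nonlinear models.

```python
from typing import List, Sequence


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small dense system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: score terms are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_term_weights(terms: Sequence[Sequence[float]], measured: Sequence[float]) -> List[float]:
    """Weights w minimising sum_k (sum_t w_t * terms[k][t] - measured[k])^2 via normal equations."""
    k = len(terms[0])
    xtx = [[sum(row[i] * row[j] for row in terms) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(terms, measured)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical per-mutant score terms [hbond, elec] and measured values
# generated as 2.0 * hbond + 0.5 * elec, so the fit should recover [2.0, 0.5].
terms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
measured = [2.0, 0.5, 2.5, 4.5]
weights = fit_term_weights(terms, measured)
print([round(w, 6) for w in weights])  # [2.0, 0.5]
```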

Potential Solution: Investigate alternative protein backbone sampling algorithms by developing protein design software that can take advantage of commodity parallel computing architectures, such as general-purpose GPUs and cloud-based FPGAs.

Bottleneck/Challenge: Force fields for molecular dynamics simulations of RNAs and RNA-protein complexes need to be improved.

Potential Solution: Detailed biophysical characterization of synthetic model systems designed to push the force fields to their limits of predictability.

Potential Solution: Development of design methodologies that leverage angstrom-level RNA and RNA-protein complex simulations.

Bottleneck/Challenge: Even when a global conformational minimum is sampled, there is no guarantee that a force field will correctly identify it as such, because computational protein design algorithms sacrifice scoring accuracy for speed, often by considering only pairwise interactions.

Potential Solution: Systematic errors caused by the assumption that pairwise interactions are sufficient to define protein folds must be identified, and score terms that can quickly approximate those errors should be implemented to improve accuracy.
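The pairwise assumption, and where a fast correction term could plug in, can be sketched as follows. This is a toy under stated assumptions: residue identities stand in for full side-chain geometry, and the hydrogen-bond term and the "polar cluster" correction are hypothetical values chosen only to show a case where summing pairs overcounts (three polar side chains competing for the same partners).

```python
from itertools import combinations
from typing import Callable, Dict, Optional, Sequence


def design_score(
    residues: Sequence[str],
    one_body: Dict[str, float],
    two_body: Callable[[str, str], float],
    many_body_correction: Optional[Callable[[Sequence[str]], float]] = None,
) -> float:
    """Pairwise-decomposable score plus an optional fast many-body correction term."""
    score = sum(one_body[r] for r in residues)
    score += sum(two_body(a, b) for a, b in combinations(residues, 2))
    if many_body_correction is not None:
        score += many_body_correction(residues)
    return score


# Hypothetical toy terms: every polar-polar pair counts as a -1.0 hydrogen bond.
POLAR = {"S", "T", "N"}
ONE_BODY = {"S": 0.0, "T": 0.0, "N": 0.0, "L": 0.0}


def hbond_pair(a: str, b: str) -> float:
    return -1.0 if a in POLAR and b in POLAR else 0.0


def polar_cluster_correction(residues: Sequence[str]) -> float:
    # Hypothetical correction: +0.5 per polar residue beyond the second.
    n_polar = sum(1 for r in residues if r in POLAR)
    return 0.5 * max(0, n_polar - 2)


triad = ["S", "T", "N"]
print(design_score(triad, ONE_BODY, hbond_pair))                            # -3.0
print(design_score(triad, ONE_BODY, hbond_pair, polar_cluster_correction))  # -2.5
```

The correction is evaluated once per design rather than per pair, which is the kind of cheap approximation the solution above calls for; identifying which systematic errors it should target would require comparison with experimental data.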

Reliable de novo prediction (greater than a 50% success rate within 5 Ångstrom r.m.s.d.) of RNAs and proteins containing non-canonical structures (including irregular protein loops and RNA aptamers).

Bottleneck/Challenge: Loops can adopt a large variety of conformations, making them difficult to sample effectively. The problem is particularly important for RNAs and RNA-protein complexes, whose functional tertiary folds are dictated by idiosyncratic structures.

Potential Solution: Use knowledge of RNA and protein sequence-structure relationships from the PDB to limit the conformational search space; explore refinement strategies that couple this knowledge with physics-based score functions and molecular dynamics simulations; and employ new-generation Monte Carlo sampling methods.¹
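For readers unfamiliar with Monte Carlo conformational search, the classic Metropolis criterion is sketched below over a discrete set of conformations. This is not the newer method cited here, which is considerably more elaborate; the torsion-bin landscape, energy function, and function names are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis_search(
    energy: Callable[[int], float],
    neighbours: Callable[[int], List[int]],
    start: int,
    steps: int = 2000,
    kt: float = 1.0,
    seed: int = 0,
) -> Tuple[int, float]:
    """Metropolis Monte Carlo over discrete conformational states.

    Assumes a symmetric proposal (every state has the same number of neighbours
    and neighbour relations are mutual). Returns the lowest-energy state visited.
    """
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    best, e_best = current, e_current
    for _ in range(steps):
        proposal = rng.choice(neighbours(current))
        e_proposal = energy(proposal)
        delta = e_proposal - e_current
        # Metropolis criterion: always accept downhill, accept uphill with exp(-dE/kT).
        if delta <= 0 or rng.random() < math.exp(-delta / kt):
            current, e_current = proposal, e_proposal
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


# Hypothetical toy landscape: 36 torsion bins of 10 degrees arranged on a ring.
N_BINS = 36


def toy_energy(s: int) -> float:
    angle = math.radians(10 * s)
    return 2.0 * math.cos(angle) + math.cos(3 * angle)


def ring_neighbours(s: int) -> List[int]:
    return [(s - 1) % N_BINS, (s + 1) % N_BINS]


best_bin, best_e = metropolis_search(toy_energy, ring_neighbours, start=0)
print(best_bin, round(best_e, 3))
```

The temperature `kt` trades exploration against exploitation: higher values accept more uphill moves and escape local minima more easily, at the cost of slower convergence.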

Potential Solution: Incorporate structural data from failed designs into models.

Routine redesign of ligand binding sites and/or aptamers for custom ligands with a greater than 50% success rate.

Bottleneck/Challenge: Motif-based approaches to ligand binding site design are limited by how much structural information about the target ligand is available.

Potential Solution: Fragmenting ligands allows for the formation of larger motif libraries, enabling the design of completely novel ligand binding sites.

Potential Solution: Extensions of new Monte Carlo methods² for atomically-accurate RNA and protein structure prediction to sample sequence and structure simultaneously.

Potential Solution: Computational medium-resolution pre-design of large 3D RNA and protein shapes that encapsulate all surfaces of target ligands, integrated with high-throughput combinatorial screening focused at macromolecule-ligand interfaces for which current computational approaches are not yet accurate.

Routine prediction of structures for 500-amino-acid proteins and 200-nucleotide RNA domains within 3 Ångstroms.

Design proteins and RNAs that fold correctly 50% of the time and RNA-protein complexes that form correctly 20% of the time.

Modeling and design of chromatin states that can be manipulated to change function.

Routine prediction of structures for 3,000-amino-acid proteins (such as polyketide synthases), protein-protein and RNA-protein interactions, and protein and RNA-protein complexes (e.g., re-engineered ribosomes and spliceosomes).

Routine prediction of RNA and protein function from structure.

Bottleneck/Challenge: While it is possible to predict the function of unknown RNAs and proteins by homology to molecules with known function, ab initio functional prediction will be a major challenge.

Potential Solution: No clear path to a solution can be envisioned, but it will likely involve the ability to computationally model interaction networks between arbitrary sets of biomolecules simply based on their sequences.

Potential Solution: Achieving a better understanding of the full repertoire of possible protein and RNA functions is needed.

De novo design and/or prediction of macromolecular dynamics and dynamic macromolecular structures.

Improving computational models of RNA dynamics that can incorporate experimental data.

Bottleneck/Challenge: There is a lack of rigorous physical models that can incorporate experimental characterization of RNA structure and physicochemical data into models of RNA folding dynamics.

Potential Solution: Expansion of RNA folding dynamics physicochemical modeling toolsets to incorporate experimental data.
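One established way to fold chemical-mapping data into a physicochemical model is to convert per-nucleotide reactivities into pseudo-free-energy terms that are added when a nucleotide is placed in a stacked base pair. The sketch below uses the logarithmic form widely used for SHAPE-directed folding, dG = m·ln(reactivity + 1) + b; the default slope and intercept are one commonly quoted parameterization and should be treated as tunable assumptions, as should the clamping of negative reactivities. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def shape_pseudo_energy(reactivity: float, m: float = 2.6, b: float = -0.8) -> float:
    """Pseudo-free-energy term (kcal/mol) for a nucleotide in a stacked base pair.

    dG = m * ln(reactivity + 1) + b: low reactivity rewards pairing, high
    reactivity penalises it. Assumption: negative reactivities (measurement
    noise) are clamped to zero.
    """
    return m * math.log(max(reactivity, 0.0) + 1.0) + b


def restraint_profile(reactivities: Sequence[Optional[float]]) -> List[float]:
    """Per-nucleotide pseudo-energies; unmeasured positions (None) get no restraint."""
    return [0.0 if r is None else shape_pseudo_energy(r) for r in reactivities]


print([round(e, 3) for e in restraint_profile([None, 0.0, 0.2, 1.5])])
```

Extending this idea from equilibrium structure prediction to folding dynamics, as proposed above, would require time-resolved data and a kinetic model rather than a single static profile.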

Potential Solution: Machine-learning-based models of RNA structure trained on large-scale experimental characterization datasets.

Incorporating co-transcriptional (for RNA) and co-translational (for protein) processes (and including cellular factors that participate in these processes) into design algorithms.

Bottleneck/Challenge: We lack principles of co-transcriptional RNA folding and co-translational protein folding that can be incorporated into design algorithms.

Potential Solution: Use model systems together with a variety of techniques (e.g., high-throughput chemical biology and biophysical methods) to uncover the required principles.

Bottleneck/Challenge: No RNA or protein design algorithms incorporate co-transcriptional or co-translational folding dynamics into the design process.

Potential Solution: Incorporate the design principles learned from the study of these processes into these algorithms.

Potential Solution: Develop appropriately coarse-grained models that can efficiently simulate co-transcriptional and co-translational folding.
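A very coarse-grained way to expose co-transcriptional effects is to re-predict structure for each nascent prefix as the chain elongates. The sketch below uses Nussinov base-pair maximization as a stand-in for a real energy model; because it refolds every prefix independently, it has no kinetics and cannot capture kinetic trapping, which is precisely what a more realistic coarse-grained model would need to add. The function names are hypothetical.

```python
from typing import List

# Canonical Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov algorithm), hairpin loops >= min_loop."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):  # span = j - i
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)
            for k in range(i + 1, j):  # bifurcation into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def cotranscriptional_trajectory(seq: str, min_loop: int = 3) -> List[int]:
    """Refold every nascent prefix independently (no kinetics, no memory of earlier folds)."""
    return [nussinov_max_pairs(seq[:length], min_loop) for length in range(1, len(seq) + 1)]


print(cotranscriptional_trajectory("GGGGAAAACCCC"))
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
```

The trajectory shows the hairpin stem forming only once the downstream C residues have been transcribed.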

Design of intrinsic regulatory control into biomolecules (e.g., allostery).

Bottleneck/Challenge: Long-range interactions in proteins are difficult to capture in computational protocols because of the enormous amount of conformational sampling that would be required, propagation of error, and limitations in score-function accuracy over long ranges.

Potential Solution: A focus on short-range allosteric interactions may be necessary; statistical approaches to understanding long-range allosteric interactions will be useful for future regulatory design.

Bottleneck/Challenge: We lack a sufficient understanding of how ligand-, protein-, and RNA-RNA binding can dynamically alter RNA structure in either equilibrium or out-of-equilibrium RNA folding regimes.

Potential Solution: Development and validation of approaches that can map RNA-ligand and protein- and RNA-RNA interactions at atomic resolution and in high-throughput.

Potential Solution: Development and validation of approaches that can extract RNA folding sub-population information to uncover principles of ligand-, protein-, and RNA-RNA mediated conformational changes.

Bottleneck/Challenge: Few RNA aptamers can sense ligands with the sub-micromolar dissociation constants (K_d) required for many applications.

Potential Solution: Expand into non-natural nucleic acid chemistries to increase the structural and chemical diversity of aptamers.

Design of dynamic and responsive protein-RNA nanomachines.

Bottleneck/Challenge: It is challenging to image the three-dimensional structure of self-assembled protein-RNA nanostructures within the cell.

Potential Solution: Application of high-resolution techniques (super-resolution imaging, cryo-EM) to model synthetic protein-RNA nanostructures.

Potential Solution: Use of high-throughput techniques (in-cell chemical probing) to validate proper cellular assembly.

Bottleneck/Challenge: Primitives for converting molecular binding events (e.g., of ligands or RNAs) into changes in a three-dimensional protein-RNA nanostructure are underdeveloped.

Potential Solution: Incorporate known natural motifs that allow ligand and RNA-mediated switching into protein-RNA nanostructures.

Bottleneck/Challenge: The RNA-protein interactions required for nanomachines with the most sophisticated functions (e.g., dynamic control over protein complexes or cell signaling pathways) are challenging to engineer.

Potential Solution: Harness improved tools for predicting RNA-protein interactions and integrate them into design of dynamic RNA nanomachines.

Routine design of large proteins, beta topologies, membrane proteins, and loops.

Bottleneck/Challenge: Although these challenges have recently begun to be addressed, the computational methods are too nascent to ensure that design succeeds routinely.

Potential Solution: Continued exploration of these computational methods will begin to elucidate the potential for success and existing limitations.

Bottleneck/Challenge: Designing functional proteins requires successful prediction of not just the topology, but also the precise positioning of the elements within that topology.

Potential Solution: Explore new approaches to designing specific conformations within an existing topology that satisfy user-specified parameters, such as angles between secondary structure elements.

Routine design of protein complexes.

Bottleneck/Challenge: Predicting and modelling protein-protein interactions is difficult.

Potential Solution: Continued development of co-evolutionary models, physics models, and design platforms.

Bottleneck/Challenge: The environment has a stronger influence: in contrast to individual proteins, complexes require molecules to assemble in a sea of other molecules.

Potential Solution: Improved molecular dynamics simulations.

Routine design of enzymes with high activities (i.e., k_cat/K_M > 10⁵ M⁻¹ s⁻¹).
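Checking a measured enzyme against this catalytic-efficiency threshold is simple arithmetic, but units are a frequent source of error because K_M is often reported in micromolar. A minimal sketch, with illustrative numbers that do not describe any specific enzyme; the function name is hypothetical.

```python
def catalytic_efficiency(kcat_per_s: float, km_molar: float) -> float:
    """Catalytic efficiency k_cat/K_M in M^-1 s^-1 (k_cat in s^-1, K_M in mol/L)."""
    if km_molar <= 0:
        raise ValueError("K_M must be positive")
    return kcat_per_s / km_molar


TARGET_EFFICIENCY = 1e5  # M^-1 s^-1

# Illustrative values: K_M reported in micromolar must be converted to molar first.
kcat = 10.0           # s^-1
km_micromolar = 50.0  # µM
efficiency = catalytic_efficiency(kcat, km_micromolar * 1e-6)
print(f"{efficiency:.1e} M^-1 s^-1, meets target: {efficiency > TARGET_EFFICIENCY}")
# 2.0e+05 M^-1 s^-1, meets target: True
```

Forgetting the micromolar-to-molar conversion here would understate the efficiency by a factor of 10⁶.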

Bottleneck/Challenge: Most of the powerful protein design platforms do not address molecular dynamics well, and protein dynamics are fundamentally challenging to capture.

Potential Solution: Develop the ability to engineer enzyme specificity at will, which includes understanding what enzymes exist and the principles behind them, and mapping domains and sequences to functions.

Potential Solution: Improve multi-state design algorithms, which aim to design proteins that interconvert between multiple conformations.
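The core difference between single-state and multi-state design is the objective: a sequence is judged across every conformation it must adopt, not just one. The sketch below uses a worst-case (min-max) objective, which is only one of several in use (weighted sums and energy gaps that disfavor unwanted states are common alternatives); the sequences, state names, and energies are hypothetical.

```python
from typing import Dict, Sequence


def worst_case_energy(energies: Dict[str, float], required_states: Sequence[str]) -> float:
    """Highest (least favourable) energy among the states the design must adopt."""
    return max(energies[state] for state in required_states)


def pick_design(candidates: Dict[str, Dict[str, float]], required_states: Sequence[str]) -> str:
    """Choose the sequence whose least favourable required state is the most stable."""
    return min(candidates, key=lambda seq: worst_case_energy(candidates[seq], required_states))


# Hypothetical per-state design energies (arbitrary units; lower is more stable).
candidates = {
    "seqA": {"open": -12.0, "closed": -2.0},
    "seqB": {"open": -8.0, "closed": -7.5},
    "seqC": {"open": -3.0, "closed": -11.0},
}
print(pick_design(candidates, ["open"]))            # seqA: best single-state design
print(pick_design(candidates, ["open", "closed"]))  # seqB: stable in both states
```

The single-state winner is nearly unfolded in the second state, which is why designing for conformational switching requires scoring all states together.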

Bottleneck/Challenge: Successful catalysis often requires considerations other than conformation and residue positioning, such as active site electrostatics.

Potential Solution: Explore the use of more accurate but computationally expensive simulations, such as quantum mechanical calculations, to determine the optimal electrostatic environment for a desired reaction; couple this knowledge with constraints on active site electrostatics during the design process. Alternatively, use knowledge from existing enzymes that catalyze similar reactions to guide these constraints.
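As a far cruder stand-in for the quantum mechanical calculations mentioned above, one can estimate how strongly a set of fixed active-site charges preferentially stabilizes the charge distribution of a transition state over the ground state. A minimal sketch, assuming point charges and a single uniform dielectric constant (a strong simplification that can badly misestimate magnitudes); the geometry, charges, and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# e^2/(4*pi*eps0) expressed in kcal*Angstrom/(mol*e^2); approximately 332.06.
COULOMB_KCAL = 332.0637

Point = Tuple[float, float, float]
Charge = Tuple[float, Point]  # (partial charge in units of e, position in Angstrom)


def coulomb_energy(probe: Charge, environment: Sequence[Charge], dielectric: float = 4.0) -> float:
    """Interaction energy (kcal/mol) between a probe charge and fixed point charges,
    assuming one uniform dielectric constant."""
    q_probe, p_probe = probe
    energy = 0.0
    for q, p in environment:
        energy += COULOMB_KCAL * q_probe * q / (dielectric * math.dist(p_probe, p))
    return energy


# Hypothetical active site: a +1 side chain 4 Angstrom from where the substrate
# develops extra negative charge in the transition state.
active_site = [(1.0, (4.0, 0.0, 0.0))]
ground_state = (-0.2, (0.0, 0.0, 0.0))
transition_state = (-1.0, (0.0, 0.0, 0.0))
preferential_stabilization = (
    coulomb_energy(transition_state, active_site) - coulomb_energy(ground_state, active_site)
)
print(round(preferential_stabilization, 2))  # -16.6
```

A design constraint could then require this difference to fall below some threshold, with the charges and positions supplied by higher-level calculations or by analogy to natural enzymes.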

Modeling and design of dynamic RNA nanomachines that can engage with and manipulate the chromatin states of living systems.

Modeling and design of dynamic DNA-RNA-protein condensates that can expand beyond the functionality of natural condensates.

High-throughput integrated computational, experimental, and evolutionary schemes for refinement of desired biomolecule functions including enzymatic activity and binding.

For related reading, please see Gene editing, Synthesis, and Assembly, which contains information regarding DNA diversification and library synthesis techniques that can be combined with in vivo diversification and assay/selection schemes described here.

Durable and high-mutation-rate in vivo continuous DNA mutagenesis and evolution systems in model organisms.

Durable and high-mutation-rate in vivo continuous DNA mutagenesis and evolution systems in non-model organisms.

Full control over all statistical properties of DNA diversification in vivo.
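Two of the statistical properties one would want to control are the per-base mutation rate and the mutation spectrum (here, the transition/transversion balance). The sketch below models them in silico only, assuming independent point mutations; real in vivo mutagenesis systems also show sequence-context dependence, indels, and positional bias. The function name, parameters, and starting sequence are hypothetical.

```python
import random

# Transitions swap purine<->purine or pyrimidine<->pyrimidine; transversions cross classes.
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def mutagenize(seq: str, rate: float, transition_fraction: float, rng: random.Random) -> str:
    """Introduce independent point mutations into an uppercase DNA string.

    Each base mutates with probability `rate`; a mutation is a transition with
    probability `transition_fraction`, otherwise a uniformly chosen transversion.
    Indels and sequence-context effects are deliberately ignored.
    """
    mutated = []
    for base in seq:
        if rng.random() < rate:
            if rng.random() < transition_fraction:
                base = TRANSITIONS[base]
            else:
                base = rng.choice(TRANSVERSIONS[base])
        mutated.append(base)
    return "".join(mutated)


rng = random.Random(42)
library = [
    mutagenize("ATGGCTAGCAAAGGAGAA", rate=0.05, transition_fraction=0.7, rng=rng)
    for _ in range(5)
]
for variant in library:
    print(variant)
```

Full control in vivo would mean being able to set these parameters (and others, such as targeting to specific regions) independently and durably, which current systems do not yet allow.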

Direct sequencing of proteins and carbohydrates.

Bottleneck/Challenge: Limitations of current instrumentation and technologies.

Potential Solution: High-throughput mass spectrometry that unambiguously identifies protein variants or carbohydrate linkages in a complex mixture.

Bottleneck/Challenge: Limited techniques appropriate for direct sequencing.

Potential Solution: Massively parallel detection and sequencing of proteins and carbohydrates using principles from high-throughput DNA sequencing adapted to other molecules through, for example, labeled primary-sequence specific affinity reagents.

De novo DNA synthesis in vivo with single-cell sequence control.

Ability to select for any function, including those conferred by: 1) small molecules, lipids, or carbohydrates; and 2) proteins or nucleic acids, including biophysical properties or properties not easily tied to growth.

Bottleneck/Challenge: Technology to tie the production of small molecules, lipids, or carbohydrates to a selection is lacking.

Potential Solution: Cell adhesion on a surface; glycoarrays, lectin arrays.

Potential Solution: Creation of biosensors in a cell to link product to cell death or sortable phenotype (e.g., fluorescence).
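Once a biosensor converts product level into fluorescence, selection reduces to gating on that signal. A minimal sketch of a top-fraction gate, assuming the biosensor response increases monotonically with product concentration; the function name, variant labels, and readings are hypothetical, and real sorting must also account for cell-to-cell noise and the sensor's dynamic range.

```python
from typing import Dict, List


def sort_gate(signal: Dict[str, float], top_fraction: float) -> List[str]:
    """Keep the variants whose biosensor signal falls in the brightest `top_fraction`."""
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    ranked = sorted(signal, key=signal.get, reverse=True)
    n_keep = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_keep]


# Hypothetical fluorescence readings (arbitrary units) for four library variants.
readings = {"v1": 120.0, "v2": 5.0, "v3": 980.0, "v4": 40.0}
print(sort_gate(readings, 0.5))  # ['v3', 'v1']
```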

Potential Solution: Improve small molecule detection and collection via capillary electrophoresis.

Bottleneck/Challenge: Technology to select for any macromolecular function or property (e.g., fold or shape) is lacking.

Potential Solution: Synthetic use of natural channels, transporters, and quality-control systems that naturally discriminate among these properties in living systems.

Footnotes

¹ Watkins, A. M., Geniesse, C., Kladwang, W., Zakrevsky, P., Jaeger, L., & Das, R. (2018). Blind prediction of noncanonical RNA structure at atomic accuracy. Science Advances, 4(5), eaar5316.
² Watkins, A. M., Geniesse, C., Kladwang, W., Zakrevsky, P., Jaeger, L., & Das, R. (2018). Blind prediction of noncanonical RNA structure at atomic accuracy. Science Advances, 4(5), eaar5316.

Last updated: June 19, 2019