Current State-of-the-Art
Currently, mapping the structure and function of a macromolecule from its primary sequence is the critical challenge in achieving on-demand design, generation, and evolution. Computational design of DNA, RNA, and proteins has advanced to the point where defined structures, binding interactions, and enzymatic activity can be constructed, especially for proteins. Still, substantial improvements are needed in: 1) the range and effectiveness of macromolecular functions that can be designed, and 2) the success rate of designs. De novo computational protein design, PDB-informed protein design strategies, origami-based nucleic acid structure design, physics-based design of RNA switches, machine-learning strategies that deduce molecular contacts for protein and nucleic acid folding from multiple sequence alignments, and hybrid approaches have enjoyed considerable success and hold significant promise.
Evolutionary or semi-rational approaches have advanced to the point where substantial improvements can be gained via a wealth of directed evolution, continuous evolution, and library-based approaches, often coupled with computation and modeling, but only when suitable macromolecules (i.e., those that possess some function along the axis of the desired function) have been previously identified. However, critical challenges remain: creating effective and scalable diversification systems, building effective selection and screening systems, reaching de novo evolution of function, and expanding the scope and throughput of selection/screening systems so that they can directly select or screen for the exact function desired.
Breakthrough Capabilities
De novo prediction of RNA structure, protein structure, and complexes of DNAs/RNAs and proteins (from primary sequence) and the ability to make accurate predictions of mutability and effect of mutations from structure.
Reliably predict (greater than a 50% success rate) the structure of 300-amino acid proteins and 200-nucleotide RNA domains within 5 Ångstroms from primary sequence.
Improve force-field and backbone-sampling algorithms, and add force-field support for post-transcriptionally modified nucleosides and post-translationally modified amino acids.
Reliable de novo prediction (greater than a 50% success rate within 5 Ångstrom r.m.s.d.) of RNAs and proteins containing non-canonical structures (including irregular protein loops and RNA aptamers).
Routine redesign of ligand binding sites and/or aptamers for custom ligands with a greater than 50% success rate.
Routine prediction of structures for 500-amino acid proteins and 200-nucleotide RNA domains within 3 Ångstroms.
Design proteins and RNAs that fold correctly 50% of the time and RNA-protein complexes that form correctly 20% of the time.
Modeling and design of chromatin states that can be manipulated to change function.
Routine prediction of structures for 3,000-amino acid proteins (such as polyketide synthases, PKSs), protein-protein and RNA-protein interactions, and protein and RNA-protein complexes (e.g., re-engineered ribosomes and spliceosomes).
Routine prediction of RNA and protein function from structure.
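Several of the milestones above are stated as a success rate within a distance cutoff (for example, greater than 50% of predictions within 5 Ångstroms r.m.s.d.). As a minimal sketch of how such a criterion could be scored, the Python below computes the r.m.s.d. between a predicted and a reference coordinate set and then the fraction of predictions under a cutoff. It assumes the two structures are already superimposed and have matched atoms (a real evaluation would first superimpose them, e.g., with the Kabsch algorithm), and it treats "within" as less than or equal to the cutoff. The function names are illustrative and do not come from the roadmap.

```python
import math


def rmsd(predicted, reference):
    """RMSD (same units as the input, e.g. Å) between two equal-length lists
    of (x, y, z) coordinates. Assumes the structures are ALREADY superimposed
    and that atoms are listed in matching order."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared_sum = sum(
        (px - rx) ** 2 + (py - ry) ** 2 + (pz - rz) ** 2
        for (px, py, pz), (rx, ry, rz) in zip(predicted, reference)
    )
    return math.sqrt(squared_sum / len(predicted))


def success_rate(rmsd_values, cutoff):
    """Fraction of predictions whose RMSD is at or below the cutoff."""
    if not rmsd_values:
        raise ValueError("need at least one prediction")
    return sum(1 for value in rmsd_values if value <= cutoff) / len(rmsd_values)


def meets_milestone(rmsd_values, cutoff=5.0, min_rate=0.5):
    """True if strictly more than min_rate of predictions fall within cutoff.
    Defaults mirror the 'greater than 50% within 5 Å' wording (an assumption
    about how the milestone would be scored)."""
    return success_rate(rmsd_values, cutoff) > min_rate


if __name__ == "__main__":
    # Toy two-atom example: every atom is displaced by 3 Å along z.
    predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)]
    print(rmsd(predicted, reference))

    # Hypothetical benchmark of four predictions (RMSD values in Å).
    benchmark = [2.0, 4.9, 3.0, 8.0]
    print(success_rate(benchmark, 5.0), meets_milestone(benchmark))
```

Under this reading, a benchmark in which exactly half of the predictions fall within the cutoff would not satisfy a "greater than 50%" milestone, which is why the comparison is strict.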
De novo design and/or prediction of macromolecular dynamics and dynamic macromolecular structures.
Improving computational models of RNA dynamics that can incorporate experimental data.
Incorporating co-transcriptional (for RNA) and co-translational (for protein) processes (and including cellular factors that participate in these processes) into design algorithms.
Design of intrinsic regulatory control into biomolecules (e.g., allostery).
Design of dynamic and responsive protein-RNA nanomachines.
Routine design of large proteins, beta topologies, membrane proteins, and loops.
Routine design of protein complexes.
Routine design of enzymes with high activities (i.e., kcat/KM > 10⁵ M⁻¹ s⁻¹).
Modeling and design of dynamic RNA nanomachines that can engage with and manipulate the chromatin states of living systems.
Modeling and design of dynamic DNA-RNA-protein condensates that can expand beyond the functionality of natural condensates.
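The enzyme milestone in the list above is expressed as a catalytic efficiency, kcat/KM, with a threshold read here as 10^5 per molar per second. As a rough illustration of what that number means, the sketch below computes kcat/KM from a turnover number and a Michaelis constant, checks it against the threshold, and evaluates the standard Michaelis-Menten rate law. All function names and the example kinetic values are hypothetical and chosen only for illustration.

```python
def catalytic_efficiency(kcat_per_s, km_molar):
    """kcat/KM in M^-1 s^-1, given kcat in s^-1 and KM in mol/L."""
    if kcat_per_s < 0 or km_molar <= 0:
        raise ValueError("kcat must be non-negative and KM must be positive")
    return kcat_per_s / km_molar


def is_high_activity(kcat_per_s, km_molar, threshold=1e5):
    """True if kcat/KM strictly exceeds the threshold (default 1e5 M^-1 s^-1,
    the value assumed for the milestone)."""
    return catalytic_efficiency(kcat_per_s, km_molar) > threshold


def michaelis_menten_rate(kcat_per_s, km_molar, enzyme_molar, substrate_molar):
    """Initial rate v = kcat * [E] * [S] / (KM + [S]), in M/s.
    When [S] is much smaller than KM this approaches (kcat/KM) * [E] * [S],
    which is why kcat/KM is used as a measure of efficiency."""
    return kcat_per_s * enzyme_molar * substrate_molar / (km_molar + substrate_molar)


if __name__ == "__main__":
    # Hypothetical designed enzyme: kcat = 10 s^-1, KM = 50 micromolar.
    kcat, km = 10.0, 50e-6
    print(catalytic_efficiency(kcat, km), is_high_activity(kcat, km))

    # Rate at 1 nM enzyme and substrate equal to KM (half-maximal velocity).
    print(michaelis_menten_rate(kcat, km, 1e-9, km))
```

Note that an enzyme can reach the threshold either through fast turnover (high kcat) or tight substrate binding (low KM), so the single ratio does not distinguish between the two.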
High-throughput integrated computational, experimental, and evolutionary schemes for refinement of desired biomolecule functions including enzymatic activity and binding.
For related reading, please see Gene editing, Synthesis, and Assembly, which contains information regarding DNA diversification and library synthesis techniques that can be combined with in vivo diversification and assay/selection schemes described here.
Durable and high-mutation-rate in vivo continuous DNA mutagenesis and evolution systems in model organisms.
Durable and high-mutation-rate in vivo continuous DNA mutagenesis and evolution systems in non-model organisms.
Full control over all statistical properties of DNA diversification in vivo.
Direct sequencing of proteins and carbohydrates.
De novo DNA synthesis in vivo with single-cell sequence control.
Ability to select for any function, including those conferred by: 1) small molecules, lipids, or carbohydrates; and 2) proteins or nucleic acids, including biophysical properties or properties not easily tied to growth.
Footnotes
- Watkins, A. M., Geniesse, C., Kladwang, W., Zakrevsky, P., Jaeger, L., & Das, R. (2018). Blind prediction of noncanonical RNA structure at atomic accuracy. Science Advances, 4(5), eaar5316.