Engineering Biology
Data Science Goal:

Establish functional prediction through biological engineering design at the biomolecular, cellular, and consortium scale.

Current State-of-the-Art

ROSETTA, MOE, and NAMD are representative software platforms for biomolecular structure-based design and for simulating systems that range from small molecules and peptides to proteins and larger assemblies. Google DeepMind’s recent success at CASP13 [1] demonstrated that machine-learning approaches are also increasingly effective for biomolecular structure prediction, and design and simulation are expected to increasingly integrate physics- and structure-based modeling with statistical comparative- and screening-based data. Existing software tools are largely sufficient to design protein libraries for experimentally exploring molecular space, to predict protein domains and other structural boundaries, and to leverage comparative (meta)genomics to build deep sets of sequence orthologs for important protein classes and suggest tolerable or efficacious mutation locations. Current limitations of this software include dependence on imperfect force fields; a lack of full quantitative and allosteric modeling and of parallel computation; and insufficient design-of-experiments support and structural coverage for statistical analyses. Although high-throughput screening combined with machine learning seems likely to provide a data-driven route to identifying function from sequence without resorting to first-principles or ground-up approaches, measuring molecular activity at scale remains a key bottleneck.
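One simple way to picture how a set of aligned orthologs can suggest tolerable mutation locations is to score each alignment column by its Shannon entropy: positions that vary across orthologs are more likely to tolerate substitution than invariant ones. The sketch below is illustrative only. The toy alignment and the function name `column_entropy` are assumptions made for this example, not part of any tool named above, and real pipelines would also handle gaps, sequence weighting, and phylogenetic bias.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical, gap-free toy alignment of four orthologous sequences.
alignment = [
    "MKTAY",
    "MKSAY",
    "MRTAY",
    "MKQAY",
]

# zip(*alignment) iterates over columns rather than rows.
entropies = [column_entropy(col) for col in zip(*alignment)]

# Rank positions from most to least variable as candidate mutation sites.
ranked_positions = sorted(range(len(entropies)),
                          key=lambda i: entropies[i], reverse=True)

for pos in ranked_positions:
    print(f"position {pos}: entropy = {entropies[pos]:.3f} bits")
```

High-entropy columns are only a first-pass heuristic here; whether a variable position is also efficacious to mutate still has to be established experimentally.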

The design of organisms with a targeted metabolic function (e.g., overexpression of a single biomolecular species) requires computational tools that: 1) identify sets of proteins that can convert readily available molecules to high-value products, with each protein performing one of a series of chemical modifications; and 2) identify the best sets of enzymes, and their stoichiometry, that can work together as parts of pathways in the context of cellular metabolism. On the pathway level, genome-scale metabolic models link genotype to phenotype by reconstructing the complete metabolic reaction network of an organism. These models can be used to define theoretical production limits and to design and test new microbial strains in silico, and they have been especially effective for predicting and improving metabolite production rates in heterologous biosynthetic pathways. Flux Balance Analysis (FBA), Flux Variability Analysis (FVA), and minimization of metabolic adjustment (MOMA) have been used successfully, in combination with genome-scale metabolic models, to predict cell growth, flux distribution, and product synthesis, and to guide host design. A MATLAB toolbox called COBRA [2] (“COnstraint-Based Reconstruction and Analysis”) provides a convenient framework to simulate and analyze the phenotypic behavior of a genome-scale stoichiometric model [3], and retrobiosynthesis tools such as BNICE (“Biochemical Network Integrated Computational Explorer”) and RetroPath are used to design new or improved biochemical pathways [4]. These design tools identify novel metabolites, reactions, and whole pathways by predicting enzyme promiscuity based on a classification of enzymes according to their chemical action. On the cellular level, a wide variety of host design tools have been developed to identify gene targets for knockout, overexpression, or downregulation; to introduce non-native enzymatic reactions; and to eliminate competing pathways in order to improve cellular phenotypes [5]. The pathway and host improvements achieved with these design tools are often non-intuitive and non-obvious. Although genome-scale metabolic models have been important for metabolic engineering efforts with organic compounds, advances are still required to transform the bioeconomy.
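Flux Balance Analysis reduces to a linear program: maximize an objective flux subject to the steady-state constraint S·v = 0 and lower and upper bounds on each flux v. The following is a minimal sketch of that formulation on a four-reaction toy network, using SciPy's `linprog` rather than the COBRA Toolbox. The network, the reaction names, the bounds, and the helper `fba` are all hypothetical and chosen only for illustration; a genome-scale model would have thousands of reactions and curated bounds.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not taken from any published model):
#   R_uptake:   -> A    substrate uptake, capped at 10 flux units
#   R_convert: A -> B   pathway step
#   R_product: B ->     product export (the objective)
#   R_waste:   A ->     competing byproduct export
REACTIONS = ["R_uptake", "R_convert", "R_product", "R_waste"]

# Stoichiometric matrix S: rows are metabolites, columns are reactions.
S = np.array([
    [1.0, -1.0,  0.0, -1.0],  # metabolite A
    [0.0,  1.0, -1.0,  0.0],  # metabolite B
])

# (lower, upper) flux bounds per reaction; all reactions are irreversible here.
BOUNDS = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]

def fba(objective, bounds):
    """Maximize the flux through `objective` subject to S v = 0 and the bounds."""
    c = np.zeros(len(REACTIONS))
    c[REACTIONS.index(objective)] = -1.0  # linprog minimizes, so negate
    result = linprog(c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=bounds, method="highs")
    if result.status != 0:
        raise RuntimeError(f"LP did not solve: {result.message}")
    return dict(zip(REACTIONS, result.x))

# Wild-type prediction: all uptake can be routed to product.
wild_type = fba("R_product", BOUNDS)

# In silico knockout: force the flux through R_convert to zero.
ko_bounds = list(BOUNDS)
ko_bounds[REACTIONS.index("R_convert")] = (0, 0)
knockout = fba("R_product", ko_bounds)

print("wild-type product flux:", wild_type["R_product"])
print("knockout product flux: ", knockout["R_product"])
```

In this toy case the wild-type optimum is limited only by the uptake bound, and removing the conversion step drops the achievable product flux to zero. Flux Variability Analysis can be sketched in the same framework by fixing the objective at its optimum and then minimizing and maximizing each remaining flux in turn.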

For community and consortia design, the field is primarily in a state of data gathering and of developing a baseline understanding of microbial communities across diverse locations and ecosystems; tools for multi-scale modeling at the multicellular, organismal, and population levels have therefore yet to be developed.

Breakthrough Capabilities & Milestones

Fully-automated molecular design from integrated, large-scale design data frameworks.

Use of enzyme promiscuity prediction algorithms to design biosynthetic pathways for any molecule (natural or non-natural).

Scalable, data-driven host design for complex environments that enables high-level production of natural biomolecules.

Enabled design of functional, self-supporting ecosystems.

Footnotes

  1. AlQuraishi, M. (2019). AlphaFold at CASP13. Bioinformatics.
  2. Heirendt, L., Arreckx, S., Pfau, T., Mendoza, S. N., Richelle, A., Heinken, A., … Fleming, R. M. T. (2019). Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nature Protocols, 14(3), 639–702.
  3. Schellenberger, J., Lewis, N. E., & Palsson, B. Ø. (2011). Elimination of thermodynamically infeasible loops in steady-state metabolic models. Biophysical Journal, 100(3), 544–553.
  4. Medema, M. H., van Raaphorst, R., Takano, E., & Breitling, R. (2012). Computational tools for the synthetic design of biochemical pathways. Nature Reviews. Microbiology, 10(3), 191–202.
  5. Long, M. R., Ong, W. K., & Reed, J. L. (2015). Computational methods in metabolic engineering for strain design. Current Opinion in Biotechnology, 34, 135–141.
Last updated: June 19, 2019