Engineering Biology

Engineering DNA

Gene Editing, Synthesis, and Assembly focuses on the development and advancement of tools to enable the production of chromosomal DNA and the engineering of entire genomes. Advancements are needed in the design and construction of functional genetic systems through the synthesis of long oligonucleotides, the assembly of multiple fragments, and precision editing with high specificity.

Introduction and Impact

Fundamentally, an organism’s sensing, metabolic, and decision-making capabilities are all encoded within its genome, a very long double-stranded DNA molecule. By changing an organism’s genome sequence, we can rationally alter these cellular functions, and thereby engineer organisms to address a myriad of societal challenges. The ability to rationally alter DNA sequences, combining gene editing, DNA synthesis, and DNA assembly, is therefore considered a cornerstone capability of engineering biology, enabling us to construct engineered genetic systems that reprogram organisms with targeted functions. Advances in gene editing, synthesis, and assembly have transformative impacts on every sector that engineering biology touches, because they broaden the complexity and breadth of functionality that can be introduced into an engineered organism.

The market for synthesized DNA is both mature and ripe for disruption. Using existing technologies, several service providers currently synthesize single-stranded DNA molecules (oligonucleotides) and double-stranded DNA molecules (DNA fragments). They actively compete across several criteria, including cost-per-DNA base pair, sequence fidelity, turnaround time, confidentiality of intellectual property, and customer service. However, several early-stage technologies have the potential to dramatically alter the commercial landscape by enabling the manufacture of much longer DNA fragments at significantly reduced costs.

Gene Editing, Synthesis, and Assembly highlights several technological routes to achieving the overall goal of manufacturing megabase-length DNA molecules and designing genes and genomes with desired functionalities. We also illustrate how new technological developments in one process (e.g., oligonucleotide synthesis, or coupled synthesis and sequencing) can directly lead to improvements in downstream processes (e.g., DNA fragment synthesis).

Gene Editing, Synthesis, and Assembly Goals

Transformative Tools and Technologies

Oligonucleotide synthesis technologies

Currently, phosphoramidite-based chemistry is the predominant approach for synthesizing oligonucleotides. Even after significant optimization, per-cycle synthesis yields are about 99.5%, so the synthesis of a 200-nucleotide oligonucleotide has a full-length yield of only about 35%.¹ New technologies seek to improve this process by: 1) synthesizing thousands of oligonucleotides in parallel, using either on-chip supports or tiny microtiter wells²; or 2) improving synthesis processivity by replacing the phosphoramidite-based chemistry, for example, using enzyme catalysis (e.g., terminal deoxynucleotidyl transferases) to extend primers with defined nucleotides.³ Achieving picomole production of 1000-mer oligonucleotides with error-free sequences would significantly improve the overall DNA assembly protocol.
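The relationship between per-cycle yield and full-length yield can be sketched in a few lines of Python. This is a minimal illustration, assuming that every coupling cycle succeeds independently with the same probability and that an n-nucleotide oligonucleotide needs n cycles; the function name `full_length_yield` is hypothetical and not taken from any cited work.

```python
def full_length_yield(per_cycle_yield: float, length_nt: int) -> float:
    """Fraction of strands that reach full length.

    Assumption: each of the `length_nt` coupling cycles succeeds
    independently with probability `per_cycle_yield`.
    """
    if not 0.0 <= per_cycle_yield <= 1.0:
        raise ValueError("per_cycle_yield must be between 0 and 1")
    if length_nt < 0:
        raise ValueError("length_nt must be non-negative")
    return per_cycle_yield ** length_nt


# Compare a typical oligonucleotide, the 200-mer discussed in the text,
# and the 1000-mer target, all at a 99.5% per-cycle yield.
for length in (60, 200, 1000):
    print(f"{length:>5} nt: {full_length_yield(0.995, length):.1%}")
```

Under these assumptions the 200-nucleotide case comes out at roughly 37%, close to the figure of about 35% quoted above, and the 1000-nucleotide case falls below 1%, which is why a 1000-mer is out of reach without a higher per-cycle yield or a different chemistry.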

Technologies for oligonucleotide assembly into non-clonal DNA fragments

Currently, multiple 60- to 200-mer oligonucleotides are assembled into non-clonal DNA fragments using a combination of annealing, ligation, and/or polymerase chain reaction. The cost of synthesizing non-clonal DNA fragments is $0.10 to $0.30 per base pair, depending on size and complexity. DNA fragments between 300 and 1800 base pairs can be synthesized by multiple providers and DNA fragments up to 5800 base pairs can be synthesized by select providers at increased cost. Errors are introduced whenever two oligonucleotides form undesired base pairings, when two oligonucleotides are incorrectly ligated together, or when DNA polymerases extend a synthesized DNA fragment with an incorrect nucleotide. Certain sequence determinants will increase the error rate, resulting in a mixture of undesired fragments. Computational sequence design can reduce the frequency of these errors. Mismatch repair enzymes may be added (with added cost) to eliminate DNA fragments with mis-paired nucleotides, for example, as a result of mis-annealing or DNA polymerase errors. This process has been scaled up to assemble thousands of non-clonal DNA fragments per day. The purification of full-length, error-free DNA fragments remains a challenge. Utilizing longer oligonucleotides (see Oligonucleotide synthesis technologies above) would enable the synthesis of longer non-clonal DNA fragments with the same error rate. New technologies utilizing nanopore sequencing have the potential to couple sequencing and purification at single-molecule resolution.
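As a rough illustration of the pricing quoted above, the following sketch bounds the cost of a non-clonal fragment using the $0.10 and $0.30 per-base-pair endpoints. It assumes a flat per-base-pair price at each endpoint, which ignores the dependence on size and complexity and any surcharge for longer fragments; the names are hypothetical.

```python
LOW_USD_PER_BP = 0.10   # lower price quoted in the text
HIGH_USD_PER_BP = 0.30  # upper price quoted in the text


def fragment_cost_range(length_bp: int) -> tuple[float, float]:
    """Return (low, high) cost in USD for one non-clonal fragment.

    Assumption: price scales linearly with length at each endpoint.
    """
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    return (length_bp * LOW_USD_PER_BP, length_bp * HIGH_USD_PER_BP)


# Shortest and longest fragments offered by multiple providers.
for length in (300, 1800):
    low, high = fragment_cost_range(length)
    print(f"{length:>5} bp: ${low:,.2f} to ${high:,.2f}")
```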

Multi-fragment DNA assembly techniques for clonal genetic systems and genomes

Currently, multiple DNA fragments (300 to 3000 base pairs long) are assembled into large genetic systems (10,000 to 1,000,000 base pairs long) using single-pot DNA assembly techniques that combine cocktails of bio-prospected and/or engineered enzymes, including exonucleases, endonucleases, DNA polymerases, ligases, and/or recombinases.⁴ Enzyme costs are currently about $25 per assembly. Assembled DNA is then introduced into cells for clonal separation and replication. Most assembly techniques have essential sequence determinants, for example, regions of overlapping homology or flanking Type IIS restriction sites.⁵ Errors are introduced when two fragments anneal together at incorrect overlap regions, when two fragments are mis-ligated at incorrect ligation junctions, or when DNA polymerases incorporate incorrect nucleotides during DNA synthesis. Computational sequence design can limit the frequency of errors. A major challenge for DNA assembly is the trial-and-error identification of a full-length, error-free genetic system. For example, an optimized assembly technique with a per-junction efficiency of 90% will assemble a 10-part (3000 base pairs per part) system with about 35% yield. At the same per-junction efficiency, assembling a 1,000,000 base pair genome from 3000 base pair DNA fragments will have a minuscule yield of about 5.2×10^-14%. This limitation of DNA assembly has motivated the synthesis of longer non-clonal DNA fragments (see Technologies for oligonucleotide assembly into non-clonal DNA fragments above). For example, 1,000,000 base pair genomes could be assembled from 10,000 base pair, 30,000 base pair, or 50,000 base pair DNA fragments with a yield of roughly 0.002%, 2.7%, or 11%, respectively. If longer non-clonal DNA fragments are unavailable, then hierarchical approaches to DNA assembly are required, which increases the number of DNA assembly reactions and verification costs.
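The yield arithmetic in this paragraph can be reproduced with a short Python sketch. It assumes one junction per fragment, each succeeding independently with the same efficiency, and rounds the fragment count up to a whole number; the text does not state how junctions were counted, so that choice and the function name `assembly_yield` are assumptions made for illustration.

```python
import math


def assembly_yield(total_bp: int, fragment_bp: int,
                   junction_efficiency: float = 0.90) -> float:
    """Expected fraction of correct full-length assemblies.

    Assumption (not stated in the text): one junction per fragment, each
    succeeding independently with probability `junction_efficiency`.
    """
    if total_bp <= 0 or fragment_bp <= 0:
        raise ValueError("lengths must be positive")
    n_fragments = math.ceil(total_bp / fragment_bp)
    return junction_efficiency ** n_fragments


# A 10-part system of 3000 bp parts, then a 1,000,000 bp genome
# built from progressively longer fragments.
print(f"30 kbp from 3 kbp parts: {assembly_yield(30_000, 3_000):.1%}")
for fragment_bp in (3_000, 10_000, 30_000, 50_000):
    y = assembly_yield(1_000_000, fragment_bp)
    print(f"1 Mbp from {fragment_bp:>6} bp parts: {y * 100:.2g}%")
```

With these assumptions the 10-part case gives about 35% and the 3000 base pair case gives about 5.2×10^-14%, in line with the text. The 10,000, 30,000, and 50,000 base pair cases come out near 0.003%, 2.8%, and 12%, close to but not identical with the rounded figures quoted above, which may reflect a different junction count.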

Sequencing costs become significant once assembled genetic systems are large and/or assembly yields are exceedingly small. For example, after assembling a 30,000 base pair genetic system with a 35% yield, it is necessary to sequence at least seven clonal isolates to achieve at least a 95% chance of identifying a fully correct one. At low throughput, this cost is about $1000 (using Sanger sequencing). Using next-generation sequencing, this cost can be greatly reduced to about $0.70, but only when a large amount of DNA (2 billion base pairs) is sequenced at the same time.⁶ Similarly, if a 1,000,000 base pair genome is assembled from 30,000 base pair fragments with a 2.7% yield, then it would be necessary to sequence 100 clonal isolates to achieve a 93% chance of identifying a fully correct one (about $275 in sequencing costs). Finally, hierarchical DNA assembly can be performed by first assembling and purifying smaller genetic systems (e.g., up to 30,000 base pairs) and then using them to perform a multi-fragment assembly to build larger genetic systems (e.g., up to 35 five times larger than the smaller systems).⁷ Hierarchical DNA assembly increases sequencing costs by a multiplier roughly equal to the number of hierarchical cycles. Overall, DNA assembly costs are greatly reduced by utilizing longer non-clonal DNA fragments and by parallelizing operations such that at least 2 billion base pairs of DNA are verified across multiple DNA assembly reactions.
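The clone-screening numbers above follow from a simple probability argument, sketched here in Python. It assumes each clonal isolate is independently correct with probability equal to the assembly yield, so the chance of finding at least one correct clone among n is 1 - (1 - yield)^n; the function names are hypothetical.

```python
import math


def chance_of_correct_clone(yield_fraction: float, n_clones: int) -> float:
    """Probability that at least one of `n_clones` isolates is fully correct.

    Assumption: each isolate is correct independently with probability
    `yield_fraction`.
    """
    return 1.0 - (1.0 - yield_fraction) ** n_clones


def clones_needed(yield_fraction: float, confidence: float) -> int:
    """Smallest number of isolates giving at least `confidence` chance."""
    if not 0.0 < yield_fraction < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - yield_fraction))


# 35% yield with a 95% target, then 100 isolates at a 2.7% yield.
print(clones_needed(0.35, 0.95))
print(f"{chance_of_correct_clone(0.027, 100):.1%}")
```

Under this model a 35% yield requires seven isolates for a 95% chance, and 100 isolates at a 2.7% yield give a chance of about 93%, consistent with the figures in the text.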

Footnotes & Citations

1. Hughes, R. A., & Ellington, A. D. (2017). Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harbor Perspectives in Biology, 9(1).
2. Kosuri, S., & Church, G. M. (2014). Large-scale de novo DNA synthesis: technologies and applications. Nature Methods, 11(5), 499–507.
3. Palluk, S., Arlow, D. H., de Rond, T., Barthel, S., Kang, J. S., Bector, R., … Keasling, J. D. (2018). De novo DNA synthesis using polymerase-nucleotide conjugates. Nature Biotechnology, 36(7), 645–650.
4. Gibson, D. G., Young, L., Chuang, R.-Y., Venter, J. C., Hutchison, C. A., & Smith, H. O. (2009). Enzymatic assembly of DNA molecules up to several hundred kilobases. Nature Methods, 6(5), 343–345; Hughes, R. A., & Ellington, A. D. (2017). Synthetic DNA synthesis and assembly: putting the synthetic in synthetic biology. Cold Spring Harbor Perspectives in Biology, 9(1).
5. Engler, C., Kandzia, R., & Marillonnet, S. (2008). A one pot, one step, precision cloning method with high throughput capability. PLoS ONE, 3(11), e3647.
6. Goodwin, S., McPherson, J. D., & McCombie, W. R. (2016). Coming of age: ten years of next-generation sequencing technologies. Nature Reviews Genetics, 17(6), 333–351.
7. Richardson, S. M., Mitchell, L. A., Stracquadanio, G., Yang, K., Dymond, J. S., DiCarlo, J. E., … Bader, J. S. (2017). Design of a synthetic yeast genome. Science, 355(6329), 1040–1044.