The sequencing read

Joe Pickrell, Data Scientist at Gencove - May 20, 2021

The coming commoditization of genome sequencing

tl;dr: new competition in genome sequencing hardware will drive expanded use cases for sequencing analytics across industries

In Nature’s Metropolis, the brilliant economic history of Chicago, William Cronon describes the shifts in power between agricultural producers and different industrial groups as technologies like the grain elevator upended traditional market relationships.

In this telling, a set of small steps gradually severed the connection between the buyer of a grain like corn and its original source. In 1850, someone might buy a bag of corn from an individual producer and know that every kernel came from a specific plot of land; by 1880, that same buyer would more likely be buying a quantity of ‘Number 2’ corn, pooled from any number of anonymous locations.

This commoditization had a number of unexpected downstream consequences, leading to new industries and the development of the first standardized futures contracts.

USDA corn quality grades

This history came to mind recently, even though I work in a seemingly unrelated industry: genome sequencing.

Consider the image below, which shows example short-read sequences generated by three different types of sequencing hardware. Connoisseurs of the FASTQ format will notice some subtle differences between the outputs from these machines, but they will also recognize their fundamental similarity. In some key ways, a sequencing read is a sequencing read is a sequencing read.

Example sequences generated by three different types of sequencing hardware
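To make the "a read is a read" point concrete, here is a minimal Python sketch of a FASTQ reader. It assumes the common unwrapped four-line layout (a header starting with `@`, the sequence, a separator starting with `+`, and a quality string) and Phred+33 quality encoding. The `Read` class and `parse_fastq` function are illustrative names, not an existing library's API. The records in the example are made up and do not reproduce any real instrument's header format. Any platform whose output follows this layout goes through the same few lines of code.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Read:
    name: str              # header line without the leading "@"
    sequence: str          # base calls, e.g. "ACGT..."
    qualities: List[int]   # per-base Phred scores decoded from the quality line


def parse_fastq(text: str, offset: int = 33) -> Iterator[Read]:
    """Yield reads from unwrapped four-line FASTQ records.

    Assumes Phred+33 quality encoding (offset=33) and exactly one line each
    for the header, sequence, separator and quality string.
    """
    lines = text.splitlines()
    while lines and not lines[-1].strip():  # ignore trailing blank lines
        lines.pop()
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain complete four-line records")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes a Phred score as (ASCII code - offset).
        yield Read(header[1:], seq, [ord(c) - offset for c in qual])


# Made-up records standing in for output from two different instruments.
example = """@hypothetical_instrument_A:read1
ACGTACGT
+
IIIIIIII
@hypothetical_instrument_B:read1
TTGACCAA
+
5555####
"""

for read in parse_fastq(example):
    print(read.name, read.sequence, read.qualities)
```

Real-world files add complications this sketch ignores, such as wrapped sequence lines in some older files and gzip compression. Established parsers handle those cases.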

By analogy, we are not far from a world where sequences like these are a commodity: the exact hardware used to generate sequence data becomes irrelevant, or data can be mixed and matched within a ‘quality grade’. My colleague Yaniv Erlich has noted that “the aim of many experimental techniques is to reduce the problems of nature to the determination of DNA sequences”. The commoditization of genome sequences accelerates this trend. Reducing a problem to the determination of DNA sequences will become a successful path to solving many industrial or medical problems.
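One way to picture a ‘quality grade’ for reads is to borrow the logic of corn grades and bin reads by their average base quality. The Python below is purely illustrative. The grade names and thresholds are hypothetical, not an existing standard, and the quality strings are assumed to use Phred+33 encoding. A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so Q30 is roughly one error in 1,000 bases. The pooling step deliberately discards the source instrument, much as ‘Number 2’ corn was pooled from anonymous farms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical grades keyed on mean Phred score, checked from strictest down.
# These thresholds are illustrative only, not an industry standard.
GRADES: List[Tuple[float, str]] = [(30.0, "Grade 1"), (20.0, "Grade 2")]
LOWEST_GRADE = "Grade 3"


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (Phred+33 encoding assumed)."""
    if not quality:
        raise ValueError("quality string must be non-empty")
    return sum(ord(c) - offset for c in quality) / len(quality)


def grade_read(quality: str) -> str:
    """Assign a read to the first grade whose threshold its mean quality meets."""
    score = mean_phred(quality)
    for threshold, grade in GRADES:
        if score >= threshold:
            return grade
    return LOWEST_GRADE


def pool_by_grade(reads: List[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Pool (instrument, sequence, quality) tuples into grades.

    The instrument label is deliberately dropped: within a grade, reads from
    different hardware are treated as interchangeable.
    """
    pools: Dict[str, List[str]] = defaultdict(list)
    for _instrument, sequence, quality in reads:
        pools[grade_read(quality)].append(sequence)
    return dict(pools)


reads = [
    ("hypothetical_instrument_A", "ACGTACGT", "IIIIIIII"),  # mean Q40
    ("hypothetical_instrument_B", "TTGACCAA", "55555555"),  # mean Q20
    ("hypothetical_instrument_C", "GGCATTCA", "IIIIIIII"),  # mean Q40
]
print(pool_by_grade(reads))
# {'Grade 1': ['ACGTACGT', 'GGCATTCA'], 'Grade 2': ['TTGACCAA']}
```

Mean quality is only one possible grading criterion. A practical scheme would likely also consider read length, error profile, and platform-specific biases.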

This is already beginning to play out:

  • In medicine, the observation that cells throughout the body are constantly releasing DNA into the bloodstream means that the detection of anything from prenatal genetic conditions to cancer can be approached as a DNA sequencing problem.
  • In agriculture, tracking animals through a complex supply chain can naturally be framed as a problem of DNA sequencing. Remarkably, the approach generalizes further: tracking the origin of any material in a supply chain, even material that contains no DNA of its own, can be approached similarly, for example by labeling it with synthetic DNA barcodes.
  • Perhaps most famously, the first COVID-19 vaccines were designed by companies that had never handled the virus; they had reduced key steps of the R&D process to a single one: “obtain a sequence”.

It seems fruitful to consider how this trend will influence different industries over the coming years. At Gencove, our view is that the genomics revolution will be driven by software: as data generation becomes simpler and cheaper, the key genomics technologies will in the long run be analytical rather than molecular. At the same time, the molecular technologies needed to generate sequencing data remain a significant challenge. To accelerate this transition, we also invest in tools that reduce the cost and increase the throughput of data generation, such as protocols for miniaturized library preparation.

We’re always looking to work with individuals or companies with similar goals; please reach out if you’d like to collaborate!