The Gencove Team - Jul 25, 2024

Evolving the PGx toolkit with low-pass whole genome sequencing

The tools available to scientists and clinicians in the field of genetics are evolving rapidly. While genotyping microarrays once dominated the field, next-generation sequencing (NGS) technologies are becoming increasingly popular. This is particularly true among researchers studying pharmacogenomics (PGx). In recent years, the number of variants linked to an altered drug response has grown rapidly and consequently put a spotlight on the critical failings of microarrays1,2.

In this blog, we highlight recent data comparing the performance of microarray technology with either low-pass whole genome sequencing (lpWGS) on its own or in combination with a target enrichment sequencing panel.
 

Low-pass sequencing vs genotyping arrays: Fundamental differences

Genotyping microarrays are a relatively fast and low-cost technology that enables researchers to probe samples for specific, pre-defined DNA sequences (often ranging from 500,000 to 2,000,000 variants)3. In PGx, microarrays are typically built to detect common variants that have a known effect on drug response.

Though cost-effective, microarrays offer narrow and incomplete coverage across many clinically relevant loci, and are only capable of providing data for a predefined set of variants. This means if researchers wish to analyze new variants, a new array must be designed, ordered, and samples re-processed, resulting in a significant practical barrier to research in a rapidly evolving field. Additionally, as genetic studies have predominantly focused on individuals of European descent, arrays built for common PGx variants are likely to be biased towards these populations, reducing the array’s overall effectiveness, and further exacerbating health disparities in genomic research4,5.

The Commonality of Rare Variants

By their nature, microarrays target a limited number of variants. These variants tend to be common (minor allele frequency >1%) in patients with predominantly European ancestry. Yet, the highly polymorphic nature of drug-metabolizing enzymes means that most PGx variants are rare, with each surfacing in less than 1% of individuals but collectively representing a majority of variants6,7,8. Additionally, variants that are rare in patients with European ancestry can be common in other ancestral backgrounds, with one study demonstrating a 10x enrichment of rare PGx variants in non-European populations9. Therefore, it is increasingly clear that PGx research in non-European populations is needed, and that the focus of PGx screening should expand beyond arrays and common variants.

In contrast, lpWGS, wherein the entire genome is typically sequenced to an average depth ≤1x, enables researchers to collect a wealth of genetic data across clinically relevant loci. lpWGS is also a hypothesis-free process, meaning each locus's sequence is recorded regardless of expectations, allowing for an agnostic survey of the landscape of genetic variation across populations. Not only does this flexibility facilitate the discovery of new PGx gene-phenotype interactions, but it also reduces assay bias and allows researchers to quickly adapt their study to include new variants.

Comparing low-pass sequencing to genotyping arrays for PGx

In collaboration with GSK, researchers at Gencove performed a study10 in which 79 diverse individuals were selected and assayed using both:

  1. lpWGS with an average target coverage of 1x; and

  2. The Precision Medicine Research Array (PMRA), a genotyping array specifically designed for PGx applications.

We additionally downsampled the sequence data to assess how 0.4x, 0.6x, and 0.8x coverage would affect downstream performance. Sequence data was then imputed to obtain genome-wide genotypes for these individuals. We then examined the concordance of lpWGS genotyping calls to those obtained from the PMRA, with a specific focus on the genes involved in drug absorption, distribution, metabolism, and excretion11.
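To make the downsampling step concrete, here is a minimal Python sketch of the arithmetic involved. It is not the pipeline used in the study. It assumes that average depth can be approximated as total sequenced bases divided by genome size; the 150 bp read length, the 3.1 Gb genome size, and all function names are illustrative assumptions.

```python
import random

# Illustrative assumptions, not values taken from the study.
READ_LENGTH_BP = 150
GENOME_SIZE_BP = 3_100_000_000


def mean_coverage(n_reads, read_length=READ_LENGTH_BP, genome_size=GENOME_SIZE_BP):
    """Approximate average depth as total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


def downsample_fraction(current_cov, target_cov):
    """Fraction of reads to keep to move from current_cov to target_cov."""
    if current_cov <= 0:
        raise ValueError("current coverage must be positive")
    if target_cov > current_cov:
        raise ValueError("cannot downsample to a higher coverage")
    return target_cov / current_cov


def downsample(reads, fraction, seed=0):
    """Keep each read independently with probability `fraction`."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < fraction]


if __name__ == "__main__":
    for target in (0.8, 0.6, 0.4):
        frac = downsample_fraction(1.0, target)
        print(f"1x -> {target}x: keep about {frac:.0%} of reads")
```

Under these assumptions, moving from 1x to 0.4x simply means keeping roughly 40% of the reads, which is why a single 1x dataset can stand in for several lower-coverage experiments.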

The sequencing-based results were found to be highly concordant with those deriving from the PMRA (Figure 1). The positive percent agreement (PPA) ranged from 98.5% at 0.4x coverage to 99.4% at 1x for common variants, while rare variants showed a PPA of 82.1% at 0.4x to 95.2% at 1x.

Of note, we found that the results of genome-wide imputation based on sequencing data were consistently more accurate than those produced using the PMRA data, even when the average sequencing depth was just above 0.4x. The marked increase in accuracy, even at very low coverages, demonstrates that low-pass sequencing can be a much more powerful tool for genome-wide trait mapping.

With respect to PGx, these results show that lpWGS represents a competitive alternative to genotyping arrays when analyzing genes that are particularly relevant for PGx, while simultaneously providing higher power for overall trait mapping genome-wide.

Figure 1: Genotype concordance across platforms at specific variants relevant to pharmacogenomics. Concordance at SNPs in ADME genes. Variants were classified as “rare” if the minor allele was present in five or fewer copies in the sample (corresponding to an allele frequency of about 3%). Concordance rates are split according to the genotype calls on the PMRA, which was considered “truth”: reference concordance is at variants where the PMRA is homozygous reference, and non-reference concordance is for all other sites.
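The classification and split described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustration only, not the code used in the study. It assumes genotypes are encoded as the number of non-reference alleles (0, 1, or 2, with None for a missing call), and the function names are hypothetical. With 79 diploid individuals there are 158 chromosomes, so five copies corresponds to 5/158, or about 3%.

```python
def minor_allele_count(genotypes):
    """Count copies of the less frequent allele among called genotypes."""
    called = [g for g in genotypes if g is not None]
    alt_copies = sum(called)
    return min(alt_copies, 2 * len(called) - alt_copies)


def is_rare(genotypes, max_copies=5):
    """Rare if the minor allele appears in `max_copies` or fewer copies."""
    return minor_allele_count(genotypes) <= max_copies


def split_concordance(truth, test):
    """Concordance split by the truth genotype (here, the array call).

    Reference concordance is computed where truth is homozygous
    reference (0); non-reference concordance covers all other sites.
    Sites with a missing call on either platform are skipped.
    """
    ref_total = ref_match = nonref_total = nonref_match = 0
    for t, c in zip(truth, test):
        if t is None or c is None:
            continue
        if t == 0:
            ref_total += 1
            ref_match += int(c == t)
        else:
            nonref_total += 1
            nonref_match += int(c == t)
    return {
        "reference": ref_match / ref_total if ref_total else None,
        "non_reference": nonref_match / nonref_total if nonref_total else None,
    }


if __name__ == "__main__":
    print(f"5 copies in 79 individuals: {5 / (2 * 79):.1%}")
    array_calls = [0, 0, 1, 2, 1, 0]
    lpwgs_calls = [0, 0, 1, 2, 0, 0]
    print(split_concordance(array_calls, lpwgs_calls))
```

Splitting by the truth genotype matters because most sites are homozygous reference for rare variants, so a single pooled concordance figure would mostly reflect the easy cases.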

Zooming in on the HLA locus

The HLA locus plays an essential role in immune regulation, and is therefore of particular interest for PGx applications. However, due to its complexity and its highly polymorphic nature, it is notoriously difficult to assay using microarrays or even standard short-read NGS assays. In a recent experiment12, we evaluated the accuracy of four-digit HLA imputation from lpWGS.

Specifically, we took 1x low-pass sequence data from 136 diverse individuals and imputed four-digit HLA types across the five “classical” HLA genes, HLA-A, -B, -C, -DQB1, and -DRB1. We then evaluated the accuracy of our calls against the known, “gold-standard” types for these individuals. We found that across these 680 calls (136 x 5 = 680), 98% were concordant with the “truth,” demonstrating that lpWGS enables accurate HLA typing across diverse populations (Figure 2).
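As a rough illustration of how such a comparison can be scored, the Python sketch below truncates HLA allele names to four-digit (two-field) resolution and counts a call as concordant when the unordered pair of alleles matches the reference typing. Treating each individual-gene pair as one call is an assumption made here to match the 136 x 5 = 680 count; the function names and example alleles are hypothetical and this is not the evaluation code from the experiment.

```python
GENES = ("A", "B", "C", "DQB1", "DRB1")  # the five classical genes named above


def to_four_digit(allele):
    """Truncate an allele name to two fields, e.g. 'A*02:01:01:01' -> 'A*02:01'."""
    gene, sep, fields = allele.partition("*")
    parts = fields.split(":")
    if not sep or not gene or len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"cannot reduce {allele!r} to four-digit resolution")
    return f"{gene}*{parts[0]}:{parts[1]}"


def call_matches(truth_pair, imputed_pair):
    """Compare two diploid calls at four-digit resolution, ignoring allele order."""
    truth = sorted(to_four_digit(a) for a in truth_pair)
    imputed = sorted(to_four_digit(a) for a in imputed_pair)
    return truth == imputed


def concordance(truth_calls, imputed_calls):
    """Fraction of (sample, gene) calls that match; inputs are dicts keyed by (sample, gene)."""
    shared = truth_calls.keys() & imputed_calls.keys()
    if not shared:
        return None
    matches = sum(call_matches(truth_calls[k], imputed_calls[k]) for k in shared)
    return matches / len(shared)


if __name__ == "__main__":
    truth = {("s1", "A"): ("A*02:01:01:01", "A*24:02:01"),
             ("s1", "B"): ("B*07:02", "B*08:01")}
    imputed = {("s1", "A"): ("A*24:02", "A*02:01"),
               ("s1", "B"): ("B*07:02", "B*15:01")}
    print(concordance(truth, imputed))
```

A stricter or looser scoring rule (for example, scoring each allele separately) would change the denominator, so the exact definition should be stated alongside any reported accuracy.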

Figure 2: Accuracy of HLA calls from low-pass sequencing. HLA types were imputed using a “double-imputation” method involving initial imputation to a whole-genome haplotype panel followed by imputation of four-digit HLA types using CookHLA13.

Improving on lpWGS

One drawback of low-pass sequencing compared to genotyping arrays is that, although it generally outperforms arrays across the genome14, it lacks the precision to accurately assay certain SNPs or genes critical for some studies.

In another recent experiment, we designed an assay in which an lpWGS “backbone” was combined with hybrid capture targeted sequencing probes that are specific for the CYP2D6 gene. The enzyme produced by CYP2D6 is responsible for metabolizing >20% of all prescribed drugs. As with the HLA locus, CYP2D6 is highly polymorphic across global populations and is difficult to type using either microarrays or short-read NGS because it is situated next to its paralog CYP2D7, which exhibits high sequence homology and thus poses a challenge in terms of mappability and variant calling.

As a proof of concept, the assay combined lpWGS with a panel of hybrid capture probes that specifically enrich for PGx genes (including CYP2D6), resulting in high sequencing coverage at the enriched genes. This assay was used to sequence 368 diverse individuals. We then examined the realized coverage and mapping qualities of reads overlapping the CYP2D6/7 region. As shown in Figure 3, we obtained consistently high depth of coverage (>100x) as well as high mapping quality in CYP2D6, suggesting that lpWGS can be made even more effective for PGx when it is supplemented with target capture probes.

Figure 3: (a) Average realized coverage across 368 samples in the genomic region corresponding to CYP2D6/7. (b) Average mapping quality (MapQ) for the reads across 368 samples in the same genomic region.
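For readers unfamiliar with the two quantities plotted in Figure 3, the Python sketch below shows one simple way to compute per-base depth and mean mapping quality over a region from a list of aligned reads. It assumes each read is given as a (start, end, mapq) tuple with 0-based, half-open coordinates; the function name and toy coordinates are hypothetical. In practice these values would normally come from an alignment file via a dedicated tool rather than hand-written code.

```python
def region_depth_and_mapq(reads, region_start, region_end):
    """Summarize coverage and mapping quality over [region_start, region_end).

    reads: iterable of (start, end, mapq) tuples, 0-based half-open.
    Returns (per_base_depth, mean_depth, mean_mapq), where mean_mapq is
    averaged over reads overlapping the region (None if there are none).
    """
    length = region_end - region_start
    if length <= 0:
        raise ValueError("region must have positive length")

    depth = [0] * length
    mapqs = []
    for start, end, mapq in reads:
        lo = max(start, region_start)
        hi = min(end, region_end)
        if lo >= hi:
            continue  # read does not overlap the region
        mapqs.append(mapq)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1

    mean_depth = sum(depth) / length
    mean_mapq = sum(mapqs) / len(mapqs) if mapqs else None
    return depth, mean_depth, mean_mapq


if __name__ == "__main__":
    toy_reads = [(0, 10, 60), (5, 15, 20), (20, 30, 60)]
    depth, mean_depth, mean_mapq = region_depth_and_mapq(toy_reads, 5, 15)
    print(depth, mean_depth, mean_mapq)
```

Looking at both values together is useful in a region such as CYP2D6/7: high depth alone is not informative if reads map ambiguously between the two paralogs, which would show up as low mapping quality.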

Evolving Your PGx Toolkit

Taken together, the data shown here indicate that lpWGS can be an accurate and effective tool for PGx. But how can you begin to integrate lpWGS into your workflow?

Gencove's platform for genomic data generation, analysis, and management offers a robust solution for integrating PGx into pre-clinical research and clinical trials. Using lpWGS or a combination of lpWGS with targeted capture probes, Gencove provides a cost-effective and scalable solution to PGx screening.

Importantly, this technology enables comprehensive genetic profiling, capturing both common and rare variants across diverse populations, thereby helping researchers bring the benefits of PGx to all.