
Gillian Belbin, Senior Data Scientist - Nov 09, 2023

Gencove platform introduces Human Genome Diversity Project and 1000 Genomes reference panel for build GRCh37

We recently reported on the release of the novel gnomAD_v3.1.2 HGDP1KG human reference panel for build GRCh38. This new panel consists of a total of 4091 individuals, and we have previously discussed its considerable gains in imputation power when compared to other publicly available human reference panels.

We have now made the HGDP1KG panel available for build GRCh37. The GRCh37 version consists of a total of 78,752,417 variants lifted over from GRCh38. Using analyses similar to those used to evaluate the GRCh38 version, we show that the new panel outperforms the previous version of our human GRCh37 panel, which consisted of the 1000 Genomes Project Phase III (1KG) samples only.

Results

To evaluate the performance of HGDP1KG, we performed an analysis similar to the one described in our previous post: N=116 samples from the Simons Genome Diversity Project (SGDP) were downsampled to 1x coverage and imputed using either the panel constructed from the 1KG alone or the new HGDP1KG panel. We compared the results to a ground truth set of genotype calls for the same SGDP samples, examining imputation performance both at the intersection of sites shared between the two panels (Figure 1A) and across all sites present within each individual panel (Figure 1B). As previously observed for GRCh38, imputation r-squared improves markedly across allele frequencies when imputing to the new HGDP1KG panel, with notable gains at the lower end of the site frequency spectrum.
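For readers who want to run a comparable check on their own data, the metric above can be sketched as follows. This is a minimal illustration, not the pipeline used for this post: the function name `r2_by_frequency_bin`, the bin edges, and the choice to pool all sites within a frequency bin before computing the squared Pearson correlation are all assumptions, and the toy arrays are made up purely to show the input shapes.

```python
import numpy as np


def r2_by_frequency_bin(truth, dosage, alt_freq, bin_edges):
    """Squared Pearson correlation between true genotypes and imputed
    dosages, pooled over all sites falling in each minor-allele-frequency bin.

    truth, dosage: arrays of shape (n_sites, n_samples); truth holds 0/1/2
                   alternate-allele counts, dosage holds values in [0, 2].
    alt_freq:      alternate-allele frequency per site, shape (n_sites,).
    bin_edges:     increasing MAF bin edges; bin i covers (edge[i], edge[i+1]].
    """
    truth = np.asarray(truth, dtype=float)
    dosage = np.asarray(dosage, dtype=float)
    alt_freq = np.asarray(alt_freq, dtype=float)

    # Fold the alternate-allele frequency into a minor-allele frequency.
    maf = np.minimum(alt_freq, 1.0 - alt_freq)

    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (maf > lo) & (maf <= hi)
        t = truth[in_bin].ravel()
        d = dosage[in_bin].ravel()
        # The correlation is undefined for empty bins or constant inputs.
        if t.size < 2 or t.std() == 0 or d.std() == 0:
            results.append(float("nan"))
        else:
            results.append(float(np.corrcoef(t, d)[0, 1] ** 2))
    return results


# Toy input (invented values): 2 sites x 4 samples.
toy_truth = [[0, 1, 2, 0], [0, 0, 1, 0]]
toy_dosage = [[0.1, 0.9, 1.8, 0.0], [0.0, 0.2, 0.7, 0.1]]
toy_alt_freq = [0.30, 0.01]
toy_bins = [0.0, 0.05, 0.5]

print(r2_by_frequency_bin(toy_truth, toy_dosage, toy_alt_freq, toy_bins))
```

To mirror the two comparisons in Figure 1, the same function would be called once on the sites shared by both panels and once on all sites in each panel. Only the rows passed in change.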

Figure 1. (A) Comparison of imputation accuracy between the 1KG + HGDP panel and the 1KG-only panel at the intersection of sites shared by both panels, evaluated against N=116 ground truth samples derived from SGDP. (B) Comparison of imputation performance for both panels across the site frequency spectrum for all sites represented in each panel.

Conclusions

We demonstrate that the GRCh37 HGDP1KG panel outperforms our previous 1KG-only GRCh37 reference panel for imputation of low-pass whole genome sequence data.

The HGDP1KG panel is now available through the Gencove platform as our standard GRCh37 human reference panel. Gencove is revolutionizing how genetic data is used, offering a complete platform for generating, analyzing, and managing genomic information. Get in touch to learn more about leveraging the Gencove platform to accelerate and streamline your research.