
Maximillian Rozenblum, Applied Bioinformatics Engineer - Nov 10, 2022

New species: salmon

At Gencove, our mission is to expand the accessibility and application of genomics. In 2020, 90.5 million metric tons of fish were captured from the wild and 84.1 million metric tons were farmed through aquaculture. Farmed salmon contributes significantly to global production. With this in mind, we are excited to announce the release of our first Atlantic Salmon haplotype reference panel, with the goal of bringing affordable whole-genome information to genomic improvement in aquaculture.

This panel enables both low-cost genotyping for genomic selection and an unprecedented level of resolution in a high-throughput assay for research and fine-mapping in Atlantic Salmon.

Panel construction

To construct the haplotype reference panel, we first identified the latest reference genome (Ssal v3.1) and gathered 134 publicly available salmon FASTQs from the data used in the salmon population structure analysis by Macqueen et al. [1]. Next, we created the haplotype reference panel by performing variant calling and phasing. The resulting reference panel comprises 16.3M SNPs and short indels.

Within the public data we used, there were three experimental groups: salmon of Canadian descent (n=50), of Norwegian descent (n=50), and farmed salmon (n=34). To examine the population structure of the samples that comprised our reference panel, we performed PCA on a subset of the markers. In the following figure, the axes are the first two principal components of the marker subset, and each point on the plot represents an individual in our reference panel.

Each point is colored by the sample’s breed. Wild animals of the same breed cluster together, illustrating the distinctness of the population groups and replicating the structure reported in the published literature.

In this case, our haplotype reference panel captures the known distinctions in the genetic origin of Canadian and Norwegian farmed salmon.
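For readers who want to try a similar population structure check on their own data, here is a minimal sketch of PCA on a genotype matrix. It is an illustration only and not the pipeline used for the panel: the function name genotype_pca, the 0/1/2 allele-count encoding, and the simulated toy data are all assumptions made for this example.

```python
import numpy as np


def genotype_pca(genotypes, n_components=2):
    """Project samples onto their top principal components.

    genotypes: (n_samples, n_markers) array of alternate-allele counts (0, 1, 2).
    This encoding is an assumption for the sketch, not taken from the post.
    """
    g = np.asarray(genotypes, dtype=float)
    # Drop markers with no variation; they cannot be standardized.
    g = g[:, g.std(axis=0) > 0]
    # Center and scale each marker so that all markers contribute equally.
    z = (g - g.mean(axis=0)) / g.std(axis=0)
    # The SVD of the standardized matrix yields the principal components.
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated toy data (not real salmon genotypes): two groups of 10 samples,
# each drawn from its own set of allele frequencies at 200 markers.
rng = np.random.default_rng(0)
freq_a = rng.uniform(0.05, 0.95, size=200)
freq_b = rng.uniform(0.05, 0.95, size=200)
group_a = rng.binomial(2, freq_a, size=(10, 200))
group_b = rng.binomial(2, freq_b, size=(10, 200))

scores, explained = genotype_pca(np.vstack([group_a, group_b]))
print(scores.shape)
```

Plotting the two columns of scores against each other, colored by group, gives a figure of the same kind as the one described above.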

We validated the performance of this panel using a leave-one-out approach. For each of the 134 samples in turn, we removed the sample from the reference panel, took that sample’s raw data, randomly downsampled it to the equivalent of 1x coverage, and ran the downsampled reads through the imputation pipeline. We then compared the resulting genotypes to the original sample’s genotypes called on the full high-coverage data to calculate concordance.
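The leave-one-out loop and the concordance calculation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production pipeline: the names downsample_fraction, concordance, leave_one_out, and majority_impute are hypothetical, genotypes are encoded as 0/1/2 allele counts keyed by site, and majority_impute is a naive stand-in that simply picks the most common panel genotype at each site rather than performing real low-pass imputation.

```python
from collections import Counter


def downsample_fraction(original_coverage, target_coverage=1.0):
    """Fraction of reads to keep so the expected coverage hits the target."""
    if original_coverage <= 0:
        raise ValueError("original_coverage must be positive")
    return min(1.0, target_coverage / original_coverage)


def concordance(truth, imputed):
    """Fraction of shared sites where the imputed genotype matches the truth."""
    shared = [site for site in truth if site in imputed]
    if not shared:
        raise ValueError("no shared sites to compare")
    matches = sum(truth[site] == imputed[site] for site in shared)
    return matches / len(shared)


def majority_impute(panel, sites):
    """Placeholder 'imputation': most common panel genotype at each site.

    A real pipeline would impute from the held-out sample's downsampled reads.
    """
    return {
        site: Counter(gt[site] for gt in panel.values()).most_common(1)[0][0]
        for site in sites
    }


def leave_one_out(samples, impute):
    """Hold out each sample in turn and score it against the reduced panel."""
    results = {}
    for held_out, truth in samples.items():
        panel = {name: gt for name, gt in samples.items() if name != held_out}
        imputed = impute(panel, truth.keys())
        results[held_out] = concordance(truth, imputed)
    return results


# Toy genotypes (invented for illustration) at four sites per sample.
samples = {
    "s1": {"a": 0, "b": 1, "c": 2, "d": 0},
    "s2": {"a": 0, "b": 1, "c": 2, "d": 1},
    "s3": {"a": 0, "b": 1, "c": 1, "d": 0},
    "s4": {"a": 0, "b": 2, "c": 2, "d": 0},
}

results = leave_one_out(samples, majority_impute)
print(results)
print(downsample_fraction(30.0))  # fraction of reads kept to go from 30x to about 1x
```

Averaging the per-sample values within each population gives a per-group concordance summary.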

We summarize the leave-one-out results below. The higher average concordance of farmed salmon compared with wild salmon likely reflects the smaller effective population size of farmed salmon due to inbreeding. In closed populations, increased inbreeding reduces genetic diversity, which means that even with fewer farmed salmon samples, the reference panel better captured the population’s diversity and provided higher genotyping accuracy.

With over 95% accuracy in every population at over 20 million sites, low-pass sequencing provides the best high-throughput genotyping available in salmon, at a much lower cost than high-coverage whole-genome sequencing. With Gencove’s expertise and platform, you will be able to use this information easily, quickly, and at scale. We are excited to share it with partners. If you’re interested in learning more, please reach out below.