Forest Dussault, Software Engineer - Jun 22, 2023

Introducing a new way to bring FASTQ files to the Gencove platform

We’re excited to announce a new feature that makes bringing FASTQ files onto Gencove’s platform easier than ever. We've upgraded the Gencove CLI so that users can import FASTQ files into their Gencove projects directly from URLs!

This latest addition expands the array of convenient methods for transferring data into Gencove's systems. In addition to local uploads and importing from S3 or BaseSpace, we now offer support for direct imports from any AWS, Azure, or GCP URL.

Using the CLI to import FASTQ files from URLs

The process of importing FASTQ files via URLs is straightforward. To do this, users can prepare a map file, which is a simple .csv file that lists all the URLs of the files for import. We refer to this as the *.fastq-map.csv file. In this file, users simply set the ID, read notation (R1/R2), and URL for each sample.

client_id,r_notation,path
sample1,r1,https://example-bucket.storage.googleapis.com/sample_R1.fastq.gz
sample1,r2,https://example-bucket.storage.googleapis.com/sample_R2.fastq.gz
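For projects with many samples, writing the map file by hand gets tedious. The snippet below is a minimal sketch of how one might generate it with a shell loop. The sample IDs, the base URL, the <sample>_R1/_R2 file naming, and the output name urls.fastq-map.csv are all assumptions for illustration. Real presigned URLs would also carry a query string, which is omitted here.

```shell
# Hypothetical values: replace with your own bucket URL and sample IDs.
base_url="https://example-bucket.storage.googleapis.com"
map_file="urls.fastq-map.csv"

# Header row, using the column names from the example map file.
printf 'client_id,r_notation,path\n' > "$map_file"

# One row per read file: R1 and R2 for each sample.
for id in sample1 sample2; do
  for read in r1 r2; do
    # Assumes files are named <sample>_R1.fastq.gz and <sample>_R2.fastq.gz.
    suffix=$(printf '%s' "$read" | tr '[:lower:]' '[:upper:]')
    printf '%s,%s,%s/%s_%s.fastq.gz\n' \
      "$id" "$read" "$base_url" "$id" "$suffix" >> "$map_file"
  done
done
```

The resulting file has a header plus two rows per sample and can be passed to the CLI like any hand-written map file.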

Once the *.fastq-map.csv file has been prepared, the Gencove CLI is used to import the sample(s). When the gencove upload command is run with the map file, Gencove’s system checks that the listed URLs are valid and then begins the import.
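Before uploading, it can be handy to catch layout mistakes in the map file locally. The function below is a rough sketch of such a check, not part of the Gencove CLI: the name check_map_file is hypothetical, and the rules it applies (exactly three columns, r1 or r2 in the second column, an https URL with no commas in the third) are assumptions based on the example file above. It does not contact the URLs, so it is no substitute for Gencove's own validation.

```shell
# Hypothetical local sanity check for a map file; inspects layout only.
check_map_file() {
  awk -F',' '
    NR == 1 { next }   # skip the header row
    NF != 3 { printf "line %d: expected 3 columns\n", NR; bad = 1; next }
    $2 != "r1" && $2 != "r2" { printf "line %d: r_notation must be r1 or r2\n", NR; bad = 1 }
    $3 !~ /^https:\/\// { printf "line %d: path is not an https URL\n", NR; bad = 1 }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}

# Example usage:
# check_map_file urls.fastq-map.csv && echo "map file looks well-formed"
```

The function prints one message per problem line and returns a non-zero status if any were found, so it can gate the upload step in a script.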

For further convenience, users can also automatically assign these uploads to a particular project by using the --run-project-id option in the CLI when calling gencove upload. For example:

gencove upload urls.fastq-map.csv --run-project-id <project-id> --api-key <api-key>

If the samples were not assigned directly to a project, they will be available under the “Unassigned” section of the “My FASTQs” area of the web UI. From here, the files can then be imported into projects via the web UI, CLI or API.

The file retrieval from the listed URLs is executed once samples are added to a project, which begins an analysis run. During the analysis run, the FASTQ files are downloaded from the supplied URLs as needed.

Tips on generating presigned URLs

Presigned URLs (also known as Shared Access Signatures in Azure and Signed URLs in GCP) are a powerful tool for securely sharing private resources over the internet. These URLs provide temporary access to a specific resource, limiting the potential for unauthorized use.

Below, we provide basic guidance on how to generate presigned URLs for FASTQ files destined for Gencove using Amazon Web Services (AWS), Google Cloud Platform (GCP), or Microsoft Azure. This section covers simple CLI usage for each platform. Alternatively, users can use each platform's SDKs or APIs.

Note that when generating presigned URLs with any of the supported cloud providers, we strongly recommend setting a generous expiration time to ensure that the supplied URLs do not expire by the time samples are ready to be processed by a project pipeline.

AWS

With AWS, presigned URLs can be generated using the AWS CLI and the aws s3 presign command. Here, users must specify the bucket and the object key. The lifespan of the URL, in seconds, can be set with the --expires-in parameter. The maximum value is 604800 seconds (1 week). Note, however, that if the credentials used to generate the URL expire before the URL does (e.g. when assuming a role), the URL will expire early.

aws s3 presign s3://mybucket/reads_R1.fastq.gz --expires-in 86400
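Because --expires-in takes seconds, a small helper can make the chosen lifespan easier to read and guard against exceeding the one-week limit. This is only a sketch: the function name expiry_seconds is hypothetical, and the commented-out call reuses the example bucket and key from above.

```shell
# Hypothetical helper: convert whole days to seconds for --expires-in,
# refusing values above the 604800-second (7-day) maximum.
expiry_seconds() {
  days=$1
  seconds=$((days * 86400))
  if [ "$seconds" -gt 604800 ]; then
    echo "error: $days days exceeds the 7-day maximum" >&2
    return 1
  fi
  echo "$seconds"
}

# Example usage (requires the AWS CLI and valid credentials):
# aws s3 presign s3://mybucket/reads_R1.fastq.gz --expires-in "$(expiry_seconds 7)"
```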

For more information, see the official docs.

GCP

GCP allows the generation of presigned URLs using gsutil. Users can use the gsutil signurl command, specifying the service account's private key file and the bucket object URL. Again, pay close attention to the -d option, where users can specify the URL's duration. The maximum duration for GCP presigned URLs is also 1 week.

gsutil signurl -d 1d /path/to/private-key.json gs://mybucket/reads_R1.fastq.gz

For more information, see the official docs.

Azure

With Azure, users can use the Azure CLI to create a Shared Access Signature (SAS), which is effectively a presigned URL. Using the az storage blob generate-sas command, specify the container name, blob name, and permissions. By default the command returns only the SAS token; adding the --full-uri flag returns the complete blob URL with the token appended, which is the form needed in the map file. The --expiry parameter is once again important here, and uses the format yyyy-mm-dd or yyyy-mm-ddTHH:mmZ for a UTC date time value.

az storage blob generate-sas --account-name exampleaccount --container-name mycontainer --name reads_R1.fastq.gz --permissions r --expiry 2023-06-20T00:00Z --full-uri
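Rather than typing the expiry timestamp by hand, it can be computed relative to the current time. The sketch below assumes GNU date (as found on most Linux systems). The 7-day offset is an arbitrary choice for illustration, and the variable name expiry is hypothetical.

```shell
# Compute a UTC expiry timestamp 7 days from now in yyyy-mm-ddTHH:mmZ format.
# Assumes GNU date; on macOS/BSD, `date -u -v+7d '+%Y-%m-%dT%H:%MZ'`
# is the rough equivalent.
expiry=$(date -u -d '+7 days' '+%Y-%m-%dT%H:%MZ')
echo "$expiry"

# Example usage (requires the Azure CLI and access to the storage account).
# --full-uri asks for the complete blob URL rather than only the SAS token.
# az storage blob generate-sas --account-name exampleaccount \
#   --container-name mycontainer --name reads_R1.fastq.gz \
#   --permissions r --expiry "$expiry" --full-uri
```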

For more information, see the official docs.

Conclusion

This enhancement streamlines data ingestion for Gencove users migrating from other cloud platforms, aligning with Gencove's commitment to easy-to-use software that is well integrated with user workflows. More information on Gencove CLI usage is available in the official Gencove docs. As always, please reach out if you have any comments or questions and let us know what you think!