found incorrect link for genome #27

@bluegenes

When directsketch looks up the download link for GCA_000193795.2, it finds the following link: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.1_ASM19379v1/GCA_000193795.1_ASM19379v1_genomic.fna.gz. The folder https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.1_ASM19379v1/ does exist, but this .1 assembly is suppressed, so the download fails.

When I looked up the genome via NCBI, I found that the v2 genome is available at: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/GCA_000193795.2_ASM19379v2/GCA_000193795.2_ASM19379v2_genomic.fna.gz

So directsketch found the v1 folder instead of the requested v2 folder, which caused the download to fail.

To fix this, look into the link + version check here: https://github.com/sourmash-bio/sourmash_plugin_directsketch/blob/main/src/directsketch.rs#L106-L125
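
One possible shape for a stricter version check, as a minimal sketch only. It assumes the directory names under https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/193/795/ have already been extracted from the listing as strings. `pick_versioned_dir` and `genomic_fasta_url` are hypothetical helpers made up for illustration; they are not functions from directsketch.rs. The idea is to match on the full versioned accession followed by `_`, so that a request for `.2` can never select the `.1` folder:

```rust
/// Hypothetical helper (not part of directsketch): return the directory
/// whose name starts with the full versioned accession followed by '_'.
/// Requiring the trailing '_' keeps "GCA_000193795.2" from matching
/// "GCA_000193795.1_..." or a longer version such as "GCA_000193795.21_...".
fn pick_versioned_dir<'a>(accession: &str, dir_names: &[&'a str]) -> Option<&'a str> {
    let prefix = format!("{}_", accession);
    dir_names
        .iter()
        .copied()
        // Directory listings often carry a trailing slash; drop it.
        .map(|name| name.trim_end_matches('/'))
        .find(|name| name.starts_with(prefix.as_str()))
}

/// Hypothetical helper: build the genomic FASTA URL for the chosen
/// directory, following the URL pattern shown in this issue.
fn genomic_fasta_url(base_url: &str, dir_name: &str) -> String {
    format!(
        "{}/{}/{}_genomic.fna.gz",
        base_url.trim_end_matches('/'),
        dir_name,
        dir_name
    )
}
```

With the two folders from this issue as input, asking for `GCA_000193795.2` would select `GCA_000193795.2_ASM19379v2`, and asking for a version that has no folder would return `None` rather than falling back to another version.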
