

@bluegenes commented Apr 15, 2024

We don't have wort for protein signatures (yet), so this PR uses a new plugin, sourmash_plugin_directsketch, to download and sketch proteomes, checking the md5sum along the way. When no proteome is available, we download the genome instead, predict proteins with prodigal, and then sketch those.
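The md5sum check amounts to hashing each downloaded file and comparing the digest with a published checksum. Here is a minimal sketch using Python's `hashlib`, assuming the expected checksum is already known (where directsketch actually obtains it is not covered in this PR). `md5_of_file` and `verify_download` are hypothetical helpers, not part of the plugin's API.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: Path, expected_md5: str) -> bool:
    """Hypothetical check: does the file's MD5 match the expected checksum?

    The expected value is normalized (whitespace stripped, lowercased) since
    hexdigest() always returns lowercase hex.
    """
    return md5_of_file(path) == expected_md5.strip().lower()
```

Reading in chunks keeps memory use flat even for large genome or proteome archives; a failed check would mark that accession's download as failed.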

Steps in this workflow:

  • check the old database to find missing (new) accessions
  • download and sketch proteomes
  • download genomes for accessions whose proteome download failed
  • predict proteins from those genomes with prodigal
  • sketch the prodigal proteomes
  • concatenate the 3 databases (prior release, direct downloads, prodigal proteomes)
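The accession bookkeeping behind these steps can be sketched in plain Python. This is a hypothetical illustration, not the workflow's actual code: accessions stand in for signatures, `try_proteome` is a placeholder for the directsketch download-and-sketch step, and the final "cat" is modeled as joining three lists that are expected to be disjoint.

```python
from typing import Callable, Iterable, List, Tuple


def plan_protein_update(
    new_accessions: Iterable[str],
    prior_accessions: Iterable[str],
    try_proteome: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split accessions missing from the prior release into two groups.

    Returns (direct, fallback): accessions whose proteome was downloaded and
    sketched directly, and those that need genome download + prodigal.
    """
    missing = sorted(set(new_accessions) - set(prior_accessions))
    direct: List[str] = []
    fallback: List[str] = []
    for acc in missing:
        # try_proteome is a hypothetical stand-in for the directsketch step.
        if try_proteome(acc):
            direct.append(acc)
        else:
            fallback.append(acc)
    return direct, fallback


def concatenate_databases(*databases: Iterable[str]) -> List[str]:
    """Join databases in order, failing if any accession appears twice."""
    combined: List[str] = []
    seen = set()
    for db in databases:
        for acc in db:
            if acc in seen:
                raise ValueError(f"accession {acc!r} appears in more than one database")
            seen.add(acc)
            combined.append(acc)
    return combined
```

For example, with a prior release of `["a", "b"]`, new accessions `["a", "b", "c", "d"]`, and only `"c"` having a downloadable proteome, `plan_protein_update` returns `(["c"], ["d"])`, and `concatenate_databases(prior, direct, fallback)` yields all four. The duplicate check reflects the assumption that the three sources never overlap.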

To avoid repeating steps, this workflow uses the taxonomy/metadata parsing done in the gtdb-rs214.genomic workflow.
