
5. Command line options

Clara Köhne edited this page Jan 23, 2023 · 4 revisions

Input/output control

| Option | Description |
| --- | --- |
| --contigs PATH | Path to one or more nucleotide sequences in FASTA format (required). |
| --reference PATH | Either (1) a path to one or more sequences in FASTA format or (2) a subject database (use --database for the latter; required). |
| --database | When specified, --reference points to a DIAMOND or a BLAST database. |
| --output-dir PATH | Write output files to PATH (default path: patchwork_output). |
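
As a rough illustration of how the input/output options combine, the following Python sketch assembles an argument list from them. The executable name `patchwork`, the helper `build_patchwork_command`, and the file names are assumptions made for this example; only the option names come from the table above.

```python
def build_patchwork_command(contigs, reference, database=False, output_dir=None):
    """Assemble an argument list from the input/output options.

    The executable name "patchwork" is an assumption; adjust it to match
    your installation.
    """
    if not contigs or not reference:
        raise ValueError("--contigs and --reference are both required")
    command = ["patchwork", "--contigs", str(contigs), "--reference", str(reference)]
    if database:
        # --database marks --reference as a DIAMOND or BLAST database.
        command.append("--database")
    if output_dir is not None:
        command += ["--output-dir", str(output_dir)]
    return command


# Hypothetical file names, for illustration only.
command = build_patchwork_command(
    "contigs.fa", "reference.dmnd", database=True, output_dir="results"
)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` unchanged, which avoids shell quoting issues with unusual paths.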

Alignment options

| Option | Description |
| --- | --- |
| --matrix NAME | Set the substitution matrix to NAME (default: BLOSUM62) |
| --custom-matrix PATH | Set substitution matrix to the custom matrix in PATH |
| --gapopen NUMBER | Set the gap open penalty to NUMBER (where NUMBER is a positive integer) |
| --gapextend NUMBER | Set the gap extension penalty to NUMBER (where NUMBER is a positive integer) |

Using custom DIAMOND flags

Patchwork supports the following DIAMOND options:

| Option | Description |
| --- | --- |
| --query-gencode NUMBER | Set the genetic code. Allowed values can be found on the NCBI website (default: Standard Code) |
| --strand STRING | Set the query strand. Allowed are 'plus', 'minus', and 'both' (default: both) |
| --min-orf NUMBER | Set the minimum open reading frame length |
| --fast, --mid-sensitive, --sensitive, --more-sensitive, --very-sensitive, --ultra-sensitive | Set the sensitivity mode. You may use at most one flag from the list. If none is provided, the DIAMOND default will be used. |
| --iterate [MODE...] | Iterate through sensitivity settings. For DIAMOND >= 2.0.12, you can optionally specify a list of space-separated sensitivity modes. Allowed values include the above-listed sensitivity values, as well as 'default', and none |
| --frameshift NUMBER | Enable and set penalty for frameshifting operations. Positive integers are allowed |
| --evalue NUMBER | Set e-value cutoff to the specified floating-point number |
| --min-score NUMBER | Set the minimum bitscore threshold (floating-point number). Overrides --evalue |
| --max-target-seqs NUMBER | Maximum number of subject/reference sequences that are reported per query. Setting to 0 will report all hits (default: 25) |
| --top NUMBER | Report only hits within the given percentage range of the top score. Overrides --max-target-seqs |
| --max-hsps NUMBER | Maximum number of HSPs DIAMOND may report per target sequence per query. Setting to 0 will report all HSPs (default: 1) |
| --id PERCENTAGE | Report only hits with sequence identity above the given floating-point number |
| --query-cover PERCENTAGE | Discard DIAMOND hits with less query cover than the given percentage |
| --subject-cover PERCENTAGE | Discard DIAMOND hits with less subject cover than the given percentage |
| --masking MODE | Set repeat masking mode. Allowed values are 0 (disabled), 1 (tantan masking) and 2 (BLASTP SEG masking). The last option requires DIAMOND >= 2.0.12. Note that, contrary to the DIAMOND default, Patchwork disables masking by default! (default: 0) |
| --len NUMBER | Discard hits shorter than the provided length |

For a full list of DIAMOND options, see DIAMOND's wiki. If you are missing a flag, please notify us and we can add it to Patchwork's supported DIAMOND options.
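
The rule that at most one sensitivity flag may be given can be checked before a run is started. The sketch below is one possible pre-flight check in Python; the function name `check_sensitivity_flags` is hypothetical and this is not a description of how Patchwork itself validates its arguments.

```python
# Sensitivity flags as listed in the DIAMOND options table.
SENSITIVITY_FLAGS = (
    "--fast",
    "--mid-sensitive",
    "--sensitive",
    "--more-sensitive",
    "--very-sensitive",
    "--ultra-sensitive",
)


def check_sensitivity_flags(args):
    """Return the single sensitivity flag found in args, or None.

    Raises ValueError when more than one sensitivity flag is present.
    """
    chosen = [arg for arg in args if arg in SENSITIVITY_FLAGS]
    if len(chosen) > 1:
        raise ValueError("use at most one sensitivity flag, got: " + ", ".join(chosen))
    return chosen[0] if chosen else None


# Hypothetical argument list, for illustration only.
print(check_sensitivity_flags(["--contigs", "contigs.fa", "--sensitive"]))
```

When the function returns `None`, no sensitivity flag was passed and the DIAMOND default applies.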

Choosing a substitution matrix

The following substitution matrices are available (BLOSUM62 is the default option).

| Matrix | Supported values for (gap open)/(gap extend) | Default gap penalties |
| --- | --- | --- |
| BLOSUM45 | (10-13)/3; (12-16)/2; (16-19)/1 | 14/2 |
| BLOSUM50 | (9-13)/3; (12-16)/2; (15-19)/1 | 13/2 |
| BLOSUM62 | (6-11)/2; (9-13)/1 | 11/1 |
| BLOSUM80 | (6-9)/2; 13/2; 25/2; (9-11)/1 | 10/1 |
| BLOSUM90 | (6-9)/2; (9-11)/1 | 10/1 |
| PAM250 | (11-15)/3; (13-17)/2; (17-21)/1 | 14/2 |
| PAM70 | (6-8)/2; (9-11)/1 | 10/1 |
| PAM30 | (5-7)/2; (8-10)/1 | 9/1 |
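
To check a --gapopen/--gapextend pair against the table before starting a run, the supported ranges can be written down as data. The Python sketch below transcribes the table; the names `SUPPORTED_GAP_PENALTIES`, `DEFAULT_GAP_PENALTIES`, and `is_supported` are made up for this example and are not part of Patchwork.

```python
# (low, high, extend): gap open values low..high (inclusive) are supported
# together with the given gap extend value. Transcribed from the table above.
SUPPORTED_GAP_PENALTIES = {
    "BLOSUM45": [(10, 13, 3), (12, 16, 2), (16, 19, 1)],
    "BLOSUM50": [(9, 13, 3), (12, 16, 2), (15, 19, 1)],
    "BLOSUM62": [(6, 11, 2), (9, 13, 1)],
    "BLOSUM80": [(6, 9, 2), (13, 13, 2), (25, 25, 2), (9, 11, 1)],
    "BLOSUM90": [(6, 9, 2), (9, 11, 1)],
    "PAM250": [(11, 15, 3), (13, 17, 2), (17, 21, 1)],
    "PAM70": [(6, 8, 2), (9, 11, 1)],
    "PAM30": [(5, 7, 2), (8, 10, 1)],
}

# Default (gap open, gap extend) per matrix, also from the table above.
DEFAULT_GAP_PENALTIES = {
    "BLOSUM45": (14, 2),
    "BLOSUM50": (13, 2),
    "BLOSUM62": (11, 1),
    "BLOSUM80": (10, 1),
    "BLOSUM90": (10, 1),
    "PAM250": (14, 2),
    "PAM70": (10, 1),
    "PAM30": (9, 1),
}


def is_supported(matrix, gapopen, gapextend):
    """Return True if the penalty pair is listed for the given matrix."""
    ranges = SUPPORTED_GAP_PENALTIES.get(matrix.upper())
    if ranges is None:
        raise ValueError(f"unknown matrix: {matrix}")
    return any(
        low <= gapopen <= high and gapextend == extend
        for low, high, extend in ranges
    )


print(is_supported("BLOSUM62", 11, 1))  # the BLOSUM62 default pair
print(is_supported("BLOSUM80", 14, 2))  # not listed for BLOSUM80
```

Keeping the ranges as plain tuples makes it easy to compare the data against the table line by line.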

Using a custom substitution matrix

If you decide to use a custom matrix, it should adhere to the format used by DIAMOND and BioAlignments.jl. Below is an example of what such a matrix could look like:

#  Matrix made by matblas from blosum62.iij
#  BLOSUM Clustered Scoring Matrix in 1/2 Bit Units
#  Blocks Database = /data/blocks_5.0/blocks.dat
#  Cluster Percentage: >= 62
#  Entropy =   0.6979, Expected =  -0.5209
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V  B  Z  X
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0 -2 -1  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3 -1  0 -1
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3  3  0 -1
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3  4  1 -1
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1 -3 -3 -2
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2  0  3 -1
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2  1  4 -1
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3 -1 -2 -1
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3  0  0 -1
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3 -3 -3 -1
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1 -4 -3 -1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2  0  1 -1
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1 -3 -1 -1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1 -3 -3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2 -2 -1 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2  0  0  0
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0 -1 -1  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3 -4 -3 -2
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1 -3 -2 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4 -3 -2 -1
B -2 -1  3  4 -3  0  1 -1  0 -3 -4  0 -3 -3 -2  0 -1 -4 -3 -3  4  1 -1
Z -1  0  0  1 -3  3  4 -2  0 -3 -3  1 -1 -3 -1  0 -1 -3 -2 -2  1  4 -1
X  0 -1 -1 -1 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2  0  0 -2 -1 -1 -1 -1 -1
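
Before passing a file to --custom-matrix, it can be worth checking that it follows the layout shown above: optional comment lines starting with #, one header row of residue letters, then one row per residue with integer scores. The Python sketch below is a minimal reader under those assumptions; `parse_substitution_matrix` is a hypothetical helper, not the parser used by DIAMOND or BioAlignments.jl, and the three-letter excerpt only keeps the example short.

```python
def parse_substitution_matrix(text):
    """Parse a whitespace-delimited substitution matrix into a dict.

    Keys are (row residue, column residue) tuples, values are integer scores.
    """
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if not rows:
        raise ValueError("no matrix rows found")
    columns = rows[0]  # header row with the column residues
    scores = {}
    for row in rows[1:]:
        residue, values = row[0], row[1:]
        if len(values) != len(columns):
            raise ValueError(f"row {residue!r} has {len(values)} scores, expected {len(columns)}")
        for column, value in zip(columns, values):
            scores[(residue, column)] = int(value)
    return scores


# Toy excerpt (A, R, N only) taken from the matrix shown above.
EXAMPLE = """\
#  Toy excerpt, for illustration only
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""

scores = parse_substitution_matrix(EXAMPLE)
print(scores[("A", "R")], scores[("N", "N")])
```

A natural follow-up check is symmetry, that is, `scores[(a, b)] == scores[(b, a)]` for every pair of residues.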

Alignment masking and trimming options

| Option | Description |
| --- | --- |
| --retain-stops | Do not remove stop codons (*) in the output sequences |
| --retain-ambiguous | Do not remove ambiguous characters from the output sequences |
| --no-trimming | Disable sliding window alignment trimming |
| --window-size NUMBER | Set the size of the sliding window for alignment trimming (default: 4) |
| --required-distance NUMBER | Set the maximum average distance for the sliding window alignment trimming (default: -7.0) |

Miscellaneous

| Option | Description |
| --- | --- |
| --threads NUMBER | The number of threads to utilize in total (default: all available threads) |
| --fasta-extension STRING | Filetype extension for output FASTA files (default: .fas) |
| --species-delimiter CHARACTER | Delimiter used to distinguish the OTU from the rest of the sequence ID in a FASTA header (default: @) |
| --no-plots | Disable plot output to save time |
| --wrap-column NUMBER | Wrap output FASTA sequences at the provided column NUMBER (default: no wrapping; each sequence is written on one line) |
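
To make the effect of --wrap-column and --species-delimiter concrete, the Python sketch below wraps a sequence at a fixed column and splits a sequence ID at the delimiter. Both helpers and the example ID are hypothetical, and the sketch does not reproduce Patchwork's own implementation. This page does not say on which side of the delimiter the OTU sits, so the split simply returns both parts.

```python
def wrap_sequence(sequence, width=None):
    """Break a sequence into lines of at most `width` characters.

    With width=None the sequence stays on one line, mirroring the
    documented default of no wrapping.
    """
    if width is None:
        return sequence
    if width < 1:
        raise ValueError("width must be a positive integer")
    return "\n".join(
        sequence[start:start + width] for start in range(0, len(sequence), width)
    )


def split_at_delimiter(sequence_id, delimiter="@"):
    """Split a sequence ID at the first occurrence of the delimiter."""
    before, found, after = sequence_id.partition(delimiter)
    if not found:
        raise ValueError(f"delimiter {delimiter!r} not found in {sequence_id!r}")
    return before, after


print(wrap_sequence("ACGTACGTAC", 4))
# Made-up sequence ID, for illustration only.
print(split_at_delimiter("speciesA@gene1"))
```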
