Library of sketching functions used by [PopPUNK](https://www.poppunk.net>).
+
Library of sketching functions used by [PopPUNK](https://www.poppunk.net). See documentation at http://poppunk.readthedocs.io/en/latest/sketching.html
## Installation
@@ -67,7 +68,7 @@ installed (tested on 10.2 and 11.0).
Create a set of sketches and save these as a database:
This will print the distances to STDOUT, which can be captured with `>`. If you wish to save output files as a database for use with PopPUNK, add the `-o` option.
### Other options
Sketching:
-
- `--strand` ignores reverse complement k-mers, if input is all in the same sense
+
- `--single-strand` ignores reverse complement k-mers, if input is all in the same sense
- `--min-count` sets the minimum k-mer count to include when using reads
- `--exact-counter` uses a hash table to count k-mers, which is recommended for non-bacterial datasets.
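The sketching options above can be illustrated with a small Python sketch. This is not the library's implementation: `count_kmers`, `reverse_complement`, and their parameters are hypothetical names chosen for illustration, and plain strings stand in for the hashed k-mers a real sketching tool would use. It shows exact counting in a hash table, the effect of treating input as single-stranded, and a minimum count filter.

```python
from collections import Counter

# Translation table for complementing DNA bases (illustrative helper).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Return the reverse complement of a DNA string."""
    return kmer.translate(_COMPLEMENT)[::-1]


def count_kmers(reads, k, single_strand=False, min_count=1):
    """Count k-mers exactly in a hash table (a Counter).

    Hypothetical helper, not part of the library:
    - single_strand=False merges each k-mer with its reverse complement
      by keeping the lexicographically smaller of the two.
    - min_count drops k-mers seen fewer than min_count times.
    """
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if not single_strand:
                kmer = min(kmer, reverse_complement(kmer))
            counts[kmer] += 1
    return {kmer: n for kmer, n in counts.items() if n >= min_count}


reads = ["ACGTT", "AACGT"]
both_strands = count_kmers(reads, k=3)
one_strand = count_kmers(reads, k=3, single_strand=True, min_count=2)
```

With both strands merged, `ACG` and `CGT` are counted as the same k-mer; with `single_strand=True` they are kept apart, and `min_count=2` then removes the k-mers seen only once.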
Query:
- To only use some of the samples in the sketch database, you can add the `--subset` option with a file which lists the required sample names.
-
- `--jaccard` will output the Jaccard distances, rather than core and accessory distances.
+
- `query jaccard` will output the Jaccard distances, rather than core and accessory distances.
+
- `query sparse` will output a sparse distance matrix,
+
using either a `--threshold` or the k-nearest (`-kNN`).
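As a rough illustration of the two query modes above, the following Python sketch computes Jaccard distances between k-mer sets and keeps only a sparse subset of them. It is a minimal sketch under stated assumptions: exact k-mer sets stand in for sketches (the tool estimates these quantities from sketches), and `jaccard_distance`, `sparse_distances`, and the `threshold` and `knn` parameters are hypothetical names, not the library's API.

```python
def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance: 1 - |A & B| / |A | B| (0.0 for two empty sets)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def sparse_distances(samples, threshold=None, knn=None):
    """Return (ref, query, distance) rows for a sparse distance matrix.

    Hypothetical helper: keep, for each sample, either its knn nearest
    neighbours or every neighbour at distance <= threshold.
    """
    names = sorted(samples)
    rows = []
    for ref in names:
        # Sort neighbours by distance, breaking ties by sample name.
        dists = sorted(
            (jaccard_distance(samples[ref], samples[query]), query)
            for query in names
            if query != ref
        )
        if knn is not None:
            dists = dists[:knn]
        if threshold is not None:
            dists = [(d, q) for d, q in dists if d <= threshold]
        rows.extend((ref, query, d) for d, query in dists)
    return rows


samples = {
    "s1": {"AAC", "ACG", "CGT"},
    "s2": {"AAC", "ACG", "GTT"},
    "s3": {"TTT"},
}
by_threshold = sparse_distances(samples, threshold=0.5)
by_knn = sparse_distances(samples, knn=1)
```

A threshold leaves distant samples such as `s3` with no rows at all, whereas k-nearest always keeps `k` rows per sample, however far away the neighbours are.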
### Large datasets
When working with large datasets, you can increase `--cpus` to a high number and get
a roughly proportional performance increase.
If you are calculating sketches of read datasets, or large numbers of distances, and you have a CUDA-compatible GPU,
-
you can calculate distances on your graphics device even more quickly. Add the `--use-gpu` option:
+
you can calculate distances on your graphics device even more quickly. Add the `--gpu` option with the desired
@@ -363,7 +291,7 @@ Blais & Blanchette is used (formula 6 in the paper cited below).
sketch each separately and join the databases.
- GPU sketching filters out any read containing an N, which may give slightly
different results from the CPU code.
-
- GPU sketching with variable read lengths is untested, but theoretically supported.
+
- GPU sketching with variable read lengths is unsupported. Illumina data only for now!
- GPU distances use lower precision than the CPU code, so slightly different results
are expected.
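Two of the GPU caveats above can be illustrated in a few lines of Python. This is only a sketch under stated assumptions: `drop_reads_with_n` and `to_float32` are hypothetical helpers, and "lower precision" is assumed here to mean single-precision floats, which the text above does not confirm.

```python
import struct


def drop_reads_with_n(reads):
    """Discard any read containing an ambiguous base (N), as the GPU filter is described."""
    return [read for read in reads if "N" not in read.upper()]


def to_float32(x: float) -> float:
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


kept = drop_reads_with_n(["ACGT", "ACNT", "acgn", "GGCC"])
rounded = to_float32(0.1)  # differs from 0.1 only far beyond the seventh decimal place
```

Dropping whole reads removes every k-mer in them, including those that do not overlap the N, which is one way GPU and CPU results could differ slightly.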
@@ -427,6 +355,9 @@ Modifiers:
- `PROFILE=1` runs with profiler flags for `ncu` and `nsys`
- `GPU=1` also builds CUDA code (assumes `/usr/local/cuda-11.1/` and SM v8.6)
+
### Azure
+
The repository key for the Ubuntu CUDA install is periodically updated, which may cause build failures. See https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/ and update the key in `azure-pipelines.yml`.
+
### Test that Python can build an installable package
Build a Python source package and install it into an empty Docker container with vanilla Python 3. If this works, then there's a good chance that the version uploaded to PyPI will work.