Add a new iterator to skip "bad" bases without canonicalization

Hi Roderick!

I'm developing a k-mer counting Python package for internal use, with needletail as the backend. While developing it, I noticed that `Kmers` and `CanonicalKmers` handle non-ATCG characters inconsistently: `Kmers` yields k-mers containing them, whereas `CanonicalKmers` skips those k-mers (understandably so).

Because of that, my function only uses `CanonicalKmers`, even when counting non-canonical k-mers (I just reverse complement the sequence if the `canonical` boolean is true), which adds unnecessary computational overhead.

I don't know if this behavior is by design, but maybe `Kmers` could take an argument that lets the user choose whether k-mers containing non-ATCG characters are skipped.
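To make the request concrete, here is a minimal pure-Python sketch of the behavior I have in mind: every k-mer is yielded in its original orientation, and any window containing a non-ATCG character is skipped. It does not use needletail at all; the function name `kmers_skipping_bad_bases` and the choice to accept lowercase bases are my own assumptions, not existing needletail behavior.

```python
from typing import Iterator, Tuple

# Assumption: lowercase bases count as valid; drop "acgt" to be stricter.
VALID_BASES = frozenset(b"ACGTacgt")


def kmers_skipping_bad_bases(seq: bytes, k: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (start, k-mer) for every window of length k made only of A/C/G/T.

    Hypothetical helper for illustration; no canonicalization is performed.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    run = 0  # number of consecutive valid bases ending at index i
    for i, base in enumerate(seq):
        # Iterating over bytes yields ints, matching the ints in VALID_BASES.
        run = run + 1 if base in VALID_BASES else 0
        if run >= k:
            start = i - k + 1
            yield start, seq[start:i + 1]


if __name__ == "__main__":
    for start, kmer in kmers_skipping_bad_bases(b"ACGNTTA", 3):
        print(start, kmer.decode())
```

Tracking the length of the current run of valid bases keeps this a single pass over the sequence, so skipping "bad" bases should not need the reverse-complement work that canonicalization involves.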

Thank you for all your work on needletail!

Add a new iterator to skip "bad" bases without canonicalization #40
