Hi Roderick!
I'm developing a k-mer counting Python package for internal use, with needletail as the backend. While working on it, I noticed that Kmers and CanonicalKmers handle non-ATCG characters inconsistently: Kmers yields k-mers that contain them, whereas CanonicalKmers skips those k-mers (understandably so).
Because of that, my function always uses CanonicalKmers, even when counting non-canonical k-mers (I just reverse complement the sequence if the canonical boolean is true), which adds computational overhead.
I don't know whether this behavior is by design, but perhaps Kmers could take an argument that lets the user choose whether k-mers containing non-ATCG characters are skipped.
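
To make the request concrete, here is a minimal pure-Python sketch of the semantics such an option might have. It is only an illustration: `iter_kmers` and `skip_non_acgt` are hypothetical names, not part of needletail, and the sketch assumes upper-casing the input is acceptable.

```python
from collections import Counter
from typing import Iterator

VALID_BASES = frozenset("ACGT")


def iter_kmers(seq: str, k: int, skip_non_acgt: bool = False) -> Iterator[str]:
    """Yield every k-mer of `seq` in order.

    `skip_non_acgt` is a hypothetical flag illustrating the requested option;
    when True, any window containing a character outside A/C/G/T is skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()  # assumption: case is not significant
    # Index of the most recent invalid character seen so far (-1 if none).
    last_invalid = -1
    for end in range(len(seq)):
        if seq[end] not in VALID_BASES:
            last_invalid = end
        start = end - k + 1
        if start < 0:
            continue  # fewer than k characters read so far
        # A window is clean only when it starts after the last invalid character.
        if skip_non_acgt and last_invalid >= start:
            continue
        yield seq[start:end + 1]


# Same sequence counted with and without the hypothetical flag.
counts_all = Counter(iter_kmers("ACGTNACGT", 3))
counts_clean = Counter(iter_kmers("ACGTNACGT", 3, skip_non_acgt=True))
```

With the flag off, the windows overlapping the `N` (`GTN`, `TNA`, `NAC`) are counted like any other k-mer; with it on, only `ACG` and `CGT` remain.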
Thank you for all your work on needletail!