clam identifies genomic regions with sufficient sequencing depth to be considered "callable" and uses this information to calculate population genetic statistics from VCFs. It eliminates the need to generate an all-sites VCF file while still producing accurate diversity estimates. clam was designed specifically for large population genomics datasets.
From bioconda:
conda create -n clam bioconda::clam
From source:
git clone https://github.com/cademirch/clam.git
cd clam
cargo build --release
./target/release/clam --help
The clam loci command generates callable loci intervals from per-sample sequencing depth, taken from either alignments or GVCF files. The resulting interval file records how many samples were callable at each position in the genome.
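As a rough illustration of what the interval file encodes, the Python sketch below counts, at each position, how many samples reach a minimum depth, then merges runs of equal counts into half-open intervals. It is not clam's implementation: the in-memory depth lists are a hypothetical stand-in for real depth files such as the D4 tracks in the example command, `min_depth` is assumed to play the role of the `-m` threshold (check `clam loci --help`), and the function names `callable_counts` and `to_intervals` are hypothetical.

```python
from typing import List, Sequence, Tuple


def callable_counts(depths: Sequence[Sequence[int]], min_depth: int) -> List[int]:
    """Return, for each position, the number of samples with depth >= min_depth.

    `depths` holds one per-position depth list per sample (a hypothetical
    stand-in for real depth files).
    """
    if not depths:
        return []
    n_pos = len(depths[0])
    if any(len(d) != n_pos for d in depths):
        raise ValueError("all samples must cover the same positions")
    return [sum(1 for d in depths if d[pos] >= min_depth) for pos in range(n_pos)]


def to_intervals(counts: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Merge runs of identical counts into half-open (start, end, count) intervals."""
    intervals: List[Tuple[int, int, int]] = []
    start = 0
    for pos in range(1, len(counts) + 1):
        # Close the current run at the end of the data or when the count changes.
        if pos == len(counts) or counts[pos] != counts[start]:
            intervals.append((start, pos, counts[start]))
            start = pos
    return intervals


if __name__ == "__main__":
    depths = [
        [12, 15, 3, 0, 20],   # sample1
        [11, 9, 10, 10, 10],  # sample2
        [30, 30, 30, 2, 25],  # sample3
    ]
    counts = callable_counts(depths, min_depth=10)
    print(counts)                # [3, 2, 2, 1, 3]
    print(to_intervals(counts))  # [(0, 1, 3), (1, 3, 2), (3, 4, 1), (4, 5, 3)]
```

Storing runs instead of one value per base keeps the representation compact wherever neighbouring positions share the same callable-sample count.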
clam loci -t 16 -m 10 sample1.d4.gz sample2.d4.gz sample3.d4.gz
The clam stat command estimates common population genetic statistics such as π, dxy, and FST in windows. It combines the callable loci interval file with a VCF to produce accurate estimates, even in the presence of missing data.
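A callable loci file can stand in for an all-sites VCF because invariant callable sites contribute pairwise comparisons but no differences to diversity estimators. The sketch below computes windowed π and dxy as summed differences over summed comparisons, the ratio-of-sums approach used by pixy; the README does not state clam's exact estimator, so treat this as an assumption. It also assumes biallelic sites, `n` as the number of callable haplotypes at a site (callable samples times ploidy, as read from the interval file), and `k` as the alternate-allele count from the VCF (zero at callable sites absent from the VCF). The functions `windowed_pi` and `windowed_dxy` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical per-site records:
#   pi sites:  (pos, n, k)            one population
#   dxy sites: (pos, n1, k1, n2, k2)  two populations
# pos is 0-based; n = callable haplotypes; k = alternate-allele count.


def windowed_pi(sites: Iterable[Tuple[int, int, int]], window: int) -> Dict[int, float]:
    """Nucleotide diversity per window: sum of differences / sum of comparisons."""
    diffs: Dict[int, int] = {}
    comps: Dict[int, int] = {}
    for pos, n, k in sites:
        if n < 2:
            continue  # fewer than two haplotypes: no pairwise comparison possible
        start = pos // window * window
        diffs[start] = diffs.get(start, 0) + k * (n - k)        # ref/alt haplotype pairs
        comps[start] = comps.get(start, 0) + n * (n - 1) // 2  # all haplotype pairs
    return {s: diffs[s] / comps[s] for s in sorted(comps)}


def windowed_dxy(sites: Iterable[Tuple[int, int, int, int, int]], window: int) -> Dict[int, float]:
    """Absolute divergence per window between two populations."""
    diffs: Dict[int, int] = {}
    comps: Dict[int, int] = {}
    for pos, n1, k1, n2, k2 in sites:
        if n1 == 0 or n2 == 0:
            continue  # no between-population comparison possible at this site
        start = pos // window * window
        # Cross-population haplotype pairs that carry different alleles.
        diffs[start] = diffs.get(start, 0) + k1 * (n2 - k2) + k2 * (n1 - k1)
        comps[start] = comps.get(start, 0) + n1 * n2
    return {s: diffs[s] / comps[s] for s in sorted(comps)}


if __name__ == "__main__":
    # Invariant callable sites (k = 0) add comparisons but no differences.
    sites = [(0, 4, 0), (1, 4, 0), (2, 4, 2), (3, 2, 0), (4, 4, 1), (5, 1, 0), (12, 4, 1)]
    print(windowed_pi(sites, window=10))  # {0: 0.28, 10: 0.5}
    print(windowed_dxy([(0, 4, 0, 4, 4), (1, 4, 2, 2, 0)], window=10))
```

In the example, keeping only the two variant sites in the first window (as a variants-only VCF without callable-site information would) gives 7/12 ≈ 0.58 instead of 0.28, while dividing by the full window length would instead treat uncallable positions as invariant and underestimate π. Windows with no callable sites are simply absent from the output rather than reported as zero.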
clam stat -t 16 -w 10000 variants.vcf.gz callable-loci.d4
Read the documentation for more information.
clam is distributed under the terms of the MIT license.