Smirk is a chemistry-specific tokenizer, built with Rust 🦀 and HuggingFace's tokenizers 🤗, that provides complete coverage of the OpenSMILES specification. Installation is easy, and Smirk works out of the box with the HuggingFace ecosystem.
Check our documentation to see Smirk in action, or read the paper to learn about tokenization for molecular foundation models.
pip install smirk
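Smirk's own API is not shown here. To give a rough idea of what splitting SMILES into chemistry-aware tokens involves, the block below is a minimal regex-based sketch. The `TOP_LEVEL` and `BRACKET_PART` patterns and the `tokenize` function are hypothetical. They cover only a subset of OpenSMILES, and they are not Smirk's implementation. As an illustrative choice, the sketch splits bracket atoms such as `[C@@H]` into their parts instead of keeping each bracket atom as a single token.

```python
import re

# Hypothetical sketch, NOT Smirk's implementation or API.
# Top-level tokens: whole bracket atoms (split further below), organic-subset
# atoms, ring-closure labels, bonds, the dot, and branch parentheses.
TOP_LEVEL = re.compile(
    r"\[[^\]]+\]"                      # bracket atom, e.g. [C@@H] or [NH4+]
    r"|Cl|Br|[BCNOPSFI]|[bcnops]|\*"   # organic subset (two-letter first) + wildcard
    r"|%\d{2}|\d"                      # ring-closure labels
    r"|[-=#$:/\\.()]"                  # bonds, dot, and branches
)

# Parts inside a bracket atom: chirality, element symbols (including aromatic
# ones), isotope/H-count digits, charges, and the atom-class colon.
BRACKET_PART = re.compile(r"@@|@|[A-Z][a-z]?|[a-z]{1,2}|\d+|[-+:]")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into tokens, breaking bracket atoms into parts."""
    tokens: list[str] = []
    pos = 0
    while pos < len(smiles):
        match = TOP_LEVEL.match(smiles, pos)
        if match is None:
            raise ValueError(f"unrecognized character at {pos}: {smiles[pos]!r}")
        token = match.group()
        if token.startswith("["):
            inner = token[1:-1]
            parts = BRACKET_PART.findall(inner)
            # findall silently skips unmatched characters, so check for gaps.
            if "".join(parts) != inner:
                raise ValueError(f"unrecognized bracket atom: {token}")
            tokens += ["[", *parts, "]"]
        else:
            tokens.append(token)
        pos = match.end()
    return tokens


print(tokenize("CC(=O)O"))      # ['C', 'C', '(', '=', 'O', ')', 'O']
print(tokenize("C[C@@H](N)O"))  # ['C', '[', 'C', '@@', 'H', ']', '(', 'N', ')', 'O']
```

Splitting bracket atoms keeps the vocabulary small, because every combination of element, charge, and hydrogen count is built from a handful of pieces. Refer to the Smirk documentation for how Smirk itself tokenizes SMILES.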