BattModels/smirk

Smirk: A Tokenizer for OpenSMILES

GitHub License arXiv:2409.15370

Smirk is a chemistry-specific tokenizer that provides complete coverage of the OpenSMILES specification. It is built with Rust 🦀 and HuggingFace's tokenizers 🤗, installs with a single pip command, and works out of the box with the HuggingFace ecosystem.

See the documentation for Smirk in action, or read the paper (arXiv:2409.15370) to learn about tokenization for molecular foundation models.
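
To make "complete coverage" concrete, here is a minimal pure-Python sketch of tokenizing a SMILES string at the level of individual glyphs, splitting bracketed atoms into their parts instead of treating each bracket as one opaque token. This is not Smirk's implementation (Smirk is written in Rust), and it is an assumption that this mirrors Smirk's approach. The `OUTER` and `INNER` patterns and the `tokenize` helper are hypothetical names invented for illustration, and they handle only a subset of OpenSMILES.

```python
import re

# Hypothetical patterns for illustration only; not Smirk's actual grammar.
# Outside brackets: whole bracket atoms, two-letter organic atoms first,
# then single-letter atoms, aromatic atoms, ring closures, bonds, branches.
OUTER = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|%\d{2}|\d|[-=#$:/\\.()*]"
)
# Inside brackets: digits, chirality marks, element symbols, charge, class.
INNER = re.compile(r"\d|@@|@|[A-Z][a-z]?|se|as|[a-z]|[+\-:]")


def tokenize(smiles: str) -> list[str]:
    """Split a SMILES string into glyph-level tokens (illustrative subset)."""
    tokens: list[str] = []
    for match in OUTER.finditer(smiles):
        text = match.group()
        if text.startswith("["):
            # Decompose the bracket atom instead of keeping it as one token.
            tokens.append("[")
            tokens.extend(INNER.findall(text[1:-1]))
            tokens.append("]")
        else:
            tokens.append(text)
    # Require a lossless round trip so unsupported characters are not
    # silently dropped.
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported characters in SMILES: {smiles!r}")
    return tokens


# L-alanine with an explicit stereocenter.
print(tokenize("C[C@@H](N)C(=O)O"))
```

Because bracket atoms are broken into a small, fixed set of glyphs, the vocabulary stays bounded and no valid bracket atom has to fall back to an unknown token, which is the property the sketch is meant to illustrate.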

Installation

pip install smirk

About

An Atomically Complete Tokenizer for Molecular Foundation Models
