Releases: stanford-oval/WikiChat
WikiChat 2.1!
WikiChat 2.1 is now available! Key updates include:
- Improved Multilingual Support: Now supports 25 Wikipedias (up from 10), available via the web and an API at search.genie.stanford.edu/wikipedia_20250320: 🇺🇸 English, 🇫🇷 French, 🇩🇪 German, 🇪🇸 Spanish, 🇯🇵 Japanese, 🇷🇺 Russian, 🇵🇹 Portuguese, 🇨🇳 Chinese, 🇮🇹 Italian, 🇸🇦 Arabic, 🇮🇷 Persian, 🇵🇱 Polish, 🇳🇱 Dutch, 🇺🇦 Ukrainian, 🇮🇱 Hebrew, 🇮🇩 Indonesian, 🇹🇷 Turkish, 🇨🇿 Czech, 🇸🇪 Swedish, 🇰🇷 Korean, 🇫🇮 Finnish, 🇻🇳 Vietnamese, 🇭🇺 Hungarian, Catalan, 🇹🇭 Thai.
- Improved Information Retrieval: Better retrieval accuracy and speed using Snowflake's latest Arctic embedding model.
- Improved Wikipedia preprocessing using Docling. As always, the preprocessed Wikipedia is available on HuggingFace.
- Improved WikiChat Pipeline:
  - Added inline citations to the final response.
  - The "generate" stage of the pipeline is now always merged with the "claim extraction" stage, even in the non-distilled setting, for faster and cheaper inference.
  - Removed date-based reranking in favor of LLM-based reranking.
- Switched to using pixi for package management and loguru for logging.
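The free search API mentioned above can be queried over HTTP. The sketch below, using only the Python standard library, shows one plausible way to build and send a batch request; the field names (`query`, `num_blocks`) and the response shape are assumptions, not the documented schema — consult the project's API documentation for the authoritative format.

```python
import json
import urllib.request

# Endpoint from the release notes; the payload schema below is an assumption.
SEARCH_URL = "https://search.genie.stanford.edu/wikipedia_20250320"

def build_payload(queries, num_blocks=3):
    """Build a JSON payload for a batch search request.

    NOTE: the field names ("query", "num_blocks") are assumptions based on
    typical batch-search APIs; check the official API documentation.
    """
    return {"query": list(queries), "num_blocks": num_blocks}

def search(queries, num_blocks=3, timeout=30):
    """POST the payload and return the decoded JSON response."""
    data = json.dumps(build_payload(queries, num_blocks)).encode("utf-8")
    req = urllib.request.Request(
        SEARCH_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

# Usage (the public endpoint is rate-limited, so query sparingly):
#   results = search(["When was the Eiffel Tower built?"], num_blocks=2)
```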
WikiChat v2.0!
- Multilingual Support: By default, retrieves information from 10 different Wikipedias: 🇺🇸 English, 🇨🇳 Chinese, 🇪🇸 Spanish, 🇵🇹 Portuguese, 🇷🇺 Russian, 🇩🇪 German, 🇮🇷 Farsi, 🇯🇵 Japanese, 🇫🇷 French, and 🇮🇹 Italian.
- Improved Information Retrieval:
  - Now supports retrieval from structured data such as tables, infoboxes, and lists, in addition to text.
  - Includes the highest-quality public Wikipedia preprocessing scripts.
  - Uses the state-of-the-art multilingual retrieval model BGE-M3.
  - Uses Qdrant for scalable vector search.
  - Uses RankGPT to rerank search results.
- Free Multilingual Wikipedia Search API: We offer a high-quality, free (but rate-limited) search API for access to 10 Wikipedias, encompassing over 180M vector embeddings. See its API documentation.
- Recipe for adapting WikiChat to your own documents (instead of Wikipedia).
- Expanded LLM Compatibility: Supports 100+ LLMs through a unified interface, thanks to LiteLLM.
- Optimized Pipeline: Option for a faster and more cost-effective pipeline by merging the "generate" and "extract claim" stages of WikiChat.
- LangChain Compatibility: Fully compatible with LangChain 🦜️🔗.
- And Much More!
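The "Optimized Pipeline" option merges the "generate" and "extract claim" stages into a single LLM call: the model produces its draft response and the factual claims it contains in one output, which is then split. The sketch below is a hypothetical illustration of that idea, not WikiChat's actual prompt or parser; the section markers and output format are assumptions.

```python
# Hypothetical single-call output format: the LLM returns its draft
# response followed by a delimited list of extracted claims, so one
# call replaces the separate "generate" and "extract claim" stages.

RESPONSE_HEADER = "RESPONSE:"  # assumed section markers, not WikiChat's
CLAIMS_HEADER = "CLAIMS:"

def parse_merged_output(llm_output: str) -> tuple[str, list[str]]:
    """Split a merged generate + claim-extraction output into
    (response, claims). Falls back to returning the whole text with
    no claims when the expected markers are absent."""
    if CLAIMS_HEADER not in llm_output:
        return llm_output.strip(), []
    response_part, claims_part = llm_output.split(CLAIMS_HEADER, 1)
    response = response_part.replace(RESPONSE_HEADER, "", 1).strip()
    claims = [
        line.lstrip("- ").strip()
        for line in claims_part.strip().splitlines()
        if line.strip()
    ]
    return response, claims
```

Each extracted claim can then be verified independently against retrieved passages, which is why producing both in one call halves the number of LLM invocations for this part of the pipeline.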
Full Changelog: v1.0...v2.0
v1.0
This release contains the code for our Findings of EMNLP 2023 paper.