This repository was archived by the owner on Mar 17, 2025. It is now read-only.

Commit 94421ea

renning22, Trangle, nathanstitt, merrymercy, and leiwen83 authored
Merge 1126 (#7)

* Remove hardcode flash-attn disable setting (lm-sys#2342)
* Document turning off proxy_buffering when api is streaming (lm-sys#2337)
* Simplify huggingface api example (lm-sys#2355)
* Update sponsor logos (lm-sys#2367)
* if LOGDIR is empty, then don't try output log to local file (lm-sys#2357) Signed-off-by: Lei Wen <[email protected]> Co-authored-by: Lei Wen <[email protected]>
* add best_of and use_beam_search for completions interface (lm-sys#2348) Signed-off-by: Lei Wen <[email protected]> Co-authored-by: Lei Wen <[email protected]>
* Extract upvote/downvote from log files (lm-sys#2369)
* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2370)
* Improve doc (lm-sys#2371)
* add best_of and use_beam_search for completions interface (lm-sys#2372) Signed-off-by: Lei Wen <[email protected]> Co-authored-by: Lei Wen <[email protected]>
* update monkey patch for llama2 (lm-sys#2379)
* Make E5 adapter more restrict to reduce mismatch (lm-sys#2381)
* Update UI and sponsers (lm-sys#2387)
* Use fsdp api for save save (lm-sys#2390)
* Release v0.2.27
* Spicyboros + airoboros 2.2 template update. (lm-sys#2392) Co-authored-by: Jon Durbin <[email protected]>
* bugfix of openai_api_server for fastchat.serve.vllm_worker (lm-sys#2398) Co-authored-by: wuyongyu <[email protected]>
* Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (lm-sys#2400)
* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2401)
* Release a v0.2.28 with bug fixes and more test cases
* Fix model_worker error (lm-sys#2404)
* Added google/flan models and fixed AutoModelForSeq2SeqLM when loading T5 compression model (lm-sys#2402)
* Rename twitter to X (lm-sys#2406)
* Update huggingface_api.py (lm-sys#2409)
* Add support for baichuan2 models (lm-sys#2408)
* Fixed character overlap issue when api streaming output (lm-sys#2431)
* Support custom conversation template in multi_model_worker (lm-sys#2434)
* Add Ascend NPU support (lm-sys#2422)
* Add raw conversation template (lm-sys#2417) (lm-sys#2418)
* Improve docs & UI (lm-sys#2436)
* Fix Salesforce xgen inference (lm-sys#2350)
* Add support for Phind-CodeLlama models (lm-sys#2415) (lm-sys#2416) Co-authored-by: Lianmin Zheng <[email protected]>
* Add falcon 180B chat conversation template (lm-sys#2384)
* Improve docs (lm-sys#2438)
* add dtype and seed (lm-sys#2430)
* Data cleaning scripts for dataset release (lm-sys#2440)
* merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdapter (lm-sys#2411)
* Fix docs
* Update UI (lm-sys#2446)
* Add Optional SSL Support to controller.py (lm-sys#2448)
* Format & Improve docs
* Release v0.2.29 (lm-sys#2450)
* Show terms of use as an JS alert (lm-sys#2461)
* vllm worker awq quantization update (lm-sys#2463) Co-authored-by: 董晓龙 <[email protected]>
* Fix falcon chat template (lm-sys#2464)
* Fix chunk handling when partial chunks are returned (lm-sys#2485)
* Update openai_api_server.py to add an SSL option (lm-sys#2484)
* Update vllm_worker.py (lm-sys#2482)
* fix typo quantization (lm-sys#2469)
* fix vllm quanziation args
* Update README.md (lm-sys#2492)
* Huggingface api worker (lm-sys#2456)
* Update links to lmsys-chat-1m (lm-sys#2497)
* Update train code to support the new tokenizer (lm-sys#2498)
* Third Party UI Example (lm-sys#2499)
* Add metharme (pygmalion) conversation template (lm-sys#2500)
* Optimize for proper flash attn causal handling (lm-sys#2503)
* Add Mistral AI instruction template (lm-sys#2483)
* Update monitor & plots (lm-sys#2506)
* Release v0.2.30 (lm-sys#2507)
* Fix for single turn dataset (lm-sys#2509)
* replace os.getenv with os.path.expanduser because the first one doesn… (lm-sys#2515) Co-authored-by: khalil <[email protected]>
* Fix arena (lm-sys#2522)
* Update Dockerfile (lm-sys#2524)
* add Llama2ChangAdapter (lm-sys#2510)
* Add ExllamaV2 Inference Framework Support. (lm-sys#2455)
* Improve docs (lm-sys#2534)
* Fix warnings for new gradio versions (lm-sys#2538)
* revert the gradio change; now works for 3.40
* Improve chat templates (lm-sys#2539)
* Add Zephyr 7B Alpha (lm-sys#2535)
* Improve Support for Mistral-Instruct (lm-sys#2547)
* correct max_tokens by context_length instead of raise exception (lm-sys#2544)
* Revert "Improve Support for Mistral-Instruct" (lm-sys#2552)
* Fix Mistral template (lm-sys#2529)
* Add additional Informations from the vllm worker (lm-sys#2550)
* Make FastChat work with LMSYS-Chat-1M Code (lm-sys#2551)
* Create `tags` attribute to fix `MarkupError` in rich CLI (lm-sys#2553)
* move BaseModelWorker outside serve.model_worker to make it independent (lm-sys#2531)
* Misc style and bug fixes (lm-sys#2559)
* Fix README.md (lm-sys#2561)
* release v0.2.31 (lm-sys#2563)
* resolves lm-sys#2542 modify dockerfile to upgrade cuda to 12.2.0 and pydantic 1.10.13 (lm-sys#2565)
* Add airoboros_v3 chat template (llama-2 format) (lm-sys#2564)
* Add Xwin-LM V0.1, V0.2 support (lm-sys#2566)
* Fixed model_worker generate_gate may blocked main thread (lm-sys#2540) (lm-sys#2562)
* feat: add claude-v2 (lm-sys#2571)
* Update vigogne template (lm-sys#2580)
* Fix issue lm-sys#2568: --device mps led to TypeError: forward() got an unexpected keyword argument 'padding_mask'. (lm-sys#2579)
* Add Mistral-7B-OpenOrca conversation_temmplate (lm-sys#2585)
* docs: bit misspell comments model adapter default template name conversation (lm-sys#2594)
* Update Mistral template (lm-sys#2581)
* Fix <s> in mistral template
* Update README.md (vicuna-v1.3 -> vicuna-1.5) (lm-sys#2592)
* Update README.md to highlight chatbot arena (lm-sys#2596)
* Add Lemur model (lm-sys#2584) Co-authored-by: Roberto Ugolotti <[email protected]>
* add trust_remote_code=True in BaseModelAdapter (lm-sys#2583)
* Openai interface add use beam search and best of 2 (lm-sys#2442) Signed-off-by: Lei Wen <[email protected]> Co-authored-by: Lei Wen <[email protected]>
* Update qwen and add pygmalion (lm-sys#2607)
* feat: Support model AquilaChat2 (lm-sys#2616)
* Added settings vllm (lm-sys#2599) Co-authored-by: bodza <[email protected]> Co-authored-by: bodza <[email protected]>
* [Logprobs] Support logprobs=1 (lm-sys#2612)
* release v0.2.32
* fix: Fix for OpenOrcaAdapter to return correct conversation template (lm-sys#2613)
* Make fastchat.serve.model_worker to take debug argument (lm-sys#2628) Co-authored-by: hi-jin <[email protected]>
* openchat 3.5 model support (lm-sys#2638)
* xFastTransformer framework support (lm-sys#2615)
* feat: support custom models vllm serving (lm-sys#2635)
* kill only fastchat process (lm-sys#2641)
* Update server_arch.png
* Use conv.update_last_message api in mt-bench answer generation (lm-sys#2647)
* Improve Azure OpenAI interface (lm-sys#2651)
* Add required_temp support in jsonl format to support flexible temperature setting for gen_api_answer (lm-sys#2653)
* Pin openai version < 1 (lm-sys#2658)
* Remove exclude_unset parameter (lm-sys#2654)
* Revert "Remove exclude_unset parameter" (lm-sys#2666)
* added support for CodeGeex(2) (lm-sys#2645)
* add chatglm3 conv template support in conversation.py (lm-sys#2622)
* UI and model change (lm-sys#2672) Co-authored-by: Lianmin Zheng <[email protected]>
* train_flant5: fix typo (lm-sys#2673)
* Fix gpt template (lm-sys#2674)
* Update README.md (lm-sys#2679)
* feat: support template's stop_str as list (lm-sys#2678)
* Update exllama_v2.md (lm-sys#2680)
* save model under deepspeed (lm-sys#2689)
* Adding SSL support for model workers and huggingface worker (lm-sys#2687)
* Check the max_new_tokens <= 0 in openai api server (lm-sys#2688)
* Add Microsoft/Orca-2-7b and update model support docs (lm-sys#2714)
* fix tokenizer of chatglm2 (lm-sys#2711)
* Template for using Deepseek code models (lm-sys#2705)
* add support for Chinese-LLaMA-Alpaca (lm-sys#2700)
* Make --load-8bit flag work with weights in safetensors format (lm-sys#2698)
* Format code and minor bug fix (lm-sys#2716)
* Bump version to v0.2.33 (lm-sys#2717)
* fix tokenizer.pad_token attribute error (lm-sys#2710)
* support stable-vicuna model (lm-sys#2696)
* Exllama cache 8bit (lm-sys#2719)
* Add Yi support (lm-sys#2723)
* Add Hermes 2.5 [fixed] (lm-sys#2725)
* Fix Hermes2Adapter (lm-sys#2727)
* Fix YiAdapter (lm-sys#2730)
* add trust_remote_code argument (lm-sys#2715)
* Add revision arg to MT Bench answer generation (lm-sys#2728)
* Fix MPS backend 'index out of range' error (lm-sys#2737)
* add starling support (lm-sys#2738)

---------

Signed-off-by: Lei Wen <[email protected]>
Co-authored-by: Trangle <[email protected]>
Co-authored-by: Nathan Stitt <[email protected]>
Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: leiwen83 <[email protected]>
Co-authored-by: Lei Wen <[email protected]>
Co-authored-by: Jon Durbin <[email protected]>
Co-authored-by: Jon Durbin <[email protected]>
Co-authored-by: Rayrtfr <[email protected]>
Co-authored-by: wuyongyu <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
Co-authored-by: Jeff (Zhen) Wang <[email protected]>
Co-authored-by: karshPrime <[email protected]>
Co-authored-by: obitolyz <[email protected]>
Co-authored-by: Shangwei Chen <[email protected]>
Co-authored-by: HyungJin Ahn <[email protected]>
Co-authored-by: zhangsibo1129 <[email protected]>
Co-authored-by: Tobias Birchler <[email protected]>
Co-authored-by: Jae-Won Chung <[email protected]>
Co-authored-by: Mingdao Liu <[email protected]>
Co-authored-by: Ying Sheng <[email protected]>
Co-authored-by: Brandon Biggs <[email protected]>
Co-authored-by: dongxiaolong <[email protected]>
Co-authored-by: 董晓龙 <[email protected]>
Co-authored-by: Siddartha Naidu <[email protected]>
Co-authored-by: shuishu <[email protected]>
Co-authored-by: Andrew Aikawa <[email protected]>
Co-authored-by: Liangsheng Yin <[email protected]>
Co-authored-by: enochlev <[email protected]>
Co-authored-by: AlpinDale <[email protected]>
Co-authored-by: Lé <[email protected]>
Co-authored-by: Toshiki Kataoka <[email protected]>
Co-authored-by: khalil <[email protected]>
Co-authored-by: khalil <[email protected]>
Co-authored-by: dubaoquan404 <[email protected]>
Co-authored-by: Chang W. Lee <[email protected]>
Co-authored-by: theScotchGame <[email protected]>
Co-authored-by: lewtun <[email protected]>
Co-authored-by: Stephen Horvath <[email protected]>
Co-authored-by: liunux4odoo <[email protected]>
Co-authored-by: Norman Mu <[email protected]>
Co-authored-by: Sebastian Bodza <[email protected]>
Co-authored-by: Tianle (Tim) Li <[email protected]>
Co-authored-by: Wei-Lin Chiang <[email protected]>
Co-authored-by: Alex <[email protected]>
Co-authored-by: Jingcheng Hu <[email protected]>
Co-authored-by: lvxuan <[email protected]>
Co-authored-by: cOng <[email protected]>
Co-authored-by: bofeng huang <[email protected]>
Co-authored-by: Phil-U-U <[email protected]>
Co-authored-by: Wayne Spangenberg <[email protected]>
Co-authored-by: Guspan Tanadi <[email protected]>
Co-authored-by: Rohan Gupta <[email protected]>
Co-authored-by: ugolotti <[email protected]>
Co-authored-by: Roberto Ugolotti <[email protected]>
Co-authored-by: edisonwd <[email protected]>
Co-authored-by: FangYin Cheng <[email protected]>
Co-authored-by: bodza <[email protected]>
Co-authored-by: bodza <[email protected]>
Co-authored-by: Cody Yu <[email protected]>
Co-authored-by: Srinath Janakiraman <[email protected]>
Co-authored-by: Jaeheon Jeong <[email protected]>
Co-authored-by: One <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: David <[email protected]>
Co-authored-by: Witold Wasiczko <[email protected]>
Co-authored-by: Peter Willemsen <[email protected]>
Co-authored-by: ZeyuTeng96 <[email protected]>
Co-authored-by: Forceless <[email protected]>
Co-authored-by: Jeff <[email protected]>
Co-authored-by: MrZhengXin <[email protected]>
Co-authored-by: Long Nguyen <[email protected]>
Co-authored-by: Elsa Granger <[email protected]>
Co-authored-by: Christopher Chou <[email protected]>
Co-authored-by: wangshuai09 <[email protected]>
Co-authored-by: amaleshvemula <[email protected]>
Co-authored-by: Zollty Tsou <[email protected]>
Co-authored-by: xuguodong1999 <[email protected]>
Co-authored-by: Michael J Kaye <[email protected]>
Co-authored-by: 152334H <[email protected]>
Co-authored-by: Jingsong-Yan <[email protected]>
Co-authored-by: Siyuan (Ryans) Zhuang <[email protected]>
1 parent a887de7 commit 94421ea


62 files changed (+6801 −5987 lines)

assets/server_arch.png

Binary file changed (−10.1 KB); image diff not shown.

data/dummy_conversation.json

Lines changed: 4007 additions & 5345 deletions. Large diffs are not rendered by default.

docker/Dockerfile

Lines changed: 3 additions & 2 deletions
@@ -1,6 +1,7 @@
-FROM nvidia/cuda:11.7.1-runtime-ubuntu20.04
+FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04
 
 RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
 RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
 RUN python3.9 get-pip.py
-RUN pip3 install fschat
+RUN pip3 install fschat
+RUN pip3 install fschat[model_worker,webui] pydantic==1.10.13

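The Dockerfile change above swaps the CUDA base image and pins pydantic to 1.10.13. A quick way to sanity-check the rebuilt image is sketched below; the image tag is an arbitrary choice and the verification step is only an assumption about how one might confirm the pin, not part of this commit.

```bash
# Build the updated image from the repository root (tag name is arbitrary).
docker build -t fastchat:cuda12.2 -f docker/Dockerfile .

# Confirm the pinned pydantic version inside the image.
docker run --rm fastchat:cuda12.2 python3.9 -c "import pydantic; print(pydantic.VERSION)"
```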
docker/docker-compose.yml

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ services:
             - driver: nvidia
               count: 1
               capabilities: [gpu]
-    entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "${FASTCHAT_WORKER_MODEL_NAMES:-vicuna-7b-v1.3}", "--model-path", "${FASTCHAT_WORKER_MODEL_PATH:-lmsys/vicuna-7b-v1.3}", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
+    entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "${FASTCHAT_WORKER_MODEL_NAMES:-vicuna-7b-v1.5}", "--model-path", "${FASTCHAT_WORKER_MODEL_PATH:-lmsys/vicuna-7b-v1.5}", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
   fastchat-api-server:
     build:
       context: .

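The entrypoint above reads FASTCHAT_WORKER_MODEL_NAMES and FASTCHAT_WORKER_MODEL_PATH from the environment, so the new vicuna-7b-v1.5 default can be overridden without editing the file. A minimal sketch, assuming Docker Compose v2 and that `fastchat-model-worker` is the worker service name referenced by this compose file; the model values are illustrative:

```bash
# Run the worker with a different model than the vicuna-7b-v1.5 default.
export FASTCHAT_WORKER_MODEL_NAMES="vicuna-13b-v1.5"
export FASTCHAT_WORKER_MODEL_PATH="lmsys/vicuna-13b-v1.5"
docker compose -f docker/docker-compose.yml up -d fastchat-model-worker
```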
docs/commands/leaderboard.md

Lines changed: 11 additions & 0 deletions
@@ -24,3 +24,14 @@ scp atlas:/data/lmzheng/FastChat/fastchat/serve/monitor/elo_results_20230905.pkl
 ```
 wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/raw/main/leaderboard_table_20230905.csv
 ```
+
+### Update files on webserver
+```
+DATE=20231002
+
+rm -rf elo_results.pkl leaderboard_table.csv
+wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/elo_results_$DATE.pkl
+wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/leaderboard_table_$DATE.csv
+ln -s leaderboard_table_$DATE.csv leaderboard_table.csv
+ln -s elo_results_$DATE.pkl elo_results.pkl
+```

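The snippet added above hard-codes DATE; wrapping the same commands in a small script that takes the date as an argument makes repeated leaderboard updates less error-prone. A sketch only: the script name and argument handling are hypothetical, while the commands mirror the doc.

```bash
#!/usr/bin/env bash
# Hypothetical helper, e.g.: ./update_leaderboard.sh 20231002
set -euo pipefail
DATE="$1"

rm -rf elo_results.pkl leaderboard_table.csv
wget "https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/elo_results_${DATE}.pkl"
wget "https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/resolve/main/leaderboard_table_${DATE}.csv"
ln -s "leaderboard_table_${DATE}.csv" leaderboard_table.csv
ln -s "elo_results_${DATE}.pkl" elo_results.pkl
```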
docs/commands/webserver.md

Lines changed: 10 additions & 1 deletion
@@ -72,7 +72,16 @@ vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/temp
 <script src="https://cdnjs.cloudflare.com/ajax/libs/html2canvas/1.4.1/html2canvas.min.js"></script>
 ```
 
-2. Loading
+2. deprecation warnings
+```
+vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/deprecation.py
+```
+
+```
+def check_deprecated_parameters(
+```
+
+3. Loading
 ```
 vim /home/vicuna/anaconda3/envs/fastchat/lib/python3.9/site-packages/gradio/templates/frontend/assets/index-188ef5e8.js
 ```

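The doc above hard-codes the conda environment path when pointing `vim` at gradio's files. If the environment lives elsewhere, the install directory can be located programmatically first; a small sketch, assuming only that gradio is importable in the active environment and that the installed version still ships `deprecation.py`:

```bash
# Locate the installed gradio package, then open the file referenced in step 2.
GRADIO_DIR=$(python3 -c "import gradio, os; print(os.path.dirname(gradio.__file__))")
vim "$GRADIO_DIR/deprecation.py"
```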
docs/dataset_release.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+## Datasets
+We release the following datasets based on our projects and websites.
+
+- [LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
+- [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
+- [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)

docs/exllama_v2.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# ExllamaV2 GPTQ Inference Framework
+
+Integrated [ExllamaV2](https://github.com/turboderp/exllamav2) customized kernel into Fastchat to provide **Faster** GPTQ inference speed.
+
+**Note: Exllama not yet support embedding REST API.**
+
+## Install ExllamaV2
+
+Setup environment (please refer to [this link](https://github.com/turboderp/exllamav2#how-to) for more details):
+
+```bash
+git clone https://github.com/turboderp/exllamav2
+cd exllamav2
+pip install -e .
+```
+
+Chat with the CLI:
+```bash
+python3 -m fastchat.serve.cli \
+    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
+    --enable-exllama
+```
+
+Start model worker:
+```bash
+# Download quantized model from huggingface
+# Make sure you have git-lfs installed (https://git-lfs.com)
+git lfs install
+git clone https://huggingface.co/TheBloke/vicuna-7B-1.1-GPTQ-4bit-128g models/vicuna-7B-1.1-GPTQ-4bit-128g
+
+# Load model with default configuration (max sequence length 4096, no GPU split setting).
+python3 -m fastchat.serve.model_worker \
+    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
+    --enable-exllama
+
+#Load model with max sequence length 2048, allocate 18 GB to CUDA:0 and 24 GB to CUDA:1.
+python3 -m fastchat.serve.model_worker \
+    --model-path models/vicuna-7B-1.1-GPTQ-4bit-128g \
+    --enable-exllama \
+    --exllama-max-seq-len 2048 \
+    --exllama-gpu-split 18,24
+```
+
+`--exllama-cache-8bit` can be used to enable 8-bit caching with exllama and save some VRAM.
+
+## Performance
+
+Reference: https://github.com/turboderp/exllamav2#performance
+
+
+| Model      | Mode         | Size  | grpsz | act | V1: 3090Ti | V1: 4090 | V2: 3090Ti | V2: 4090    |
+|------------|--------------|-------|-------|-----|------------|----------|------------|-------------|
+| Llama      | GPTQ         | 7B    | 128   | no  | 143 t/s    | 173 t/s  | 175 t/s    | **195** t/s |
+| Llama      | GPTQ         | 13B   | 128   | no  | 84 t/s     | 102 t/s  | 105 t/s    | **110** t/s |
+| Llama      | GPTQ         | 33B   | 128   | yes | 37 t/s     | 45 t/s   | 45 t/s     | **48** t/s  |
+| OpenLlama  | GPTQ         | 3B    | 128   | yes | 194 t/s    | 226 t/s  | 295 t/s    | **321** t/s |
+| CodeLlama  | EXL2 4.0 bpw | 34B   | -     | -   | -          | -        | 42 t/s     | **48** t/s  |
+| Llama2     | EXL2 3.0 bpw | 7B    | -     | -   | -          | -        | 195 t/s    | **224** t/s |
+| Llama2     | EXL2 4.0 bpw | 7B    | -     | -   | -          | -        | 164 t/s    | **197** t/s |
+| Llama2     | EXL2 5.0 bpw | 7B    | -     | -   | -          | -        | 144 t/s    | **160** t/s |
+| Llama2     | EXL2 2.5 bpw | 70B   | -     | -   | -          | -        | 30 t/s     | **35** t/s  |
+| TinyLlama  | EXL2 3.0 bpw | 1.1B  | -     | -   | -          | -        | 536 t/s    | **635** t/s |
+| TinyLlama  | EXL2 4.0 bpw | 1.1B  | -     | -   | -          | -        | 509 t/s    | **590** t/s |

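Once an ExllamaV2-backed worker is registered with the controller, it is reachable through FastChat's OpenAI-compatible REST server like any other worker (embeddings excepted, per the note above). A hedged example, assuming `fastchat.serve.openai_api_server` is running on its default port 8000 and that the worker registered under the model name used below:

```bash
# Query the GPTQ worker through the OpenAI-compatible endpoint (model name is an assumption).
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "vicuna-7B-1.1-GPTQ-4bit-128g",
        "messages": [{"role": "user", "content": "Hello, who are you?"}]
      }'
```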
docs/langchain_integration.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ Here, we use Vicuna as an example and use it for three endpoints: chat completio
 See a full list of supported models [here](../README.md#supported-models).
 
 ```bash
-python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
 ```
 
 Finally, launch the RESTful API server

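Because the worker above registers itself under OpenAI model names, LangChain or any OpenAI SDK client can be pointed at the local REST server through environment variables. A minimal sketch, assuming the API server from the following step listens on localhost:8000; the key value is a placeholder, since the local server does not validate it by default:

```bash
# Point OpenAI-compatible clients (including LangChain) at the local FastChat server.
export OPENAI_API_BASE=http://localhost:8000/v1
export OPENAI_API_KEY=EMPTY   # placeholder; not checked by the local server
```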
docs/model_support.md

Lines changed: 11 additions & 2 deletions
@@ -5,8 +5,10 @@
 - [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
   - example: `python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf`
 - Vicuna, Alpaca, LLaMA, Koala
-  - example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3`
+  - example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`
 - [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
+- [BAAI/AquilaChat2-7B](https://huggingface.co/BAAI/AquilaChat2-7B)
+- [BAAI/AquilaChat2-34B](https://huggingface.co/BAAI/AquilaChat2-34B)
 - [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en#using-huggingface-transformers)
 - [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
 - [BlinkDL/RWKV-4-Raven](https://huggingface.co/BlinkDL/rwkv-4-raven)
@@ -30,6 +32,8 @@
 - [NousResearch/Nous-Hermes-13b](https://huggingface.co/NousResearch/Nous-Hermes-13b)
 - [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
 - [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
+- [openchat/openchat_3.5](https://huggingface.co/openchat/openchat_3.5)
+- [Open-Orca/Mistral-7B-OpenOrca](https://huggingface.co/Open-Orca/Mistral-7B-OpenOrca)
 - [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
 - [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
 - [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
@@ -45,6 +49,11 @@
 - [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
 - [WizardLM/WizardCoder-15B-V1.0](https://huggingface.co/WizardLM/WizardCoder-15B-V1.0)
 - [HuggingFaceH4/starchat-beta](https://huggingface.co/HuggingFaceH4/starchat-beta)
+- [HuggingFaceH4/zephyr-7b-alpha](https://huggingface.co/HuggingFaceH4/zephyr-7b-alpha)
+- [Xwin-LM/Xwin-LM-7B-V0.1](https://huggingface.co/Xwin-LM/Xwin-LM-70B-V0.1)
+- [OpenLemur/lemur-70b-chat-v1](https://huggingface.co/OpenLemur/lemur-70b-chat-v1)
+- [allenai/tulu-2-dpo-7b](https://huggingface.co/allenai/tulu-2-dpo-7b)
+- [Microsoft/Orca-2-7b](https://huggingface.co/microsoft/Orca-2-7b)
 - Any [EleutherAI](https://huggingface.co/EleutherAI) pythia model such as [pythia-6.9b](https://huggingface.co/EleutherAI/pythia-6.9b)
 - Any [Peft](https://github.com/huggingface/peft) adapter trained on top of a
   model above. To activate, must have `peft` in the model path. Note: If
@@ -64,7 +73,7 @@ python3 -m fastchat.serve.cli --model [YOUR_MODEL_PATH]
 You can run this example command to learn the code logic.
 
 ```
-python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5
 ```
 
 You can add `--debug` to see the actual prompt sent to the model.

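For completeness, the `--debug` flag mentioned in the last context line is simply appended to the same CLI invocation; a short example using the vicuna-7b-v1.5 path referenced throughout this commit:

```bash
# Print the exact prompt sent to the model along with the generated reply.
python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5 --debug
```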