
Commit a887de7

renning22, Trangle, nathanstitt, merrymercy, and leiwen83 authored
Merge 0922 (#6)
* Remove hardcode flash-attn disable setting (lm-sys#2342)
* Document turning off proxy_buffering when api is streaming (lm-sys#2337)
* Simplify huggingface api example (lm-sys#2355)
* Update sponsor logos (lm-sys#2367)
* if LOGDIR is empty, then don't try output log to local file (lm-sys#2357)
  Signed-off-by: Lei Wen <[email protected]>
  Co-authored-by: Lei Wen <[email protected]>
* add best_of and use_beam_search for completions interface (lm-sys#2348)
  Signed-off-by: Lei Wen <[email protected]>
  Co-authored-by: Lei Wen <[email protected]>
* Extract upvote/downvote from log files (lm-sys#2369)
* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2370)
* Improve doc (lm-sys#2371)
* add best_of and use_beam_search for completions interface (lm-sys#2372)
  Signed-off-by: Lei Wen <[email protected]>
  Co-authored-by: Lei Wen <[email protected]>
* update monkey patch for llama2 (lm-sys#2379)
* Make E5 adapter more restrict to reduce mismatch (lm-sys#2381)
* Update UI and sponsers (lm-sys#2387)
* Use fsdp api for save save (lm-sys#2390)
* Release v0.2.27
* Spicyboros + airoboros 2.2 template update. (lm-sys#2392)
  Co-authored-by: Jon Durbin <[email protected]>
* bugfix of openai_api_server for fastchat.serve.vllm_worker (lm-sys#2398)
  Co-authored-by: wuyongyu <[email protected]>
* Revert "bugfix of openai_api_server for fastchat.serve.vllm_worker" (lm-sys#2400)
* Revert "add best_of and use_beam_search for completions interface" (lm-sys#2401)
* Release a v0.2.28 with bug fixes and more test cases
* Fix model_worker error (lm-sys#2404)
* Added google/flan models and fixed AutoModelForSeq2SeqLM when loading T5 compression model (lm-sys#2402)
* Rename twitter to X (lm-sys#2406)
* Update huggingface_api.py (lm-sys#2409)
* Add support for baichuan2 models (lm-sys#2408)
* Fixed character overlap issue when api streaming output (lm-sys#2431)
* Support custom conversation template in multi_model_worker (lm-sys#2434)
* Add Ascend NPU support (lm-sys#2422)
* Add raw conversation template (lm-sys#2417) (lm-sys#2418)
* Improve docs & UI (lm-sys#2436)
* Fix Salesforce xgen inference (lm-sys#2350)
* Add support for Phind-CodeLlama models (lm-sys#2415) (lm-sys#2416)
  Co-authored-by: Lianmin Zheng <[email protected]>
* Add falcon 180B chat conversation template (lm-sys#2384)
* Improve docs (lm-sys#2438)
* add dtype and seed (lm-sys#2430)
* Data cleaning scripts for dataset release (lm-sys#2440)
* merge google/flan based adapters: T5Adapter, CodeT5pAdapter, FlanAdapter (lm-sys#2411)
* Fix docs
* Update UI (lm-sys#2446)
* Add Optional SSL Support to controller.py (lm-sys#2448)
* Format & Improve docs
* Release v0.2.29 (lm-sys#2450)
* Show terms of use as an JS alert (lm-sys#2461)
* vllm worker awq quantization update (lm-sys#2463)
  Co-authored-by: 董晓龙 <[email protected]>
* Fix falcon chat template (lm-sys#2464)

---------

Signed-off-by: Lei Wen <[email protected]>
Co-authored-by: Trangle <[email protected]>
Co-authored-by: Nathan Stitt <[email protected]>
Co-authored-by: Lianmin Zheng <[email protected]>
Co-authored-by: leiwen83 <[email protected]>
Co-authored-by: Lei Wen <[email protected]>
Co-authored-by: Jon Durbin <[email protected]>
Co-authored-by: Jon Durbin <[email protected]>
Co-authored-by: Rayrtfr <[email protected]>
Co-authored-by: wuyongyu <[email protected]>
Co-authored-by: wangxiyuan <[email protected]>
Co-authored-by: Jeff (Zhen) Wang <[email protected]>
Co-authored-by: karshPrime <[email protected]>
Co-authored-by: obitolyz <[email protected]>
Co-authored-by: Shangwei Chen <[email protected]>
Co-authored-by: HyungJin Ahn <[email protected]>
Co-authored-by: zhangsibo1129 <[email protected]>
Co-authored-by: Tobias Birchler <[email protected]>
Co-authored-by: Jae-Won Chung <[email protected]>
Co-authored-by: Mingdao Liu <[email protected]>
Co-authored-by: Ying Sheng <[email protected]>
Co-authored-by: Brandon Biggs <[email protected]>
Co-authored-by: dongxiaolong <[email protected]>
Co-authored-by: 董晓龙 <[email protected]>
1 parent 9b11481 commit a887de7


62 files changed: +1226 −673 lines

README.md

Lines changed: 4 additions & 1 deletion
@@ -16,6 +16,10 @@ We are focused to support Llama2 at scale now. If you want any other models, ple
 
 ## Dev Log
 
+### 2023-09
+
+Sync upstream changes
+
 ### 2023-08
 
 Support llama2 at scale.
@@ -37,4 +41,3 @@ Support "Llama-2-13b-chat-hf" and make it the default for API.
 
 * API key database and rate limit enforcement
 * Deployable on Kubernetes
-

docs/commands/leaderboard.md

Lines changed: 12 additions & 1 deletion
@@ -11,5 +11,16 @@ python3 clean_battle_data.py
 
 ### Run Elo analysis
 ```
-python3 elo_analysis.py --clean-battle-file clean_battle_20230523.json
+python3 elo_analysis.py --clean-battle-file clean_battle_20230905.json
+```
+
+### Copy files to HF space
+1. update plots
+```
+scp atlas:/data/lmzheng/FastChat/fastchat/serve/monitor/elo_results_20230905.pkl .
+```
+
+2. update table
+```
+wget https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard/raw/main/leaderboard_table_20230905.csv
 ```

docs/commands/test_process.md

Lines changed: 3 additions & 0 deletions
@@ -1,3 +1,6 @@
+## Unit tests for FastChat
+The scripts are under [FastChat/tests](../../tests).
+
 ### Test CLI Inference
 
 ```

docs/commands/webserver.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ cd fastchat_logs/server0
 export OPENAI_API_KEY=
 export ANTHROPIC_API_KEY=
 
-python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 10 --add-chatgpt --add-claude --add-palm --anony-only --elo ~/elo_results/elo_results_20230802.pkl --leaderboard-table-file ~/elo_results/leaderboard_table_20230802.csv --register ~/elo_results/register_oai_models.json
+python3 -m fastchat.serve.gradio_web_server_multi --controller http://localhost:21001 --concurrency 10 --add-chatgpt --add-claude --add-palm --anony-only --elo ~/elo_results/elo_results.pkl --leaderboard-table-file ~/elo_results/leaderboard_table.csv --register ~/elo_results/register_oai_models.json --show-terms
 
 python3 backup_logs.py
 ```

docs/model_support.md

Lines changed: 3 additions & 1 deletion
@@ -31,13 +31,15 @@
 - [openaccess-ai-collective/manticore-13b-chat-pyg](https://huggingface.co/openaccess-ai-collective/manticore-13b-chat-pyg)
 - [OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5](https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5)
 - [VMware/open-llama-7b-v2-open-instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
+- [Phind/Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2)
 - [project-baize/baize-v2-7b](https://huggingface.co/project-baize/baize-v2-7b)
 - [Qwen/Qwen-7B-Chat](https://huggingface.co/Qwen/Qwen-7B-Chat)
 - [Salesforce/codet5p-6b](https://huggingface.co/Salesforce/codet5p-6b)
 - [StabilityAI/stablelm-tuned-alpha-7b](https://huggingface.co/stabilityai/stablelm-tuned-alpha-7b)
 - [THUDM/chatglm-6b](https://huggingface.co/THUDM/chatglm-6b)
 - [THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)
 - [tiiuae/falcon-40b](https://huggingface.co/tiiuae/falcon-40b)
+- [tiiuae/falcon-180B-chat](https://huggingface.co/tiiuae/falcon-180B-chat)
 - [timdettmers/guanaco-33b-merged](https://huggingface.co/timdettmers/guanaco-33b-merged)
 - [togethercomputer/RedPajama-INCITE-7B-Chat](https://huggingface.co/togethercomputer/RedPajama-INCITE-7B-Chat)
 - [WizardLM/WizardLM-13B-V1.0](https://huggingface.co/WizardLM/WizardLM-13B-V1.0)
@@ -71,7 +73,7 @@ You can add `--debug` to see the actual prompt sent to the model.
 
 FastChat uses the `Conversation` class to handle prompt templates and `BaseModelAdapter` class to handle model loading.
 
-1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one.
+1. Implement a conversation template for the new model at [fastchat/conversation.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/conversation.py). You can follow existing examples and use `register_conv_template` to add a new one. Please also add a link to the official reference code if possible.
 2. Implement a model adapter for the new model at [fastchat/model/model_adapter.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_adapter.py). You can follow existing examples and use `register_model_adapter` to add a new one.
 3. (Optional) add the model name to the "Supported models" [section](#supported-models) above and add more information in [fastchat/model/model_registry.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/model/model_registry.py).
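To make steps 1 and 2 concrete, here is a rough sketch of what such a registration can look like. It is not part of this commit: the template name, roles, and path-matching rule are hypothetical, and the exact `Conversation` fields vary between FastChat versions.

```python
from fastchat.conversation import (
    Conversation,
    SeparatorStyle,
    get_conv_template,
    register_conv_template,
)
from fastchat.model.model_adapter import BaseModelAdapter, register_model_adapter

# Step 1: register a conversation template (all values illustrative).
register_conv_template(
    Conversation(
        name="my-new-model",  # hypothetical template name
        system_message="You are a helpful assistant.",
        roles=("USER", "ASSISTANT"),
        sep_style=SeparatorStyle.ADD_COLON_SINGLE,
        sep="\n",
    )
)

# Step 2: register a model adapter that selects this template.
class MyNewModelAdapter(BaseModelAdapter):
    """Hypothetical adapter; adapters typically match on the model path."""

    def match(self, model_path: str):
        return "my-new-model" in model_path.lower()

    def get_default_conv_template(self, model_path: str) -> Conversation:
        return get_conv_template("my-new-model")

register_model_adapter(MyNewModelAdapter)
```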

docs/openai_api.md

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ completion = openai.ChatCompletion.create(
 print(completion.choices[0].message.content)
 ```
 
-Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py).
+Streaming is also supported. See [test_openai_api.py](../tests/test_openai_api.py). If your API server is behind a proxy, you'll need to turn off buffering; in Nginx, set `proxy_buffering off;` in the location block for the proxy.
 
 ### cURL
 cURL is another good tool for observing the output of the api.
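For reference, a minimal streaming client against the FastChat OpenAI-compatible server might look like the sketch below (assuming the openai 0.x Python SDK, a server on localhost:8000, and an illustrative model name):

```python
import openai

# Point the openai 0.x SDK at the local FastChat server.
openai.api_key = "EMPTY"  # FastChat does not check the key by default
openai.api_base = "http://localhost:8000/v1"

# With stream=True the response is an iterator of incremental deltas.
response = openai.ChatCompletion.create(
    model="vicuna-7b-v1.3",  # illustrative model name
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
    stream=True,
)
for chunk in response:
    delta = chunk.choices[0].delta
    # The first and last chunks may carry no "content" key.
    print(delta.get("content", ""), end="", flush=True)
print()
```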

docs/training.md

Lines changed: 29 additions & 0 deletions
@@ -87,3 +87,32 @@ deepspeed fastchat/train/train_lora_t5.py \
     --deepspeed playground/deepspeed_config_s2.json
 
 ```
+
+### Fine-tuning Vicuna-7B with Local NPUs
+
+You can use the following command to train Vicuna-7B with 8 x 910B (60GB). Use `--nproc_per_node` to specify the number of NPUs.
+```bash
+torchrun --nproc_per_node=8 --master_port=20001 fastchat/train/train.py \
+    --model_name_or_path ~/vicuna-7b-v1.5-16k \
+    --data_path data/dummy_conversation.json \
+    --fp16 True \
+    --output_dir output_vicuna \
+    --num_train_epochs 3 \
+    --per_device_train_batch_size 8 \
+    --per_device_eval_batch_size 1 \
+    --gradient_accumulation_steps 1 \
+    --evaluation_strategy "no" \
+    --save_strategy "steps" \
+    --save_steps 1200 \
+    --save_total_limit 10 \
+    --learning_rate 2e-5 \
+    --weight_decay 0. \
+    --warmup_ratio 0.03 \
+    --lr_scheduler_type "cosine" \
+    --logging_steps 1 \
+    --fsdp "full_shard auto_wrap" \
+    --fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
+    --model_max_length 2048 \
+    --gradient_checkpointing True \
+    --lazy_preprocess True
+```
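Before launching the run above, a quick device sanity check can save a failed job. A minimal sketch, assuming the `torch_npu` package that PyTorch's Ascend NPU support relies on is installed:

```python
import torch
import torch_npu  # Ascend NPU plugin; registers the torch.npu namespace

# Both calls come from torch_npu; they mirror the familiar CUDA helpers.
print(torch.npu.is_available())   # True if 910B devices are visible
print(torch.npu.device_count())   # should report 8 for the command above
```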

docs/vllm_integration.md

Lines changed: 5 additions & 0 deletions
@@ -18,3 +18,8 @@ See the supported models [here](https://vllm.readthedocs.io/en/latest/models/sup
 ```
 python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.3 --tokenizer hf-internal-testing/llama-tokenizer
 ```
+
+If you use an AWQ quantized model, try
+```
+python3 -m fastchat.serve.vllm_worker --model-path TheBloke/vicuna-7B-v1.5-AWQ --quantization awq
+```

fastchat/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.2.26"
+__version__ = "0.2.29"

fastchat/constants.py

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
 INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
 # Maximum input length
-INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 2560))
+INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 3072))
 # Maximum conversation turns
 CONVERSATION_TURN_LIMIT = 50
 # Session expiration time
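Because the limit is read from the environment, a deployment can raise it without touching the code. A minimal sketch (the value 4096 is illustrative):

```python
import os

# Must be set before fastchat.constants is imported,
# since the limit is evaluated at import time.
os.environ["FASTCHAT_INPUT_CHAR_LEN_LIMIT"] = "4096"

from fastchat.constants import INPUT_CHAR_LEN_LIMIT

print(INPUT_CHAR_LEN_LIMIT)  # 4096
```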
