Commit b9d4d15

Update gradio demo and API model providers (#3030)
1 parent 7c12409 commit b9d4d15

File tree: 8 files changed, +150 −225 lines

docs/arena.md

Lines changed: 7 additions & 6 deletions
```diff
@@ -5,10 +5,11 @@ We invite the entire community to join this benchmarking effort by contributing
 ## How to add a new model
 If you want to see a specific model in the arena, you can follow the methods below.
 
-- Method 1: Hosted by LMSYS.
-1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md#how-to-support-a-new-model).
-2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
+### Method 1: Hosted by 3rd party API providers or yourself
+If you have a model hosted by a 3rd party API provider or yourself, please give us access to an API endpoint.
+- We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/gradio/fastchat/serve/api_provider.py) for calling OpenAI models.
+- If you have your own API protocol, please follow the [instructions](model_support.md) to add it. Contribute your code by sending a pull request.
 
-- Method 2: Hosted by 3rd party API providers or yourself.
-1. If you have a model hosted by a 3rd party API provider or yourself, please give us an API endpoint. We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/33dca5cf12ee602455bfa9b5f4790a07829a2db7/fastchat/serve/gradio_web_server.py#L333-L358) for calling OpenAI models.
-2. You can use FastChat's OpenAI API [server](openai_api.md) to serve your model with OpenAI-compatible APIs and provide us with the endpoint.
+### Method 2: Hosted by LMSYS
+1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md).
+2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
```
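For context, an "OpenAI-compatible API" here means the endpoint accepts the standard `/v1/chat/completions` request shape. Below is a minimal sketch of such a request; the host, model name, and key are all placeholders, not values from this commit:

```python
import json

# Placeholder values; substitute your own endpoint, model name, and key.
API_BASE = "https://your-host/v1"
MODEL = "your-model-name"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": True,  # the arena streams tokens back to the UI
}

# The actual call would look like:
# requests.post(f"{API_BASE}/chat/completions",
#               headers={"Authorization": "Bearer <key>"},
#               json=payload, stream=True)
print(json.dumps(payload, indent=2))
```

Any endpoint that answers this request shape can be plugged in without new provider code.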

docs/model_support.md

Lines changed: 31 additions & 28 deletions
````diff
@@ -1,8 +1,12 @@
 # Model Support
+This document describes how to support a new model in FastChat.
 
-## How to support a new model
+## Content
+- [Local Models](#local-models)
+- [API-Based Models](#api-based-models)
 
-To support a new model in FastChat, you need to correctly handle its prompt template and model loading.
+## Local Models
+To support a new local model in FastChat, you need to correctly handle its prompt template and model loading.
 The goal is to make the following command run with the correct prompts.
 
 ```
@@ -27,32 +31,7 @@ FastChat uses the `Conversation` class to handle prompt templates and `BaseModel
 
 After these steps, the new model should be compatible with most FastChat features, such as CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
 
-### API-based model
-
-For API-based model, you still need to follow the above steps to implement conversation template, adapter, and register the model. In addition, you need to
-1. Implement an API-based streaming token generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py)
-2. Specify your endpoint info in a JSON configuration file
-```
-{
-  "gpt-3.5-turbo-0613": {
-    "model_name": "gpt-3.5-turbo-0613",
-    "api_base": "https://api.openai.com/v1",
-    "api_key": "XXX",
-    "api_type": "openai"
-  }
-}
-```
-3. Invoke your API generator in `bot_response` of [fastchat/serve/gradio_web_server.py](https://github.com/lm-sys/FastChat/blob/22642048eeb2f1f06eb1c4e0490d802e91e62473/fastchat/serve/gradio_web_server.py#L427) accordingly.
-4. Launch the gradio web server with argument `--register [JSON-file]`.
-```
-python3 -m fastchat.serve.gradio_web_server --register [JSON-file]
-```
-You should be able to chat with your API-based model!
-
-Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and Nvidia NGC.
-
-
-## Supported models
+### Supported models
@@ -121,3 +100,27 @@ Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and N
 setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model
 worker.
 
+## API-Based Models
+1. Implement an API-based streaming generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py). You can learn from the OpenAI example.
+2. Specify your endpoint info in a JSON configuration file
+```
+{
+  "gpt-3.5-turbo-0613": {
+    "model_name": "gpt-3.5-turbo-0613",
+    "api_type": "openai",
+    "api_base": "https://api.openai.com/v1",
+    "api_key": "sk-******",
+    "anony_only": false
+  }
+}
+```
+- "api_type" can be one of the following: openai, anthropic, gemini, mistral. For your own API, you can add a new type and implement it.
+- "anony_only" specifies whether to show this model in anonymous mode only.
+3. Launch the gradio web server with the argument `--register [JSON-file]`.
+
+```
+python3 -m fastchat.serve.gradio_web_server --controller "" --share --register [JSON-file]
+```
+
+You should be able to chat with your API-based model!
+Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and Nvidia NGC.
````
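The streaming-generator contract from step 1 above can be illustrated with a minimal sketch. This is not FastChat's actual provider code: the function name and the `"delta"` field of the mock wire protocol are hypothetical. What is grounded in the repository is the shape of the yielded dicts, `{"text": ..., "error_code": ...}`, where `text` carries the full response accumulated so far:

```python
import json

def my_api_stream_iter(raw_lines):
    """Convert newline-delimited JSON chunks from a hypothetical API into
    FastChat-style updates: each yielded dict holds the full text
    accumulated so far plus an error code (0 = no error)."""
    text = ""
    for line in raw_lines:
        if not line:  # skip keep-alive blanks in the stream
            continue
        chunk = json.loads(line)
        text += chunk.get("delta", "")
        yield {"text": text, "error_code": 0}

# In a real generator, raw_lines would come from a streaming HTTP
# response, e.g. requests.post(..., stream=True).iter_lines().
```

In the real providers, failures yield a final dict with `"error_code": 1` and an error message in `"text"` rather than raising, so the web UI can display the problem.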

fastchat/serve/api_provider.py

Lines changed: 79 additions & 65 deletions
```diff
@@ -1,20 +1,93 @@
 """Call API providers."""
 
-from json import loads
-import os
-
 import json
+import os
 import random
-import requests
 import time
 
+import requests
+
 from fastchat.utils import build_logger
-from fastchat.constants import WORKER_API_TIMEOUT
 
 
 logger = build_logger("gradio_web_server", "gradio_web_server.log")
 
 
+def get_api_provider_stream_iter(
+    conv,
+    model_name,
+    model_api_dict,
+    temperature,
+    top_p,
+    max_new_tokens,
+):
+    if model_api_dict["api_type"] == "openai":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = openai_api_stream_iter(
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_base=model_api_dict["api_base"],
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "anthropic":
+        prompt = conv.get_prompt()
+        stream_iter = anthropic_api_stream_iter(
+            model_name, prompt, temperature, top_p, max_new_tokens
+        )
+    elif model_api_dict["api_type"] == "gemini":
+        stream_iter = gemini_api_stream_iter(
+            model_api_dict["model_name"],
+            conv,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "bard":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = bard_api_stream_iter(
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "mistral":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = mistral_api_stream_iter(
+            model_name, prompt, temperature, top_p, max_new_tokens
+        )
+    elif model_api_dict["api_type"] == "nvidia":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = nvidia_api_stream_iter(
+            model_name,
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            model_api_dict["api_base"],
+        )
+    elif model_api_dict["api_type"] == "ai2":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = ai2_api_stream_iter(
+            model_name,
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_base=model_api_dict["api_base"],
+            api_key=model_api_dict["api_key"],
+        )
+    else:
+        raise NotImplementedError()
+
+    return stream_iter
+
+
 def openai_api_stream_iter(
     model_name,
     messages,
@@ -111,65 +184,6 @@ def anthropic_api_stream_iter(model_name, prompt, temperature, top_p, max_new_to
         yield data
 
 
-def init_palm_chat(model_name):
-    import vertexai  # pip3 install google-cloud-aiplatform
-    from vertexai.preview.language_models import ChatModel
-    from vertexai.preview.generative_models import GenerativeModel
-
-    project_id = os.environ["GCP_PROJECT_ID"]
-    location = "us-central1"
-    vertexai.init(project=project_id, location=location)
-
-    if model_name in ["palm-2"]:
-        # According to release note, "chat-bison@001" is PaLM 2 for chat.
-        # https://cloud.google.com/vertex-ai/docs/release-notes#May_10_2023
-        model_name = "chat-bison@001"
-        chat_model = ChatModel.from_pretrained(model_name)
-        chat = chat_model.start_chat(examples=[])
-    elif model_name in ["gemini-pro"]:
-        model = GenerativeModel(model_name)
-        chat = model.start_chat()
-    return chat
-
-
-def palm_api_stream_iter(model_name, chat, message, temperature, top_p, max_new_tokens):
-    if model_name in ["gemini-pro"]:
-        max_new_tokens = max_new_tokens * 2
-    parameters = {
-        "temperature": temperature,
-        "top_p": top_p,
-        "max_output_tokens": max_new_tokens,
-    }
-    gen_params = {
-        "model": model_name,
-        "prompt": message,
-    }
-    gen_params.update(parameters)
-    if model_name == "palm-2":
-        response = chat.send_message(message, **parameters)
-    else:
-        response = chat.send_message(message, generation_config=parameters, stream=True)
-
-    logger.info(f"==== request ====\n{gen_params}")
-
-    try:
-        text = ""
-        for chunk in response:
-            text += chunk.text
-            data = {
-                "text": text,
-                "error_code": 0,
-            }
-            yield data
-    except Exception as e:
-        logger.error(f"==== error ====\n{e}")
-        yield {
-            "text": f"**API REQUEST ERROR** Reason: {e}\nPlease try again or increase the number of max tokens.",
-            "error_code": 1,
-        }
-        yield data
-
-
 def gemini_api_stream_iter(
     model_name, conv, temperature, top_p, max_new_tokens, api_key=None
 ):
@@ -353,7 +367,7 @@ def ai2_api_stream_iter(
     text = ""
     for line in res.iter_lines():
         if line:
-            part = loads(line)
+            part = json.loads(line)
             if "result" in part and "output" in part["result"]:
                 for t in part["result"]["output"]["text"]:
                     text += t
```
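The new `get_api_provider_stream_iter` routes on `model_api_dict["api_type"]` with an if/elif chain. The same idea can be sketched as a registry dict, which would make adding a provider a one-line registration; this is an illustrative alternative, not FastChat's code, and the handler names are stand-ins:

```python
def make_dispatcher(handlers):
    """Build a dispatcher mapping api_type -> streaming-generator factory.

    `handlers` maps each api_type string to a callable that takes the
    model's endpoint config plus generation parameters, mirroring how
    get_api_provider_stream_iter routes on model_api_dict["api_type"].
    """
    def dispatch(model_api_dict, **gen_params):
        api_type = model_api_dict["api_type"]
        if api_type not in handlers:
            # Same fallback behavior as the if/elif chain's else branch.
            raise NotImplementedError(f"unknown api_type: {api_type}")
        return handlers[api_type](model_api_dict, **gen_params)
    return dispatch

# A toy handler standing in for a real generator like openai_api_stream_iter:
def echo_stream_iter(model_api_dict, **gen_params):
    yield {"text": f"called {model_api_dict['model_name']}", "error_code": 0}

dispatch = make_dispatcher({"echo": echo_stream_iter})
```

With a registry, third-party providers could be added without editing the dispatch function itself.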

fastchat/serve/gradio_block_arena_anony.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -27,7 +27,6 @@
     disable_btn,
     invisible_btn,
     acknowledgment_md,
-    ip_expiration_dict,
     get_ip,
     get_model_description_md,
 )
@@ -630,7 +629,6 @@ def build_side_by_side_ui_anony(models):
 Find out who is the 🥇LLM Champion!
 
 ## 👇 Chat now!
-
 """
 
     states = [gr.State() for _ in range(num_sides)]
@@ -640,7 +638,9 @@ def build_side_by_side_ui_anony(models):
         gr.Markdown(notice_markdown, elem_id="notice_markdown")
 
         with gr.Group(elem_id="share-region-anony"):
-            with gr.Accordion("🔍 Expand to see 20+ Arena players", open=False):
+            with gr.Accordion(
+                f"🔍 Expand to see the descriptions of {len(models)} models", open=False
+            ):
                 model_description_md = get_model_description_md(models)
                 gr.Markdown(model_description_md, elem_id="model_description_markdown")
             with gr.Row():
```

fastchat/serve/gradio_block_arena_named.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -25,9 +25,8 @@
     disable_btn,
     invisible_btn,
     acknowledgment_md,
-    get_model_description_md,
-    ip_expiration_dict,
     get_ip,
+    get_model_description_md,
 )
 from fastchat.utils import (
     build_logger,
@@ -307,7 +306,9 @@ def build_side_by_side_ui_named(models):
             container=False,
         )
     with gr.Row():
-        with gr.Accordion("🔍 Expand to see 20+ model descriptions", open=False):
+        with gr.Accordion(
+            f"🔍 Expand to see the descriptions of {len(models)} models", open=False
+        ):
             model_description_md = get_model_description_md(models)
             gr.Markdown(model_description_md, elem_id="model_description_markdown")
```
