Commit b9d4d15

Update gradio demo and API model providers (#3030)
1 parent 7c12409 commit b9d4d15

File tree: 8 files changed, +150 −225 lines

docs/arena.md

Lines changed: 7 additions & 6 deletions
```diff
@@ -5,10 +5,11 @@ We invite the entire community to join this benchmarking effort by contributing
 ## How to add a new model
 If you want to see a specific model in the arena, you can follow the methods below.
 
-- Method 1: Hosted by LMSYS.
-1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md#how-to-support-a-new-model).
-2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
+### Method 1: Hosted by 3rd party API providers or yourself
+If you have a model hosted by a 3rd party API provider or yourself, please give us access to an API endpoint.
+- We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/gradio/fastchat/serve/api_provider.py) for calling OpenAI models.
+- If you have your own API protocol, please follow the [instructions](model_support.md) to add it. Contribute your code by sending a pull request.
 
-- Method 2: Hosted by 3rd party API providers or yourself.
-1. If you have a model hosted by a 3rd party API provider or yourself, please give us an API endpoint. We prefer OpenAI-compatible APIs, so we can reuse our [code](https://github.com/lm-sys/FastChat/blob/33dca5cf12ee602455bfa9b5f4790a07829a2db7/fastchat/serve/gradio_web_server.py#L333-L358) for calling OpenAI models.
-2. You can use FastChat's OpenAI API [server](openai_api.md) to serve your model with OpenAI-compatible APIs and provide us with the endpoint.
+### Method 2: Hosted by LMSYS
+1. Contribute the code to support this model in FastChat by submitting a pull request. See [instructions](model_support.md).
+2. After the model is supported, we will try to schedule some compute resources to host the model in the arena. However, due to the limited resources we have, we may not be able to serve every model. We will select the models based on popularity, quality, diversity, and other factors.
```
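For context, an "OpenAI-compatible API" here means the endpoint accepts the standard `/v1/chat/completions` request shape. Below is a minimal sketch of such a request; the host, model name, and key are all placeholders, not values from this commit:

```python
import json

# Placeholder values; substitute your own endpoint, model name, and key.
API_BASE = "https://your-host/v1"
MODEL = "your-model-name"

payload = {
    "model": MODEL,
    "messages": [{"role": "user", "content": "Hello!"}],
    "temperature": 0.7,
    "max_tokens": 256,
    "stream": True,  # the arena streams tokens back to the UI
}

# The actual call would look like:
# requests.post(f"{API_BASE}/chat/completions",
#               headers={"Authorization": "Bearer <key>"},
#               json=payload, stream=True)
print(json.dumps(payload, indent=2))
```

Any endpoint that answers this request shape can be plugged in without new provider code.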

docs/model_support.md

Lines changed: 31 additions & 28 deletions
````diff
@@ -1,8 +1,12 @@
 # Model Support
+This document describes how to support a new model in FastChat.
 
-## How to support a new model
+## Content
+- [Local Models](#local-models)
+- [API-Based Models](#api-based-models)
 
-To support a new model in FastChat, you need to correctly handle its prompt template and model loading.
+## Local Models
+To support a new local model in FastChat, you need to correctly handle its prompt template and model loading.
 The goal is to make the following command run with the correct prompts.
 
 ```
@@ -27,32 +31,7 @@ FastChat uses the `Conversation` class to handle prompt templates and `BaseModel
 
 After these steps, the new model should be compatible with most FastChat features, such as CLI, web UI, model worker, and OpenAI-compatible API server. Please do some testing with these features as well.
 
-### API-based model
-
-For API-based model, you still need to follow the above steps to implement conversation template, adapter, and register the model. In addition, you need to
-1. Implement an API-based streaming token generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py)
-2. Specify your endpoint info in a JSON configuration file
-```
-{
-  "gpt-3.5-turbo-0613": {
-    "model_name": "gpt-3.5-turbo-0613",
-    "api_base": "https://api.openai.com/v1",
-    "api_key": "XXX",
-    "api_type": "openai"
-  }
-}
-```
-3. Invoke your API generator in `bot_response` of [fastchat/serve/gradio_web_server.py](https://github.com/lm-sys/FastChat/blob/22642048eeb2f1f06eb1c4e0490d802e91e62473/fastchat/serve/gradio_web_server.py#L427) accordingly.
-4. Launch the gradio web server with argument `--register [JSON-file]`.
-```
-python3 -m fastchat.serve.gradio_web_server --register [JSON-file]
-```
-You should be able to chat with your API-based model!
-
-Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and Nvidia NGC.
-
-
-## Supported models
+### Supported models
@@ -121,3 +100,27 @@ Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and N
 setting the environment variable `PEFT_SHARE_BASE_WEIGHTS=true` in any model
 worker.
 
+## API-Based Models
+1. Implement an API-based streaming generator in [fastchat/serve/api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py). You can learn from the OpenAI example.
+2. Specify your endpoint info in a JSON configuration file
+```
+{
+  "gpt-3.5-turbo-0613": {
+    "model_name": "gpt-3.5-turbo-0613",
+    "api_type": "openai",
+    "api_base": "https://api.openai.com/v1",
+    "api_key": "sk-******",
+    "anony_only": false
+  }
+}
+```
+- "api_type" can be one of the following: openai, anthropic, gemini, mistral. For your own API, you can add a new type and implement it.
+- "anony_only" specifies whether to show this model in anonymous mode only.
+3. Launch the gradio web server with the argument `--register [JSON-file]`.
+
+```
+python3 -m fastchat.serve.gradio_web_server --controller "" --share --register [JSON-file]
+```
+
+You should be able to chat with your API-based model!
+Currently, FastChat supports OpenAI, Anthropic, Google Vertex AI, Mistral, and Nvidia NGC.
````
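The streaming-generator contract from step 1 above can be illustrated with a minimal sketch. This is not FastChat's actual provider code: the function name and the `"delta"` field of the mock wire protocol are hypothetical. What is grounded in the repository is the shape of the yielded dicts, `{"text": ..., "error_code": ...}`, where `text` carries the full response accumulated so far:

```python
import json

def my_api_stream_iter(raw_lines):
    """Convert newline-delimited JSON chunks from a hypothetical API into
    FastChat-style updates: each yielded dict holds the full text
    accumulated so far plus an error code (0 = no error)."""
    text = ""
    for line in raw_lines:
        if not line:  # skip keep-alive blanks in the stream
            continue
        chunk = json.loads(line)
        text += chunk.get("delta", "")
        yield {"text": text, "error_code": 0}

# In a real generator, raw_lines would come from a streaming HTTP
# response, e.g. requests.post(..., stream=True).iter_lines().
```

In the real providers, failures yield a final dict with `"error_code": 1` and an error message in `"text"` rather than raising, so the web UI can display the problem.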

fastchat/serve/api_provider.py

Lines changed: 79 additions & 65 deletions
```diff
@@ -1,20 +1,93 @@
 """Call API providers."""
 
-from json import loads
-import os
-
 import json
+import os
 import random
-import requests
 import time
 
+import requests
+
 from fastchat.utils import build_logger
-from fastchat.constants import WORKER_API_TIMEOUT
 
 
 logger = build_logger("gradio_web_server", "gradio_web_server.log")
 
 
+def get_api_provider_stream_iter(
+    conv,
+    model_name,
+    model_api_dict,
+    temperature,
+    top_p,
+    max_new_tokens,
+):
+    if model_api_dict["api_type"] == "openai":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = openai_api_stream_iter(
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_base=model_api_dict["api_base"],
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "anthropic":
+        prompt = conv.get_prompt()
+        stream_iter = anthropic_api_stream_iter(
+            model_name, prompt, temperature, top_p, max_new_tokens
+        )
+    elif model_api_dict["api_type"] == "gemini":
+        stream_iter = gemini_api_stream_iter(
+            model_api_dict["model_name"],
+            conv,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "bard":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = bard_api_stream_iter(
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            api_key=model_api_dict["api_key"],
+        )
+    elif model_api_dict["api_type"] == "mistral":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = mistral_api_stream_iter(
+            model_name, prompt, temperature, top_p, max_new_tokens
+        )
+    elif model_api_dict["api_type"] == "nvidia":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = nvidia_api_stream_iter(
+            model_name,
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            model_api_dict["api_base"],
+        )
+    elif model_api_dict["api_type"] == "ai2":
+        prompt = conv.to_openai_api_messages()
+        stream_iter = ai2_api_stream_iter(
+            model_name,
+            model_api_dict["model_name"],
+            prompt,
+            temperature,
+            top_p,
+            max_new_tokens,
+            api_base=model_api_dict["api_base"],
+            api_key=model_api_dict["api_key"],
+        )
+    else:
+        raise NotImplementedError()
+
+    return stream_iter
+
+
 def openai_api_stream_iter(
     model_name,
     messages,
@@ -111,65 +184,6 @@ def anthropic_api_stream_iter(model_name, prompt, temperature, top_p, max_new_to
         yield data
 
 
-def init_palm_chat(model_name):
-    import vertexai  # pip3 install google-cloud-aiplatform
-    from vertexai.preview.language_models import ChatModel
-    from vertexai.preview.generative_models import GenerativeModel
-
-    project_id = os.environ["GCP_PROJECT_ID"]
-    location = "us-central1"
-    vertexai.init(project=project_id, location=location)
-
-    if model_name in ["palm-2"]:
-        # According to release note, "chat-bison@001" is PaLM 2 for chat.
-        # https://cloud.google.com/vertex-ai/docs/release-notes#May_10_2023
-        model_name = "chat-bison@001"
-        chat_model = ChatModel.from_pretrained(model_name)
-        chat = chat_model.start_chat(examples=[])
-    elif model_name in ["gemini-pro"]:
-        model = GenerativeModel(model_name)
-        chat = model.start_chat()
-    return chat
-
-
-def palm_api_stream_iter(model_name, chat, message, temperature, top_p, max_new_tokens):
-    if model_name in ["gemini-pro"]:
-        max_new_tokens = max_new_tokens * 2
-    parameters = {
-        "temperature": temperature,
-        "top_p": top_p,
-        "max_output_tokens": max_new_tokens,
-    }
-    gen_params = {
-        "model": model_name,
-        "prompt": message,
-    }
-    gen_params.update(parameters)
-    if model_name == "palm-2":
-        response = chat.send_message(message, **parameters)
-    else:
-        response = chat.send_message(message, generation_config=parameters, stream=True)
-
-    logger.info(f"==== request ====\n{gen_params}")
-
-    try:
-        text = ""
-        for chunk in response:
-            text += chunk.text
-            data = {
-                "text": text,
-                "error_code": 0,
-            }
-            yield data
-    except Exception as e:
-        logger.error(f"==== error ====\n{e}")
-        yield {
-            "text": f"**API REQUEST ERROR** Reason: {e}\nPlease try again or increase the number of max tokens.",
-            "error_code": 1,
-        }
-        yield data
-
-
 def gemini_api_stream_iter(
     model_name, conv, temperature, top_p, max_new_tokens, api_key=None
 ):
@@ -353,7 +367,7 @@ def ai2_api_stream_iter(
     text = ""
     for line in res.iter_lines():
         if line:
-            part = loads(line)
+            part = json.loads(line)
             if "result" in part and "output" in part["result"]:
                 for t in part["result"]["output"]["text"]:
                     text += t
```
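The new `get_api_provider_stream_iter` routes on `model_api_dict["api_type"]` with an if/elif chain. The same idea can be sketched as a registry dict, which would make adding a provider a one-line registration; this is an illustrative alternative, not FastChat's code, and the handler names are stand-ins:

```python
def make_dispatcher(handlers):
    """Build a dispatcher mapping api_type -> streaming-generator factory.

    `handlers` maps each api_type string to a callable that takes the
    model's endpoint config plus generation parameters, mirroring how
    get_api_provider_stream_iter routes on model_api_dict["api_type"].
    """
    def dispatch(model_api_dict, **gen_params):
        api_type = model_api_dict["api_type"]
        if api_type not in handlers:
            # Same fallback behavior as the if/elif chain's else branch.
            raise NotImplementedError(f"unknown api_type: {api_type}")
        return handlers[api_type](model_api_dict, **gen_params)
    return dispatch

# A toy handler standing in for a real generator like openai_api_stream_iter:
def echo_stream_iter(model_api_dict, **gen_params):
    yield {"text": f"called {model_api_dict['model_name']}", "error_code": 0}

dispatch = make_dispatcher({"echo": echo_stream_iter})
```

With a registry, third-party providers could be added without editing the dispatch function itself.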

fastchat/serve/gradio_block_arena_anony.py

Lines changed: 3 additions & 3 deletions
```diff
@@ -27,7 +27,6 @@
     disable_btn,
     invisible_btn,
     acknowledgment_md,
-    ip_expiration_dict,
     get_ip,
     get_model_description_md,
 )
@@ -630,7 +629,6 @@ def build_side_by_side_ui_anony(models):
 Find out who is the 🥇LLM Champion!
 
 ## 👇 Chat now!
-
 """
 
     states = [gr.State() for _ in range(num_sides)]
@@ -640,7 +638,9 @@ def build_side_by_side_ui_anony(models):
         gr.Markdown(notice_markdown, elem_id="notice_markdown")
 
         with gr.Group(elem_id="share-region-anony"):
-            with gr.Accordion("🔍 Expand to see 20+ Arena players", open=False):
+            with gr.Accordion(
+                f"🔍 Expand to see the descriptions of {len(models)} models", open=False
+            ):
                 model_description_md = get_model_description_md(models)
                 gr.Markdown(model_description_md, elem_id="model_description_markdown")
             with gr.Row():
```

fastchat/serve/gradio_block_arena_named.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -25,9 +25,8 @@
     disable_btn,
     invisible_btn,
     acknowledgment_md,
-    get_model_description_md,
-    ip_expiration_dict,
     get_ip,
+    get_model_description_md,
 )
 from fastchat.utils import (
     build_logger,
@@ -307,7 +306,9 @@ def build_side_by_side_ui_named(models):
             container=False,
         )
     with gr.Row():
-        with gr.Accordion("🔍 Expand to see 20+ model descriptions", open=False):
+        with gr.Accordion(
+            f"🔍 Expand to see the descriptions of {len(models)} models", open=False
+        ):
             model_description_md = get_model_description_md(models)
             gr.Markdown(model_description_md, elem_id="model_description_markdown")
```
