
hanxiantao (Collaborator)

Ⅰ. Describe what this PR did

  1. Fix the blocking issue with minimax responses in ai-proxy (the root cause is that the TransformResponseHeaders method did not remove the Content-Length response header).
  2. Since the minimax v2 API is OpenAI-compatible, the code only needs to transform requests and responses for the minimax Pro API; rename the corresponding structs accordingly.

Ⅱ. Does this pull request fix one issue?

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

docker-compose.yaml

version: '3.7'
services:
  envoy:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/gateway:1.4.0-rc.1
    entrypoint: /usr/local/bin/envoy
    # Note: debug-level logging is enabled for wasm here; production deployments default to info level
    command: -c /etc/envoy/envoy.yaml --component-log-level wasm:debug
    depends_on:
      - httpbin
    networks:
      - wasmtest
    ports:
      - "10000:10000"
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml
      - ./plugin.wasm:/etc/envoy/plugin.wasm
  httpbin:
    image: kennethreitz/httpbin:latest
    networks:
      - wasmtest
    ports:
      - "12345:80"
networks:
  wasmtest: {}

Proxying the minimax chat completion v2 API with the OpenAI protocol

envoy.yaml

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                scheme_header_transformation:
                  scheme_to_overwrite: https
                stat_prefix: ingress_http
                # Output envoy logs to stdout
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                # Modify as required
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: [ "*" ]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: minimax
                            timeout: 300s
                http_filters:
                  - name: wasmtest
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: wasmtest
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/plugin.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                  "provider": {
                                    "type": "minimax",
                                    "apiTokens": [
                                      "YOUR_MINIMAX_API_TOKEN"
                                    ],
                                    "modelMapping": {
                                      "gpt-3": "abab6.5s-chat",
                                      "gpt-4": "abab6.5g-chat",
                                      "*": "abab6.5t-chat"
                                    },
                                    "protocol": "openai"
                                  }
                              }
                  - name: envoy.filters.http.router
  clusters:
    - name: httpbin
      connect_timeout: 30s
      type: LOGICAL_DNS
      # Comment out the following line to test on v6 networks
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: httpbin
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: httpbin
                      port_value: 80
    - name: minimax
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: minimax
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: api.minimax.chat
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "api.minimax.chat"
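
The modelMapping block in the configuration above rewrites the model name of incoming OpenAI-style requests before forwarding them to minimax. A minimal sketch of the resolution order (assumption: exact match first, then the "*" wildcard fallback; the real plugin may support additional pattern forms, so check the ai-proxy docs):

```go
package main

import "fmt"

// resolveModel maps a requested OpenAI-style model name to a minimax model:
// exact entries win, then the "*" wildcard, otherwise the name passes through.
func resolveModel(mapping map[string]string, requested string) string {
	if mapped, ok := mapping[requested]; ok {
		return mapped
	}
	if fallback, ok := mapping["*"]; ok {
		return fallback
	}
	return requested
}

func main() {
	// Same mapping as the envoy.yaml above.
	mapping := map[string]string{
		"gpt-3": "abab6.5s-chat",
		"gpt-4": "abab6.5g-chat",
		"*":     "abab6.5t-chat",
	}
	fmt.Println(resolveModel(mapping, "gpt-3"))    // abab6.5s-chat
	fmt.Println(resolveModel(mapping, "claude-3")) // abab6.5t-chat (wildcard)
}
```

This is why the curl examples below send "model": "gpt-3" but every response reports "model": "abab6.5s-chat".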

Non-streaming request

curl --location --request POST 'http://localhost:10000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-3",
    "messages": [
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": false
}'

Response:

{
    "id": "03d2b8bb5aa07fbaa6237f4a68816a8a",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "你好!我是一个由MiniMax公司研发的大型语言模型,名为MM智能助理。我被设计用来回答问题、提供信息、帮助解决问题以及执行各种语言处理任务。如果你有任何问题或需要帮助,请随时告诉我!",
                "role": "assistant",
                "name": "MM智能助理",
                "audio_content": ""
            }
        }
    ],
    "created": 1736672699,
    "model": "abab6.5s-chat",
    "object": "chat.completion",
    "usage": {
        "total_tokens": 118,
        "total_characters": 0,
        "prompt_tokens": 70,
        "completion_tokens": 48
    },
    "input_sensitive": false,
    "output_sensitive": false,
    "input_sensitive_type": 0,
    "output_sensitive_type": 0,
    "output_sensitive_int": 0,
    "base_resp": {
        "status_code": 0,
        "status_msg": ""
    }
}

Streaming request

curl -X POST 'http://localhost:10000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "model": "gpt-3",
    "messages": [
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": true
}'

Response:

data: {"id":"03d2b8d7eb548eb0f9a7d3119b85815c","choices":[{"index":0,"delta":{"content":"你好","role":"assistant","name":"MM智能助理","audio_content":""}}],"created":1736672727,"model":"abab6.5s-chat","object":"chat.completion.chunk","usage":{"total_tokens":0,"total_characters":0},"input_sensitive":false,"output_sensitive":false,"input_sensitive_type":0,"output_sensitive_type":0,"output_sensitive_int":0}

data: {"id":"03d2b8d7eb548eb0f9a7d3119b85815c","choices":[{"index":0,"delta":{"content":",我是一个由MiniMax公司研发的大型语言模型,名为MM智能助理。我可以帮助","role":"assistant","name":"MM智能助理","audio_content":""}}],"created":1736672727,"model":"abab6.5s-chat","object":"chat.completion.chunk","usage":{"total_tokens":0,"total_characters":0},"input_sensitive":false,"output_sensitive":false,"input_sensitive_type":0,"output_sensitive_type":0,"output_sensitive_int":0}

data: {"id":"03d2b8d7eb548eb0f9a7d3119b85815c","choices":[{"index":0,"delta":{"content":"回答问题、提供信息、进行对话和执行多种语言处理任务。如果你有任何问题或需要","role":"assistant","name":"MM智能助理","audio_content":""}}],"created":1736672727,"model":"abab6.5s-chat","object":"chat.completion.chunk","usage":{"total_tokens":0,"total_characters":0},"input_sensitive":false,"output_sensitive":false,"input_sensitive_type":0,"output_sensitive_type":0,"output_sensitive_int":0}

data: {"id":"03d2b8d7eb548eb0f9a7d3119b85815c","choices":[{"finish_reason":"stop","index":0,"delta":{"content":"帮助,请随时告诉我!","role":"assistant","name":"MM智能助理","audio_content":""}}],"created":1736672727,"model":"abab6.5s-chat","object":"chat.completion.chunk","usage":{"total_tokens":0,"total_characters":0},"input_sensitive":false,"output_sensitive":false,"input_sensitive_type":0,"output_sensitive_type":0,"output_sensitive_int":0}

data: {"id":"03d2b8d7eb548eb0f9a7d3119b85815c","choices":[{"finish_reason":"stop","index":0,"message":{"content":"你好,我是一个由MiniMax公司研发的大型语言模型,名为MM智能助理。我可以帮助回答问题、提供信息、进行对话和执行多种语言处理任务。如果你有任何问题或需要帮助,请随时告诉我!","role":"assistant","name":"MM智能助理","audio_content":""}}],"created":1736672727,"model":"abab6.5s-chat","object":"chat.completion","usage":{"total_tokens":116,"total_characters":0,"prompt_tokens":70,"completion_tokens":46},"input_sensitive":false,"output_sensitive":false,"input_sensitive_type":0,"output_sensitive_type":0,"output_sensitive_int":0,"base_resp":{"status_code":0,"status_msg":""}}
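
A client consumes the stream above by reading the SSE `data:` lines and concatenating the delta fragments. A minimal sketch (field names follow the v2 chunks shown above; a real client would also handle a terminating `[DONE]` marker if the endpoint emits one):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// chunk models just the fields needed to rebuild the answer from the
// chat.completion.chunk events shown above.
type chunk struct {
	Choices []struct {
		Delta struct {
			Content string `json:"content"`
		} `json:"delta"`
	} `json:"choices"`
}

// collectContent concatenates the delta.content of every "data:" event.
func collectContent(stream string) string {
	var sb strings.Builder
	sc := bufio.NewScanner(strings.NewReader(stream))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, "data: ") {
			continue // skip blank separator lines
		}
		var c chunk
		if err := json.Unmarshal([]byte(strings.TrimPrefix(line, "data: ")), &c); err != nil {
			continue // ignore non-JSON events in this sketch
		}
		for _, ch := range c.Choices {
			sb.WriteString(ch.Delta.Content)
		}
	}
	return sb.String()
}

func main() {
	stream := "data: {\"choices\":[{\"delta\":{\"content\":\"Hello\"}}]}\n\n" +
		"data: {\"choices\":[{\"delta\":{\"content\":\", world\"}}]}\n"
	fmt.Println(collectContent(stream)) // Hello, world
}
```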


Proxying the minimax chat completion Pro API with the OpenAI protocol

envoy.yaml

admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                scheme_header_transformation:
                  scheme_to_overwrite: https
                stat_prefix: ingress_http
                # Output envoy logs to stdout
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                # Modify as required
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: [ "*" ]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: minimax
                            timeout: 300s
                http_filters:
                  - name: wasmtest
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: wasmtest
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/plugin.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                                  "provider": {
                                    "type": "minimax",
                                    "apiTokens": [
                                      "YOUR_MINIMAX_API_TOKEN"
                                    ],
                                    "modelMapping": {
                                      "gpt-3": "abab6.5s-chat",
                                      "gpt-4": "abab6.5g-chat",
                                      "*": "abab6.5t-chat"
                                    },
                                    "protocol": "openai",
                                    "minimaxApiType": "pro",
                                    "minimaxGroupId": "YOUR_MINIMAX_GROUP_ID"
                                  }
                              }
                  - name: envoy.filters.http.router
  clusters:
    - name: httpbin
      connect_timeout: 30s
      type: LOGICAL_DNS
      # Comment out the following line to test on v6 networks
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: httpbin
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: httpbin
                      port_value: 80
    - name: minimax
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: minimax
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: api.minimax.chat
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "api.minimax.chat"
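
Compared with the v2 configuration, minimaxApiType: "pro" and minimaxGroupId switch the plugin to minimax's dedicated Pro endpoint, which is not OpenAI-compatible and carries the group ID as a query parameter. A sketch of the path construction (the path and GroupId parameter name are assumptions based on minimax's public API docs; verify against the plugin source):

```go
package main

import (
	"fmt"
	"net/url"
)

// buildProPath builds the request path for the minimax Pro chat completion
// API, attaching the group ID configured as minimaxGroupId. The v2 API, by
// contrast, needs no group ID because it is OpenAI-compatible.
func buildProPath(groupID string) string {
	return "/v1/text/chatcompletion_pro?GroupId=" + url.QueryEscape(groupID)
}

func main() {
	fmt.Println(buildProPath("YOUR_MINIMAX_GROUP_ID"))
}
```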

Non-streaming request

curl --location --request POST 'http://localhost:10000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data-raw '{
    "model": "gpt-3",
    "messages": [
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": false
}'

Response:

{
    "id": "03d2b8ffe2cc508c3b951c2cf8df8874",
    "choices": [
        {
            "index": 0,
            "message": {
                "name": "MM智能助理",
                "role": "assistant",
                "content": "你好!我是一个基于大型语言模型的虚拟助手,由MiniMax公司研发。我的设计旨在通过自然语言处理和机器学习技术来理解和生成文本,以便为用户提供信息、解答问题、进行对话等服务。如果你有任何问题或需要帮助,请随时告诉我!"
            },
            "finish_reason": "stop"
        }
    ],
    "created": 1736672770,
    "model": "abab6.5s-chat",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 70,
        "completion_tokens": 57,
        "total_tokens": 127
    }
}

Streaming request

curl -X POST 'http://localhost:10000/v1/chat/completions' \
-H 'Content-Type: application/json' \
-d '{
    "model": "gpt-3",
    "messages": [
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": true
}'

Response:

data: {"choices":[{"index":0,"message":{"name":"MM智能助理","role":"assistant","content":"你好"}}],"created":1736672798,"model":"abab6.5s-chat","object":"chat.completion","usage":{}}

data: {"choices":[{"index":0,"message":{"name":"MM智能助理","role":"assistant","content":"!我是一个由MiniMax公司研发的大型语言模型,名为MM智能助理。我可以帮助回答问题、提供信息、进行对话和执行多种语言处理任务。如果你有任何问题或需要帮助,请随时告诉我!"}}],"created":1736672799,"model":"abab6.5s-chat","object":"chat.completion","usage":{}}

data: {"id":"03d2b91e70d88198ddbd7edb8ac953fc","choices":[{"index":0,"message":{"name":"MM智能助理","role":"assistant","content":"你好!我是一个由MiniMax公司研发的大型语言模型,名为MM智能助理。我可以帮助回答问题、提供信息、进行对话和执行多种语言处理任务。如果你有任何问题或需要帮助,请随时告诉我!"},"finish_reason":"stop"}],"created":1736672799,"model":"abab6.5s-chat","object":"chat.completion","usage":{"prompt_tokens":70,"completion_tokens":46,"total_tokens":116}}


Ⅴ. Special notes for reviews

@hanxiantao changed the title from "fix: resolve blocking issue with Minimax responses in ai-proxy" to "fix: resolve blocking issue with minimax responses in ai-proxy" on Jan 12, 2025

codecov-commenter commented Jan 12, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.61%. Comparing base (ef31e09) to head (94cae0c).
Report is 261 commits behind head on main.


@@            Coverage Diff             @@
##             main    #1663      +/-   ##
==========================================
+ Coverage   35.91%   43.61%   +7.70%     
==========================================
  Files          69       76       +7     
  Lines       11576    12358     +782     
==========================================
+ Hits         4157     5390    +1233     
+ Misses       7104     6630     -474     
- Partials      315      338      +23     

see 70 files with indirect coverage changes

cr7258 (Collaborator) left a comment:

🐉 LGTM

@johnlanni merged commit a1bf315 into alibaba:main on Jan 14, 2025
13 checks passed
@hanxiantao deleted the minimax-ai-proxy branch on January 24, 2025
VinceCui pushed a commit to VinceCui/higress that referenced this pull request on May 21, 2025