
[vLLM]feat: support micro batch for vllm #1818


Closed
wants to merge 3 commits

Conversation

Irvingwangjr
Collaborator

No description provided.

exceptoin: Exception


class MicroBatchChatCompletionScheduler(NaiveChatCompletionScheduler):
Collaborator

NaiveChatCompletionScheduler is only meant to be a demo, which is why it lives under example/. What you've written here should be the formal version, so it would be better to move it to the verl/workers/rollout/ directory and inherit directly from ChatCompletionScheduler.
Then make MicroBatchChatCompletionScheduler the default scheduler. A rough sketch of that restructuring is shown below.
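
A rough sketch of the suggested restructuring, assuming a hypothetical module path and base-class import (the names below are assumptions for illustration, not the actual verl API):

# verl/workers/rollout/micro_batch_scheduler.py  (hypothetical location per the suggestion above)
from verl.workers.rollout.chat_scheduler import ChatCompletionScheduler  # import path is an assumption

class MicroBatchChatCompletionScheduler(ChatCompletionScheduler):
    # promoted from the example demo: inherit the base scheduler directly
    def __init__(self, config, model_path, server_addresses, max_cache_size=10000, max_inflight_req=8):
        super().__init__(config, model_path, server_addresses)
        self.max_inflight_req = max_inflight_req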

self.max_inflight_req = max_inflight_req
self.server_addresses = server_addresses
self.proxy_agents_coros = self._init_proxy_group(self.server_addresses, self.send_queue, self.reduce_queue, self.max_inflight_req)
print(self.proxy_agents_coros)
Collaborator

There are a lot of print statements here. If they don't help with training, just delete them; if they are needed, switch them to a logger, along the lines of the sketch below.
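
A minimal sketch of the logger variant, assuming Python's standard logging module rather than an existing verl helper:

import logging

logger = logging.getLogger(__name__)

# instead of: print(self.proxy_agents_coros)
logger.debug("initialized proxy agent coroutines: %s", self.proxy_agents_coros)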



class MicroBatchChatCompletionScheduler(NaiveChatCompletionScheduler):
    def __init__(self, config, model_path, server_addresses, max_cache_size=10000, max_inflight_req=8):
Collaborator

Add a config entry for max_inflight_req in the configuration file, and write some usage documentation for it.
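
For illustration only, a sketch of wiring max_inflight_req through config; the key name and default below are assumptions, not an existing verl option:

# hypothetical rollout config key, e.g. rollout.max_inflight_req in the YAML
max_inflight_req = config.get("max_inflight_req", 8)
scheduler = MicroBatchChatCompletionScheduler(config, model_path, server_addresses, max_inflight_req=max_inflight_req)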

if exception is not None:
    print("[MicroBatchChatCompletionScheduler] _consumer process callback get exception", idx)
    await reduce_queue.put(RolloutSample(completions, info, None, None, None, exceptoin=exception))
else:
Collaborator

Doesn't this if/else branch do nothing here? Wouldn't
await reduce_queue.put(RolloutSample(completions, info, None, None, None, exceptoin=exception))
alone be sufficient?
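
The collapsed form being suggested, sketched with the same RolloutSample signature quoted in the diff (the exceptoin keyword mirrors the spelling in the PR and looks like a typo worth fixing as well):

# exception is None in the success case, so a single put covers both branches
await reduce_queue.put(RolloutSample(completions, info, None, None, None, exceptoin=exception))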

semaphore = asyncio.Semaphore(max_inflight_req)
local_queue = asyncio.Queue(max_inflight_req)
coros.append(self.get_element(addr, send_queue, local_queue, semaphore))
coros.extend([self.process(local_queue, reduce_queue, semaphore, addr, i) for i in range(max_inflight_req)])
Collaborator

Since the semaphore is acquired for every sample anyway, doesn't that mean max_inflight_req effectively has no effect? See the sketch below for the pattern I'd expect.
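
For reference, a pattern that does cap in-flight requests is to acquire the semaphore before dispatching and release it only once the response has been handled; a minimal standalone sketch with hypothetical helper names, not the PR's code:

import asyncio

async def dispatch_with_cap(requests, max_inflight_req, send_one):
    semaphore = asyncio.Semaphore(max_inflight_req)

    async def one(req):
        async with semaphore:  # at most max_inflight_req requests are awaited concurrently
            return await send_one(req)  # slot is released only after the response comes back

    return await asyncio.gather(*(one(r) for r in requests))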

@eric-haibin-lin
Collaborator

A feature PR without a description and usage example is not acceptable.
