rafvasq
Member

@rafvasq rafvasq commented Jun 21, 2024

Signed-off-by: Rafael Vasquez <[email protected]>
Co-authored-by: Prashant Gupta <[email protected]>
@rafvasq rafvasq changed the title from "add CLI Tools" to "feat: add CLI tools" Jun 28, 2024
Signed-off-by: Rafael Vasquez <[email protected]>
@rafvasq rafvasq marked this pull request as ready for review July 15, 2024 17:14
@prashantgupta24
Member

Closing in favor of opendatahub-io/vllm#92

tdoublep pushed a commit that referenced this pull request Jan 20, 2025
This PR cleans and simplifies the code.

### Changes:

- removed right padding, since it was not used
- removed the dict of `seq_ids`, since on `AIU` there is only **one** `seq_id` **per**
`request_id` (no beam search or other multi-sequence decoding)
- removed the for loop over the single `seq_id` (always 1 per `request_id`)
during decoding
- deleted the batch padding mask and position ids after decoding has finished,
instead of overwriting them
- merged main into this branch to resolve merge conflicts
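
The simplifications listed above can be illustrated with a minimal sketch. It is not the code from the commit: `BatchState`, `seq_id_of`, `start`, and `finish` are hypothetical names, and plain Python lists stand in for the real mask and position-id tensors. It only shows the two ideas, a flat `request_id` to `seq_id` mapping and dropping batch state after decoding rather than overwriting it.

```python
from typing import Dict, List, Optional


class BatchState:
    """Hypothetical holder for per-batch decode state (illustrative names)."""

    def __init__(self) -> None:
        # One seq_id per request_id, so a flat mapping replaces a dict of lists.
        self.seq_id_of: Dict[str, int] = {}
        self.mask: Optional[List[List[int]]] = None
        self.position_ids: Optional[List[List[int]]] = None

    def start(self, requests: Dict[str, int], prompt_len: int) -> None:
        """Register requests and build the padding mask and position ids."""
        self.seq_id_of = dict(requests)
        n = len(requests)
        self.mask = [[1] * prompt_len for _ in range(n)]
        self.position_ids = [list(range(prompt_len)) for _ in range(n)]

    def finish(self) -> None:
        """Drop batch state once decoding is done instead of overwriting it."""
        self.seq_id_of.clear()
        self.mask = None
        self.position_ids = None


state = BatchState()
state.start({"req-0": 0, "req-1": 1}, prompt_len=4)
seq_id = state.seq_id_of["req-0"]  # direct lookup, no inner loop over seq_ids
state.finish()
```

Resetting the fields to `None` means any later access to a stale mask fails immediately instead of silently reusing values from the previous batch.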

The code has been tested in client/server mode for the `llama 194m` and
`granite 3b` models on `AIU` and `CPU`.
tdoublep pushed a commit that referenced this pull request Jan 20, 2025
This PR cleans and simplifies the code.

### Changes:

- simplified warmup by moving the duplicated lines into a function call
- moved the mask and position_ids from `SENDNNCasualLM` to
`SENDNNModelRunner`
- fixed an error in pyproject.toml
- already merged PR #52 and main into this branch to make merging easier
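
As a rough illustration of the warmup change, duplicated per-shape warmup lines can be folded into one helper. The sketch below is an assumption, not the commit's code: `warmup_shape`, `warmup_all`, the `run_model` callback, the example shapes, and the default of two passes per shape are all hypothetical.

```python
from typing import Callable, List, Tuple


def warmup_shape(run_model: Callable[[int, int], None],
                 batch_size: int, prompt_len: int, passes: int = 2) -> int:
    """Run the model `passes` times for one (batch_size, prompt_len) shape."""
    for _ in range(passes):
        run_model(batch_size, prompt_len)
    return passes


def warmup_all(run_model: Callable[[int, int], None],
               shapes: List[Tuple[int, int]]) -> int:
    """Warm up every shape through the shared helper; return total passes."""
    return sum(warmup_shape(run_model, b, p) for b, p in shapes)


# Record the calls instead of running a real model.
calls: List[Tuple[int, int]] = []
total = warmup_all(lambda b, p: calls.append((b, p)), [(1, 64), (4, 128)])
```

With a single helper, a change to the warmup procedure only needs to be made in one place.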

The code has been tested in client/server mode for the `llama 194m` and
`granite 3b` models on `AIU` and `CPU`.
2 participants