@a-ys a-ys commented Jun 27, 2024

Description

This PR includes various fixes for the Neo Neuron compilation & vLLM quantization scripts.

9253e42 Hard-codes engine=Python into the Neo Neuron compilation script so that errors in the customer's serving.properties do not cause compilation to fail.

405ea55 Removes a dangling reference to TARGET_INSTANCE_TYPE in the Neo Quantization script.

36810cc Adds logic to pass the customer's engine and option.entryPoint values through to the output serving.properties. Compilation runs with the hardcoded values engine=Python and option.entryPoint=djl_python.transformers_neuronx, while the customer's values for these keys are passed through to the output to support custom entrypoints.
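
A minimal sketch of the override and pass-through described in 9253e42 and 36810cc, assuming serving.properties has already been parsed into a plain dict of strings. The function names `compilation_properties` and `output_properties` are hypothetical and are not taken from the Neo scripts; leaving the hardcoded value in place when the customer did not set a key is also an assumption.

```python
# Hypothetical sketch; not the actual Neo Neuron compilation script.
HARDCODED = {
    "engine": "Python",
    "option.entryPoint": "djl_python.transformers_neuronx",
}
PASS_THROUGH_KEYS = ("engine", "option.entryPoint")


def compilation_properties(customer_props):
    """Return the properties used for compilation; hardcoded values win."""
    props = dict(customer_props)  # copy so the customer's values are untouched
    props.update(HARDCODED)
    return props


def output_properties(compiled_props, customer_props):
    """Return the properties to write out, restoring customer values."""
    out = dict(compiled_props)
    for key in PASS_THROUGH_KEYS:
        # Assumption: keys the customer did not set keep the hardcoded value.
        if key in customer_props:
            out[key] = customer_props[key]
    return out
```

With this shape, a bad `engine` value in the customer's file cannot break compilation, yet a custom `option.entryPoint` still reaches the deployed configuration.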

c1556ec Changes the output file format to the following:

  • Files in the input directory are copied directly to the output.
  • The outputs of compilation are saved in a subdirectory of the output: optimized_model
  • The output serving.properties sets model_id=./optimized_model so that the compiled model is used during deployment.
    This allows custom entrypoint files and requirements files to be used for serving. It introduces an issue of doubling the model size, which will be refined in future changes.
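
The layout above can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: `write_output` and its parameters are hypothetical, compiled artifacts are modeled as an in-memory mapping of file name to bytes, and the property key is written as `option.model_id` even though the description abbreviates it to `model_id`.

```python
import shutil
from pathlib import Path

COMPILED_SUBDIR = "optimized_model"
MODEL_ID_KEY = "option.model_id"  # assumption: the PR text says "model_id"


def write_output(input_dir, output_dir, compiled_files, props):
    """Hypothetical helper that builds the output directory layout."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)

    # 1. Copy every input file (entrypoints, requirements.txt, ...) as-is.
    shutil.copytree(input_dir, output_dir, dirs_exist_ok=True)

    # 2. Save the compilation outputs in the optimized_model subdirectory.
    compiled_dir = output_dir / COMPILED_SUBDIR
    compiled_dir.mkdir(parents=True, exist_ok=True)
    for name, data in compiled_files.items():
        (compiled_dir / name).write_bytes(data)

    # 3. Point the output serving.properties at the compiled model.
    out_props = dict(props)
    out_props[MODEL_ID_KEY] = f"./{COMPILED_SUBDIR}"
    lines = [f"{key}={value}" for key, value in out_props.items()]
    (output_dir / "serving.properties").write_text("\n".join(lines) + "\n")
    return out_props
```

Because step 1 copies the whole input directory, any original model weights in it end up next to the compiled copy, which is consistent with the doubled model size noted above.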

9ddd2c3 Adds a check to make tp_degree required in the Neo Neuron compilation script.
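
A required-value check of this kind might look like the sketch below. The function name `require_tp_degree` and the property key `option.tensor_parallel_degree` are assumptions for illustration; the description only says that tp_degree becomes required.

```python
def require_tp_degree(props, key="option.tensor_parallel_degree"):
    """Hypothetical check: fail early when the tensor parallel degree is missing."""
    value = props.get(key)
    if value is None or str(value).strip() == "":
        raise ValueError(f"{key} is required for Neo Neuron compilation")
    # int() raises ValueError for non-numeric values, which also fails early.
    return int(value)
```

Failing before compilation starts surfaces the missing setting as a clear configuration error rather than as a failure deep inside the compiler.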

a-ys added 5 commits June 18, 2024 18:45
Changes the Neo output file format to better support requirements.txt
and custom entry point files. The compiler output will be saved to a
subdirectory and the input files are copied over to the output.
@a-ys a-ys force-pushed the v10_neo_patches branch from 2639652 to c1556ec Compare June 27, 2024 21:48
@a-ys a-ys changed the title from "[Neo] Fixing various Neo compilation/quantization script bugs" to "[Neo] Neo compilation/quantization script bugfixes" Jul 9, 2024
@a-ys a-ys marked this pull request as ready for review July 9, 2024 23:32
@a-ys a-ys requested review from a team, frankfliu and zachgk as code owners July 9, 2024 23:32
@lanking520 lanking520 merged commit 88f84ba into deepjavalibrary:master Jul 11, 2024
tosterberg pushed a commit to tosterberg/djl-serving that referenced this pull request Jul 18, 2024
tosterberg added a commit that referenced this pull request Jul 18, 2024