Draft: [NV TRT RTX EP] Fix onnx checker for constants in subgraph #25579


Conversation

@gedoensmax (Contributor) commented Jul 29, 2025

Description

This PR is supposed to fix:

  • Loading large models using the AddExternalInitializersFromFilesInMemory API (see the sketch after this list)
  • Parsing of the Phi4 128k instruct model
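
For context, a minimal sketch of the loading path the first bullet refers to, using the ORT C API's AddExternalInitializersFromFilesInMemory. The call shape follows onnxruntime_c_api.h as I understand it; the file name, buffer handling, and omitted status checks are illustrative assumptions, not the exact reproduction code.

```cpp
#include <onnxruntime_c_api.h>
#include <vector>

// Assumes `weights_blob` already holds the full content of the external-data
// file (e.g. model.onnx.data) that the application loaded into memory itself.
// OrtStatus* return values are ignored here for brevity.
void CreateSessionWithInMemoryWeights(OrtEnv* env, std::vector<char>& weights_blob) {
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtSessionOptions* so = nullptr;
  api->CreateSessionOptions(&so);

  // Map the external file name recorded in the model's TensorProtos to the
  // in-memory copy of that file.
  const ORTCHAR_T* names[] = {ORT_TSTR("model.onnx.data")};
  char* buffers[] = {weights_blob.data()};
  const size_t lengths[] = {weights_blob.size()};
  api->AddExternalInitializersFromFilesInMemory(so, names, buffers, lengths, 1);

  OrtSession* session = nullptr;
  api->CreateSession(env, ORT_TSTR("model.onnx"), so, &session);
  // ... run inference, then release the session and options.
}
```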

The first issue is resolved, from what I can tell, by adding a check in Graph::InjectExternalInitializersFromFilesInMemory(). The other issue is that parsing of the If node fails during Graph::Resolve within NvExecutionProvider::GetSupportedList here. I tried to fix this by loading the in-memory external data into raw data, but that did not resolve the error:

2025-07-29 17:00:31.7917863 [E:onnxruntime:, graph.cc:3212 onnxruntime::Graph::VerifyNodeAndOpMatch] This is an invalid model. In Node, ("/model/rotemb_caches_subgraph/If", If, "", -1) : ("/model/rotemb_caches_subgraph/Greater/output_0": tensor(bool),) -> ("cos_cache": tensor(float16),"sin_cache": tensor(float16),) , Error Data of TensorProto ( tensor name: cos_cache_large) should be stored in */_ORT_MEM_ADDR_/*, but it doesn't exist or is not accessible.

@chilo-ms @skottmckay would you be able to help out? My guess is that this has something to do with the ORT Graph wrapping.

@skottmckay (Contributor)

@yuslepukhin Should use_tensor_buffer in the call to TensorToTensorProto be true?

constexpr const bool use_tensor_buffer_true = true;
auto new_tensor_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_true);

The API description says 'the data will be copied into the graph', so I would have expected it to be false.

* The function will find the initialized TensorProtos with external data in the graph with the provided
* external file names and the file content in memory. The API gets the external file name, offset, and data length
* from the TensorProto, and locates the tensor data in the in-memory file buffer.
* It creates a Tensor to replace the existing Tensor in the graph. The replacement
* will occur before any of the optimizations take place. The data will be copied into the graph
* since TensorProto can't refer to the user provided buffers.
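
For context on the question above, a hedged sketch of what the flag changes. The contrast drawn in the comments is inferred from the API docs quoted here and from the */_ORT_MEM_ADDR_/* error earlier in the thread, so treat it as an assumption rather than the verbatim implementation:

```cpp
// use_tensor_buffer == false: the resulting TensorProto embeds its own copy
// of the data in raw_data(), so it stays valid after `tensor` is freed.
constexpr bool use_tensor_buffer_false = false;
auto copied_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_false);

// use_tensor_buffer == true: the resulting TensorProto only records an
// in-memory external-data location (the */_ORT_MEM_ADDR_/* tag) pointing at
// the Tensor's buffer. It is valid only while that buffer stays alive, and a
// consumer that cannot resolve the tag fails exactly like the error above.
constexpr bool use_tensor_buffer_true = true;
auto referencing_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_true);
```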

@gedoensmax (Contributor, Author)

I am using a Phi4 model generated with the ORT GenAI builder. It has an If node for large and small projections, which seems to be the issue.

@gedoensmax (Contributor, Author)

With the latest change I am able to load a model with this If branch. Still, I would like to understand how to correctly handle initializers in subgraphs.

@yuslepukhin (Member) commented Jul 31, 2025

@yuslepukhin Should use_tensor_buffer in the call to TensorToTensorProto be true? […]

Yes, the recent change did not comply with the API description. This needs to be addressed.
The other issue here is that Inject only considers the main graph, as it is invoked from InferenceSession::Initialize.
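
To illustrate the subgraph gap, a hypothetical sketch of what recursing into subgraphs would look like. InjectIntoSingleGraph and ExternalFileMap are made-up names for this illustration; GetAttributeNameToMutableSubgraphMap is, to my knowledge, the existing Node accessor for attribute subgraphs.

```cpp
// Hypothetical recursion over If/Loop/Scan subgraphs; not the actual fix.
void InjectRecursively(onnxruntime::Graph& graph, const ExternalFileMap& files) {
  // Existing main-graph logic, in the spirit of
  // Graph::InjectExternalInitializersFromFilesInMemory (parameters elided).
  InjectIntoSingleGraph(graph, files);
  for (auto& node : graph.Nodes()) {
    // Each GRAPH/GRAPHS attribute (e.g. the then/else branches of an If node)
    // owns a nested Graph whose initializers need the same treatment.
    for (auto& [attr_name, subgraph] : node.GetAttributeNameToMutableSubgraphMap()) {
      InjectRecursively(*subgraph, files);
    }
  }
}
```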

@gedoensmax (Contributor, Author)

Will this be fixed by the ORT team? This is blocking for Phi on the NV EP.

@jywu-msft added the ep:NvRTX (NV RTX execution provider) label Jul 31, 2025
@yuslepukhin (Member)

Will this be fixed by the ORT team? This is blocking for Phi on the NV EP.

Yes, I am working on it now. Could you also please share a pointer to the exact model you are using?

@gedoensmax (Contributor, Author)

It is a Phi4 model from the ORT GenAI builder. I can work on getting a model shared if required.

@yuslepukhin (Member)

It is a Phi4 model from the ORT GenAI builder. I can work on getting a model shared if required.

@gedoensmax please find a moment to test this PR's branch.

yuslepukhin added a commit that referenced this pull request Aug 2, 2025

### Description
Move the step that moves weights to memory to the end of Graph::Resolve(). Modify Inject so it copies data into TensorProto according to the C API docs.

### Motivation and Context
TypeAndShape inference runs as a part of `Resolve()` and is unable to inspect and load the initializers that point to OrtValues at that time. We chose to move the TensorProto to OrtValue conversion to the end of `Resolve()`.

References: #25579
@gedoensmax (Contributor, Author)

I already started on Friday but had some other system issues. I will test this on Monday.

snnn pushed a commit that referenced this pull request Aug 3, 2025
@gedoensmax (Contributor, Author)

@yuslepukhin I am still seeing the same error on a Phi4 model.

@gedoensmax (Contributor, Author)

Adding this in graph.cc fixes my issues. https://github.com/microsoft/onnxruntime/blob/381c947894275b66486651208e407e2a3f0af750/onnxruntime/core/graph/graph.cc#L4266

const ONNX_NAMESPACE::GraphProto& Graph::ToGraphProto() {
  if (!GraphProtoSyncNeeded()) {
    for (int tensor_idx = 0; tensor_idx < graph_proto_->initializer_size(); ++tensor_idx) {
      auto tensor = graph_proto_->mutable_initializer(tensor_idx);
      // Initializers whose "external" data actually lives in memory carry the
      // */_ORT_MEM_ADDR_/* location that the ONNX checker cannot resolve.
      if (utils::HasExternalDataInMemory(*tensor)) {
        std::unique_ptr<ONNX_NAMESPACE::TensorProto> full_init;
        ORT_THROW_IF_ERROR(utils::GetTensorProtoWithDataIfInMemory(*tensor, full_init));
        // Inline a copy of the data and drop the in-memory location tags.
        tensor->clear_data_location();
        tensor->clear_external_data();
        tensor->set_raw_data(full_init->raw_data());
      }
    }
    return *graph_proto_;
  }
  // ... (rest of the function unchanged)

I believe the fix still does not reach subgraphs that are stored as attributes on an ONNX node.

@yuslepukhin (Member) commented Aug 4, 2025

Adding this in graph.cc fixes my issues. […]

I believe the fix still does not reach subgraphs that are stored as attributes on an ONNX node.

There are two versions of ::ToGraphProto(). One of them is const and returns a modified copy of the GraphProto with all in-memory references gone.

The other one, which is not const, does not do this because we would lose all the in-memory tags.
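
A minimal sketch of that distinction, paraphrased from graph.h; treat the exact signatures and comments as assumptions:

```cpp
// Non-const overload: returns the live GraphProto and keeps the in-memory
// external-data tags, since the Graph still owns the backing OrtValues and
// later passes rely on those tags being present.
const ONNX_NAMESPACE::GraphProto& ToGraphProto();

// Const overload: returns a detached copy in which every in-memory
// external-data reference has been inlined, so the result is self-contained
// and safe to hand to consumers such as the ONNX checker.
ONNX_NAMESPACE::GraphProto ToGraphProto() const;
```

Under that split, a subgraph serialized through the non-const path still carries unresolved */_ORT_MEM_ADDR_/* tags, which matches the checker error above.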

adrianlizarraga pushed a commit that referenced this pull request Aug 4, 2025
@yuslepukhin (Member)

#25579

@gedoensmax (Contributor, Author)

The other one, which is not const, does not do this because we would lose all the in-memory tags.

OK, I see. I shared the model on SharePoint with you and @skottmckay.

@yuslepukhin (Member)

The other one, which is not const, does not do this because we would lose all the in-memory tags.

OK, I see. I shared the model on SharePoint with you and @skottmckay.

Thank you for the model. Unfortunately, the NVIDIA linker dies at the end of the build on multiple boxes, so I cannot verify it with a TRT build. Please run this PR in your environment. Thx!

#25579

@gedoensmax closed this Aug 6, 2025
@gedoensmax force-pushed the geodensmax/mem_adress_for_subgraph branch from 9319fa5 to b1546da, August 6, 2025 09:48
@gedoensmax (Contributor, Author)

@yuslepukhin I have been testing with TRT RTX, not the TRT EP. Maybe @chilo-ms can help with any build issues; otherwise I am happy to help at any Europe-compatible time.
This PR only has my changes, or am I missing something? I have also tried your other PR #25626, but had to add my ToGraphProto change.

I updated this branch to hold the exact code that I am executing: the main branch, which already has your changes, plus my fix.

@gedoensmax reopened this Aug 6, 2025
@skottmckay (Contributor)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

@gedoensmax (Contributor, Author)

@skottmckay Thanks, I got lost in the different PRs. I will close this.

@gedoensmax closed this Aug 6, 2025
@yuslepukhin (Member)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

The outcome is not clear to me. Was the PR sufficient or not?

@gedoensmax (Contributor, Author)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

The outcome is not clear to me. Was the PR sufficient or not?

Yes, #25652 was enough. Previously I was only trying the other PR.
