Draft: [NV TRT RTX EP] Fix onnx checker for constants in subgraph #25579


Conversation

@gedoensmax (Contributor) commented Jul 29, 2025

Description

This PR is supposed to fix:

  • Loading large models using the AddExternalInitializersFromFilesInMemory API (see the sketch after this list)
  • Parsing of the Phi4 128k instruct model
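
For context, a minimal sketch of the loading path the first bullet refers to, using the ORT C API's AddExternalInitializersFromFilesInMemory. The call shape follows onnxruntime_c_api.h as I understand it; the file name, buffer handling, and omitted status checks are illustrative assumptions, not the exact reproduction code.

```cpp
#include <onnxruntime_c_api.h>
#include <vector>

// Assumes `weights_blob` already holds the full content of the external-data
// file (e.g. model.onnx.data) that the application loaded into memory itself.
// OrtStatus* return values are ignored here for brevity.
void CreateSessionWithInMemoryWeights(OrtEnv* env, std::vector<char>& weights_blob) {
  const OrtApi* api = OrtGetApiBase()->GetApi(ORT_API_VERSION);

  OrtSessionOptions* so = nullptr;
  api->CreateSessionOptions(&so);

  // Map the external file name recorded in the model's TensorProtos to the
  // in-memory copy of that file.
  const ORTCHAR_T* names[] = {ORT_TSTR("model.onnx.data")};
  char* buffers[] = {weights_blob.data()};
  const size_t lengths[] = {weights_blob.size()};
  api->AddExternalInitializersFromFilesInMemory(so, names, buffers, lengths, 1);

  OrtSession* session = nullptr;
  api->CreateSession(env, ORT_TSTR("model.onnx"), so, &session);
  // ... run inference, then release the session and options.
}
```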

The first issue is resolved, from what I can tell, by adding a check in Graph::InjectExternalInitializersFromFilesInMemory(). The other issue is that parsing of the If node fails during Graph::Resolve within NvExecutionProvider::GetSupportedList here. I tried to fix this by loading the in-memory external data into raw data, but that did not resolve the error:

2025-07-29 17:00:31.7917863 [E:onnxruntime:, graph.cc:3212 onnxruntime::Graph::VerifyNodeAndOpMatch] This is an invalid model. In Node, ("/model/rotemb_caches_subgraph/If", If, "", -1) : ("/model/rotemb_caches_subgraph/Greater/output_0": tensor(bool),) -> ("cos_cache": tensor(float16),"sin_cache": tensor(float16),) , Error Data of TensorProto ( tensor name: cos_cache_large) should be stored in */_ORT_MEM_ADDR_/*, but it doesn't exist or is not accessible.

@chilo-ms @skottmckay would you be able to help out? My guess is that this has something to do with the ORT Graph wrapping.

@skottmckay (Contributor)

@yuslepukhin Should use_tensor_buffer in the call to TensorToTensorProto be true?

constexpr const bool use_tensor_buffer_true = true;
auto new_tensor_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_true);

The API description says 'the data will be copied into the graph', so I would have expected it to be false.

* The function will find the initialized TensorProtos with external data in the graph with the provided
* external file names and the file content in memory. The API gets the external file name, offset, and data length
* from the TensorProto, and locates the tensor data in the in-memory file buffer.
* It creates a Tensor to replace the existing Tensor in the graph. The replacement
* will occur before any of the optimizations take place. The data will be copied into the graph
* since TensorProto can't refer to the user provided buffers.
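
For context on the question above, a hedged sketch of what the flag changes. The contrast drawn in the comments is inferred from the API docs quoted here and from the */_ORT_MEM_ADDR_/* error earlier in the thread, so treat it as an assumption rather than the verbatim implementation:

```cpp
// use_tensor_buffer == false: the resulting TensorProto embeds its own copy
// of the data in raw_data(), so it stays valid after `tensor` is freed.
constexpr bool use_tensor_buffer_false = false;
auto copied_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_false);

// use_tensor_buffer == true: the resulting TensorProto only records an
// in-memory external-data location (the */_ORT_MEM_ADDR_/* tag) pointing at
// the Tensor's buffer. It is valid only while that buffer stays alive, and a
// consumer that cannot resolve the tag fails exactly like the error above.
constexpr bool use_tensor_buffer_true = true;
auto referencing_proto = utils::TensorToTensorProto(tensor, tensor_name, use_tensor_buffer_true);
```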

@gedoensmax (Contributor, Author)

I am using a Phi4 model generated with the ORT GenAI builder. It has an If node for large and small projections, which seems to be the issue.

@gedoensmax (Contributor, Author)

With the latest change I am able to load a model with this If branch. Still, I would like to understand how to correctly handle initializers in subgraphs.

@yuslepukhin (Member) commented Jul 31, 2025

@yuslepukhin Should use_tensor_buffer in the call to TensorToTensorProto be true? […]

Yes, the recent change did not comply with the API description. This needs to be addressed.
The other issue here is that Inject only considers the main graph, as it is invoked from InferenceSession::Initialize.
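
To illustrate the subgraph gap, a hypothetical sketch of what recursing into subgraphs would look like. InjectIntoSingleGraph and ExternalFileMap are made-up names for this illustration; GetAttributeNameToMutableSubgraphMap is, to my knowledge, the existing Node accessor for attribute subgraphs.

```cpp
// Hypothetical recursion over If/Loop/Scan subgraphs; not the actual fix.
void InjectRecursively(onnxruntime::Graph& graph, const ExternalFileMap& files) {
  // Existing main-graph logic, in the spirit of
  // Graph::InjectExternalInitializersFromFilesInMemory (parameters elided).
  InjectIntoSingleGraph(graph, files);
  for (auto& node : graph.Nodes()) {
    // Each GRAPH/GRAPHS attribute (e.g. the then/else branches of an If node)
    // owns a nested Graph whose initializers need the same treatment.
    for (auto& [attr_name, subgraph] : node.GetAttributeNameToMutableSubgraphMap()) {
      InjectRecursively(*subgraph, files);
    }
  }
}
```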

@gedoensmax (Contributor, Author)

Will this be fixed by the ORT team? This is blocking for Phi on the NV EP.

@jywu-msft added the ep:NvRTX (NV RTX execution provider) label Jul 31, 2025
@yuslepukhin (Member)

Will this be fixed by the ORT team? This is blocking for Phi on the NV EP.

Yes, I am working on it now. Could you also please share a pointer to the exact model you are using?

@gedoensmax (Contributor, Author)

It is a Phi4 model from the ORT GenAI builder. I can work on getting a model shared if required.

@yuslepukhin (Member)

It is a Phi4 model from the ORT GenAI builder. I can work on getting a model shared if required.

@gedoensmax please find a moment to test this PR's branch.

yuslepukhin added a commit that referenced this pull request Aug 2, 2025

### Description
Move the step that moves weights to memory to the end of Graph::Resolve(). Modify Inject so it copies data into TensorProto according to the C API docs.

### Motivation and Context
TypeAndShape inference runs as a part of `Resolve()` and is unable to inspect and load the initializers that point to OrtValues at that time. We chose to move the TensorProto to OrtValue conversion to the end of `Resolve()`.

References: #25579
@gedoensmax (Contributor, Author)

I already started on Friday but had some other system issues. I will test this on Monday.

snnn pushed a commit that referenced this pull request Aug 3, 2025
@gedoensmax (Contributor, Author)

@yuslepukhin I am still seeing the same error on a Phi4 model.

@gedoensmax (Contributor, Author)

Adding this in graph.cc fixes my issues. https://github.com/microsoft/onnxruntime/blob/381c947894275b66486651208e407e2a3f0af750/onnxruntime/core/graph/graph.cc#L4266

const ONNX_NAMESPACE::GraphProto& Graph::ToGraphProto() {
  if (!GraphProtoSyncNeeded()) {
    for (int tensor_idx = 0; tensor_idx < graph_proto_->initializer_size(); ++tensor_idx) {
      auto tensor = graph_proto_->mutable_initializer(tensor_idx);
      // Initializers whose "external" data actually lives in memory carry the
      // */_ORT_MEM_ADDR_/* location that the ONNX checker cannot resolve.
      if (utils::HasExternalDataInMemory(*tensor)) {
        std::unique_ptr<ONNX_NAMESPACE::TensorProto> full_init;
        ORT_THROW_IF_ERROR(utils::GetTensorProtoWithDataIfInMemory(*tensor, full_init));
        // Inline a copy of the data and drop the in-memory location tags.
        tensor->clear_data_location();
        tensor->clear_external_data();
        tensor->set_raw_data(full_init->raw_data());
      }
    }
    return *graph_proto_;
  }
  // ... (rest of the function unchanged)

I believe the fix still does not reach subgraphs that are stored as attributes on an ONNX node.

@yuslepukhin (Member) commented Aug 4, 2025

Adding this in graph.cc fixes my issues. […]

I believe the fix still does not reach subgraphs that are stored as attributes on an ONNX node.

There are two versions of ::ToGraphProto(). One of them is const and returns a modified copy of the GraphProto with all in-memory references gone.

The other one, which is not const, does not do this because we would lose all the in-memory tags.
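
A minimal sketch of that distinction, paraphrased from graph.h; treat the exact signatures and comments as assumptions:

```cpp
// Non-const overload: returns the live GraphProto and keeps the in-memory
// external-data tags, since the Graph still owns the backing OrtValues and
// later passes rely on those tags being present.
const ONNX_NAMESPACE::GraphProto& ToGraphProto();

// Const overload: returns a detached copy in which every in-memory
// external-data reference has been inlined, so the result is self-contained
// and safe to hand to consumers such as the ONNX checker.
ONNX_NAMESPACE::GraphProto ToGraphProto() const;
```

Under that split, a subgraph serialized through the non-const path still carries unresolved */_ORT_MEM_ADDR_/* tags, which matches the checker error above.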

adrianlizarraga pushed a commit that referenced this pull request Aug 4, 2025
@yuslepukhin (Member)

#25579

@gedoensmax (Contributor, Author)

The other one, which is not const, does not do this because we would lose all the in-memory tags.

OK, I see. I shared the model on SharePoint with you and @skottmckay.

@yuslepukhin (Member)

The other one, which is not const, does not do this because we would lose all the in-memory tags.

OK, I see. I shared the model on SharePoint with you and @skottmckay.

Thank you for the model. Unfortunately, the NVIDIA linker dies at the end of the build on multiple boxes, so I cannot verify it with a TRT build. Please run this PR in your environment. Thx!

#25579

@gedoensmax closed this Aug 6, 2025
@gedoensmax force-pushed the geodensmax/mem_adress_for_subgraph branch from 9319fa5 to b1546da, August 6, 2025 09:48
@gedoensmax (Contributor, Author)

@yuslepukhin I have been testing with TRT RTX, not the TRT EP. Maybe @chilo-ms can help with any build issues; otherwise I am happy to help at any Europe-compatible time.
This PR only has my changes, or am I missing something? I have also tried your other PR #25626, but had to add my ToGraphProto change.

I updated this branch to hold the exact code that I am executing: the main branch, which already has your changes, plus my fix.

@gedoensmax reopened this Aug 6, 2025
@skottmckay (Contributor)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

@gedoensmax (Contributor, Author)

@skottmckay Thanks, I got lost in the different PRs. I will close this.

@gedoensmax closed this Aug 6, 2025
@yuslepukhin (Member)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

The outcome is not clear to me. Was the PR sufficient or not?

@gedoensmax (Contributor, Author)

I have also tried your other PR #25626, but had to add my ToGraphProto change.

Requires #25652 to fix.

The outcome is not clear to me. Was the PR sufficient or not?

Yes, #25652 was enough. Previously I was only trying the other PR.
