Implement memory-efficient ONNX weight loading (lazy/protobuf streaming) #3596

@antimora

Feature description

Currently, burn-import's ONNX loader (onnx-ir) reads all ONNX weights (initializers) into memory up front. For large models this approach is memory-intensive and can cause scalability problems or OOM crashes. Instead, we should move to a strategy where weights and tensors are read from the ONNX/protobuf file only as needed during model loading or code generation, e.g. via streaming or lazy protobuf parsing.
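
To make the target concrete, here is a minimal sketch (names like `TensorPayload` are hypothetical, not existing onnx-ir API): the essential change is whether an initializer owns its decoded bytes or merely records where they live in the file.

```rust
/// Hypothetical payload representation for an ONNX initializer.
pub enum TensorPayload {
    /// Fully decoded up front (today's behavior).
    Eager(Vec<u8>),
    /// Byte range inside the source .onnx file; read and decoded
    /// only when a node actually consumes the tensor.
    Lazy { offset: u64, len: usize },
}
```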

Current state

  • onnx-ir/burn-import loads all weight tensors at once, regardless of actual need.
  • This behavior is particularly problematic for large models or resource-constrained environments.
  • No option exists for memory-mapped or streaming/lazy loading.

Proposal

  • Refactor ONNX loading in burn-import/onnx-ir to support on-demand reading of weights/tensors from the ONNX file.
  • Use protobuf's streaming API, or a similar mechanism, to avoid loading unnecessary data into memory.
  • Consider providing both eager (current) and lazy/streaming modes for backwards compatibility (see the trait sketch after this list).
  • Update codegen and all ONNX operator implementations in burn-import to work with on-demand tensor access.
  • Document any new APIs or usage considerations for downstream users.
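
A minimal sketch of the eager/lazy split referenced above, assuming codegen requests weights through a small trait; `WeightSource`, `EagerWeights`, and `LazyWeights` are hypothetical names, not the current onnx-ir API:

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::PathBuf;

/// Hypothetical abstraction: consumers ask for weights by initializer
/// name and never learn whether the bytes were preloaded or are
/// fetched from disk on demand.
pub trait WeightSource {
    fn tensor_bytes(&self, name: &str) -> io::Result<Vec<u8>>;
}

/// Eager backend: everything decoded up front (current behavior).
pub struct EagerWeights {
    pub tensors: HashMap<String, Vec<u8>>,
}

impl WeightSource for EagerWeights {
    fn tensor_bytes(&self, name: &str) -> io::Result<Vec<u8>> {
        self.tensors
            .get(name)
            .cloned()
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, name.to_string()))
    }
}

/// Lazy backend: only (offset, len) ranges are kept; bytes are read
/// from the .onnx file when a tensor is actually requested.
pub struct LazyWeights {
    pub path: PathBuf,
    pub ranges: HashMap<String, (u64, usize)>,
}

impl WeightSource for LazyWeights {
    fn tensor_bytes(&self, name: &str) -> io::Result<Vec<u8>> {
        let &(offset, len) = self
            .ranges
            .get(name)
            .ok_or_else(|| io::Error::new(io::ErrorKind::NotFound, name.to_string()))?;
        let mut f = File::open(&self.path)?;
        f.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        f.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

Codegen and the operator implementations would then depend only on `WeightSource`, which keeps the eager path intact for backwards compatibility while allowing a lazy backend to be swapped in.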

Feature motivation

  • Support importing and working with extremely large ONNX models without OOM errors.
  • Reduce the memory footprint for typical ONNX import scenarios.
  • Enable use of burn-import in environments with constrained memory (e.g., embedded, wasm, CI/CD, cloud).
  • Bring burn-import's ONNX handling up to par with best practices in other frameworks (cf. PyTorch, TensorFlow, ONNX Runtime).

(Optional) Suggest a Solution

  • Investigate the protobuf parsing used in onnx-ir, and refactor to support iterators or readers for initializers/weights.
  • Use streaming reads for large tensor data blocks, and decode weights only as needed for each node/operator (see the memory-mapping sketch after this list).
  • Consider a trait or abstraction for weight access that can be implemented for both eager and lazy backends, along the lines of the sketch under "Proposal".
  • Profile and benchmark memory usage before and after the change.
  • Add regression tests for large ONNX models to ensure memory use stays low.
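
As one concrete option for the streaming-read bullet above, the file could be memory-mapped so the OS pages tensor bytes in only when a node touches them. The sketch below assumes the memmap2 crate and that a lightweight protobuf scan has already recorded each initializer's raw_data byte range; the offset/length values are placeholders:

```rust
use std::fs::File;

use memmap2::Mmap; // e.g. memmap2 = "0.9" in Cargo.toml

/// Decode one initializer's f32 values straight out of the mapping.
/// Only the pages actually touched are ever loaded into memory.
fn load_f32_tensor(mmap: &Mmap, offset: usize, len: usize) -> Vec<f32> {
    let bytes = &mmap[offset..offset + len];
    // ONNX stores raw_data little-endian.
    bytes
        .chunks_exact(4)
        .map(|c| f32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect()
}

fn main() -> std::io::Result<()> {
    let file = File::open("model.onnx")?;
    // Safety: the file must not be modified or truncated while mapped.
    let mmap = unsafe { Mmap::map(&file)? };
    // Placeholder range; in practice it would come from a protobuf
    // scan that records raw_data offsets instead of copying the bytes.
    let weights = load_f32_tensor(&mmap, 1_024, 4 * 1_000);
    println!("decoded {} f32 values", weights.len());
    Ok(())
}
```

Plain seek-and-read (as in the trait sketch under "Proposal") covers targets where mmap is unavailable, e.g. wasm.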

Context

  • Related to scalability and performance limitations in current burn-import ONNX support.
  • Not directly addressed by any existing open tickets; this is a new proposal.
  • For overlapping concerns, see existing issues on ONNX import scalability, async, and backend support.

Metadata

Assignees: No one assigned
Labels: feature (The feature request), onnx, performance (Anything related to performance)
