Feature description
Currently, `burn-import`'s ONNX loader (`onnx_ir`) reads all ONNX weights (initializers) into memory up front. For large models, this approach is memory intensive and can cause scalability issues or OOM crashes. Instead, we should move to a strategy where weights and tensors are read from the ONNX/protobuf file only as needed during model loading or code generation, possibly using streaming/protobuf lazy parsing.
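As a minimal sketch of the on-demand primitive, assuming a hypothetical `LazyInitializer` handle recorded during a single metadata-only pass over the file (names and layout are illustrative, not `onnx_ir`'s actual types):

```rust
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Hypothetical handle to one weight tensor: instead of decoded bytes,
/// it stores where the raw data lives inside the .onnx file.
struct LazyInitializer {
    name: String,
    byte_offset: u64,
    byte_len: usize,
}

impl LazyInitializer {
    /// Read the tensor's raw bytes only when a node actually needs them.
    fn read(&self, file: &mut File) -> std::io::Result<Vec<u8>> {
        file.seek(SeekFrom::Start(self.byte_offset))?;
        let mut buf = vec![0u8; self.byte_len];
        file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```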
Current state
- `onnx_ir`/`burn-import` loads all weight tensors at once, regardless of actual need.
- This behavior is particularly problematic for large models or resource-constrained environments.
- No option exists for memory-mapped or streaming/lazy loading.
Proposal
- Refactor ONNX loading in `burn-import`/`onnx-ir` to support on-demand reading of weights/tensors from the ONNX file.
- Use protobuf's streaming API, or a similar mechanism, to avoid loading unnecessary data into memory.
- Consider providing both eager (current) and lazy/streaming modes for backwards compatibility (see the API sketch after this list).
- Update codegen and all ONNX operator implementations in `burn-import` to work with on-demand tensor access.
- Document any new APIs or usage considerations for downstream users.
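One possible shape for the dual-mode API mentioned above; `WeightLoading` and `OnnxModel::parse` are hypothetical names, not `burn-import`'s current entry points:

```rust
/// Hypothetical switch between the two loading strategies.
pub enum WeightLoading {
    /// Decode every initializer while parsing (current behavior).
    Eager,
    /// Index initializer byte offsets while parsing; decode on first access.
    Lazy,
}

// Hypothetical call site in a build script or codegen step:
// let model = OnnxModel::parse("model.onnx", WeightLoading::Lazy)?;
```

Keeping `Eager` as the default would preserve current behavior for existing users.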
Feature motivation
- Support importing and working with extremely large ONNX models without OOM errors.
- Reduce the memory footprint for typical ONNX import scenarios.
- Enable use of `burn-import` in environments with constrained memory (e.g., embedded, WASM, CI/CD, cloud).
- Bring `burn-import`'s ONNX handling up to par with best practices in other frameworks (cf. PyTorch, TensorFlow, ONNX Runtime).
(Optional) Suggest a Solution
- Investigate the protobuf parsing used in `onnx-ir`, and refactor it to expose iterators or readers for initializers/weights.
- Use streaming reads for large tensor data blocks, decoding weights only as each node/operator needs them.
- Consider a trait or abstraction for weight access that can be implemented for both eager and lazy backends (see the sketch after this list).
- Profile and benchmark memory usage before/after.
- Add regression tests for large ONNX models to ensure memory use stays low.
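A sketch of what that trait could look like, with both backends implemented over raw bytes; all names (`WeightSource`, `EagerWeights`, `LazyWeights`) are hypothetical, and typed-tensor decoding is left out for brevity:

```rust
use std::collections::HashMap;
use std::fs::File;
use std::io::{Read, Seek, SeekFrom};

/// Hypothetical abstraction over weight access, so codegen can stay
/// agnostic to whether tensor data is already in memory or read on demand.
trait WeightSource {
    /// Return the raw bytes of the named initializer.
    fn tensor_bytes(&mut self, name: &str) -> std::io::Result<Vec<u8>>;
}

/// Eager backend: all initializers were decoded up front (current behavior).
struct EagerWeights {
    tensors: HashMap<String, Vec<u8>>,
}

impl WeightSource for EagerWeights {
    fn tensor_bytes(&mut self, name: &str) -> std::io::Result<Vec<u8>> {
        self.tensors.get(name).cloned().ok_or_else(|| {
            std::io::Error::new(std::io::ErrorKind::NotFound, format!("initializer {name}"))
        })
    }
}

/// Lazy backend: only (offset, length) pairs are kept from the metadata
/// pass; bytes are read from the .onnx file when first requested.
struct LazyWeights {
    file: File,
    index: HashMap<String, (u64, usize)>,
}

impl WeightSource for LazyWeights {
    fn tensor_bytes(&mut self, name: &str) -> std::io::Result<Vec<u8>> {
        let (offset, len) = *self.index.get(name).ok_or_else(|| {
            std::io::Error::new(std::io::ErrorKind::NotFound, format!("initializer {name}"))
        })?;
        self.file.seek(SeekFrom::Start(offset))?;
        let mut buf = vec![0u8; len];
        self.file.read_exact(&mut buf)?;
        Ok(buf)
    }
}
```

Codegen could then take a `&mut dyn WeightSource` (or a generic parameter) instead of a fully decoded tensor map, which is what would let the lazy backend keep memory use low.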
Context
- Related to scalability and performance limitations in current `burn-import` ONNX support.
- Not directly addressed by any existing open tickets; this is a new proposal.
- For overlapping concerns, see existing issues on ONNX import scalability, async, and backend support.