fix(gguf_parser): fix memoryerror exception when loading non-native models #1452

Merged
merged 1 commit into containers:main from fix/gguf-parser-string-endian
May 30, 2025

Conversation

taronaeo
Collaborator

@taronaeo taronaeo commented May 29, 2025

When running model files whose endianness differs from the host's, RamaLama used the GGUFInfoParser.read_string method to parse the model file and eventually hit a MemoryError. This PR takes the model's endianness (little or big) into account and parses the file accordingly.
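
A minimal sketch of the likely failure mode, assuming the string length prefix is an unsigned 64-bit integer (as the read_number(..., GGUFValueType.UINT64, ...) call in the reviewer's guide suggests). A small length stored in big-endian order decodes to an enormous number when read as little-endian, and asking a file object for that many bytes is a plausible route to the MemoryError. The value 5 is only an illustration, not taken from the PR.

```python
import struct

# Length prefix of a 5-byte string as a big-endian file would store it.
prefix = struct.pack(">Q", 5)

correct = struct.unpack(">Q", prefix)[0]  # decoded with the file's byte order
misread = struct.unpack("<Q", prefix)[0]  # decoded as little-endian by mistake

print(correct)  # 5
print(misread)  # 360287970189639680, i.e. 5 * 2**56
```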

Tested and verified to work on x86 without affecting anything. I will test it on s390x as well tomorrow when the z15 R&D mainframe comes online.

Summary by Sourcery

Enable correct parsing of GGUF model files with varying endianness and prevent MemoryError when loading non-native endian models.

Bug Fixes:

  • Fix MemoryError when loading GGUF models with non-native endianness by properly handling big-endian data parsing.

Enhancements:

  • Extend read_string to accept an endianness parameter and delegate length reading to read_number.
  • Add EOF check in read_string to gracefully handle incomplete reads.
  • Propagate the model endianness parameter to all string-reading calls in parsing routines.
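
The enhancements above can be sketched roughly as follows. This is not the code merged in the PR: GGUFEndian and ParseError here are stand-in definitions, read_uint64 is a hypothetical simplification of read_number, and mapping the enum values to struct byte-order prefixes is an assumption made for brevity.

```python
import io
import struct
from enum import Enum


class GGUFEndian(Enum):
    # Stand-in enum; the values are struct byte-order prefixes (an assumption).
    LITTLE = "<"
    BIG = ">"


class ParseError(Exception):
    """Stand-in for the parser's error type."""


def read_uint64(model, endianness):
    # Hypothetical helper standing in for read_number(..., UINT64, ...).
    raw = model.read(8)
    if len(raw) < 8:
        raise ParseError("Unexpected EOF while reading a uint64")
    return struct.unpack(f"{endianness.value}Q", raw)[0]


def read_string(model, endianness=GGUFEndian.LITTLE, length=-1):
    # With length == -1, the length is read from the stream using the
    # model's byte order rather than the host's.
    if length == -1:
        length = read_uint64(model, endianness)
    raw = model.read(length)
    # EOF check: fail with a clear error instead of decoding a short read.
    if len(raw) < length:
        raise ParseError(f"Unexpected EOF: wanted {length} bytes, got {len(raw)}")
    return raw.decode("utf-8")


big_endian_string = io.BytesIO(struct.pack(">Q", 5) + b"hello")
print(read_string(big_endian_string, GGUFEndian.BIG))  # hello
```

Raising, rather than returning, the error keeps the str return type intact, which matches the later `return` vs `raise` fix in this PR's commit list.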

Contributor

sourcery-ai bot commented May 29, 2025

Reviewer's Guide

This PR enhances GGUFInfoParser to handle both little- and big-endian models correctly by threading an explicit endianness parameter through string and number reads, and it adds checks to prevent MemoryErrors on truncated inputs.

Sequence Diagram for GGUFInfoParser.read_string with dynamic length and EOF check

sequenceDiagram
    participant C as Caller
    participant RS_Method as "GGUFInfoParser.read_string()"
    participant GGUFInfoParser_Static as "GGUFInfoParser (static methods)"
    participant M as "model: io.BufferedReader"

    C->>RS_Method: read_string(model, model_endianness, length=-1)
    RS_Method->>GGUFInfoParser_Static: read_number(model, GGUFValueType.UINT64, model_endianness)
    GGUFInfoParser_Static-->>RS_Method: determined_length
    RS_Method->>M: read(determined_length)
    M-->>RS_Method: raw_bytes
    alt len(raw_bytes) < determined_length
        RS_Method-->>C: return ParseError("Unexpected EOF...")
    else Bytes read successfully
        RS_Method->>RS_Method: raw_bytes.decode("utf-8")
        RS_Method-->>C: return decoded_string
    end

Sequence Diagram: Endianness Parameter Propagation in GGUFInfoParser.parse Method

sequenceDiagram
    participant Client
    participant P as "GGUFInfoParser.parse()"
    participant RS as "GGUFInfoParser.read_string()"
    participant RN as "GGUFInfoParser.read_number()"
    participant RV as "GGUFInfoParser.read_value()"
    participant M as "model: io.BufferedReader"

    Client->>P: parse(model_path, ...)
    P->>M: open(model_path, "rb")
    M-->>P: model_file_handle
    P->>RS: read_string(model, model_endianness, 4)
    RS-->>P: magic_number
    P->>P: Validate magic_number
    opt Invalid magic_number
        P-->>Client: Raise ParseError
    end

    P->>RN: read_number(model, GGUFValueType.UINT32, model_endianness)
    RN-->>P: version
    P->>RN: read_number(model, GGUFValueType.UINT64, model_endianness)
    RN-->>P: tensor_count
    P->>RN: read_number(model, GGUFValueType.UINT64, model_endianness)
    RN-->>P: metadata_kv_count

    loop metadata_kv_count times
        P->>RS: read_string(model, model_endianness)
        RS-->>P: key
        P->>P: read_value_type(model, model_endianness)
        P-->>P: value_type
        P->>RV: read_value(model, value_type, model_endianness)
        RV-->>P: value
        P->>P: metadata[key] = value
    end

    loop tensor_count times
        P->>RS: read_string(model, model_endianness)
        RS-->>P: tensor_name
        P->>RN: read_number(model, GGUFValueType.UINT32, model_endianness)
        RN-->>P: n_dimensions
        loop n_dimensions times
            P->>RN: read_number(model, GGUFValueType.UINT64, model_endianness)
            RN-->>P: dimension_value
        end
        P->>RN: read_number(model, GGUFValueType.UINT32, model_endianness)
        RN-->>P: tensor_type
        P->>RN: read_number(model, GGUFValueType.UINT64, model_endianness)
        RN-->>P: offset
        P->>P: Create Tensor object
    end
    P-->>Client: GGUFModelInfo
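
The first steps of the diagram above (magic, then version, tensor_count, and metadata_kv_count) can be illustrated with a small sketch. The field widths follow the UINT32/UINT64 types shown in the diagram. The function name read_header and the use of a struct prefix string for the byte order are assumptions for illustration, not the parser's actual interface.

```python
import io
import struct


def read_header(model, byte_order):
    """Return (version, tensor_count, metadata_kv_count).

    byte_order is a struct prefix: "<" for little-endian, ">" for big-endian.
    """
    if model.read(4) != b"GGUF":
        raise ValueError("missing GGUF magic")
    # uint32 + uint64 + uint64; an explicit prefix disables struct padding,
    # so the three fields occupy exactly 20 bytes.
    raw = model.read(20)
    if len(raw) < 20:
        raise ValueError("unexpected EOF in header")
    return struct.unpack(f"{byte_order}IQQ", raw)


header = b"GGUF" + struct.pack(">IQQ", 3, 2, 1)
print(read_header(io.BytesIO(header), ">"))  # (3, 2, 1)
```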

Updated Class Diagram for GGUFInfoParser

classDiagram
    class GGUFInfoParser {
        +is_model_gguf(model_path: str) bool
        +read_string(model: io.BufferedReader, model_endianness: GGUFEndian = GGUFEndian.LITTLE, length: int = -1) str
        +read_number(model: io.BufferedReader, value_type: GGUFValueType, model_endianness: GGUFEndian) float
        +read_value(model: io.BufferedReader, value_type: GGUFValueType, model_endianness: GGUFEndian) object
        +parse(model_name: str, model_registry: str, model_path: str) GGUFModelInfo
    }
    class GGUFEndian {
        <<enumeration>>
        LITTLE
        BIG
    }
    class GGUFValueType {
        <<enumeration>>
        UINT32
        UINT64
        BOOL
        STRING
        ARRAY
    }
    class ParseError {
        <<exception>>
        message: str
    }
    GGUFInfoParser ..> GGUFEndian : uses
    GGUFInfoParser ..> GGUFValueType : uses
    GGUFInfoParser ..> ParseError : creates

File-Level Changes

Change: Add explicit endianness support to string parsing
Details:
  • Extend read_string signature to accept model_endianness
  • Use read_number with model_endianness to determine string length
  • Propagate model_endianness to all read_string calls in is_model_gguf, parse, read_value, metadata, and tensor loops
Files: ramalama/gguf_parser.py

Change: Handle unexpected EOF when reading strings to prevent MemoryError
Details:
  • Check raw byte count against expected length after reading
  • Return a ParseError on truncated input instead of letting decode or struct unpacking fail
Files: ramalama/gguf_parser.py

Contributor

@sourcery-ai sourcery-ai bot left a comment


Hey @taronaeo - I've reviewed your changes - here's some feedback:

  • read_string currently returns a ParseError object on EOF, which breaks the expected str return type—raise a ParseError exception instead of returning it.
  • Magic‐number detection still assumes little‐endian; consider reading the raw 4 bytes and comparing directly so big‐endian GGUF files are correctly identified.
  • The read_string signature is confusing with positional (endianness, length) parameters—make model_endianness keyword-only or reorder parameters for clarity.
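
Two of the suggestions above, comparing the raw magic bytes and making a parameter keyword-only, might look like the following sketch. Both function names are hypothetical and not part of the PR. The only fact relied on is that a GGUF file begins with the four ASCII bytes "GGUF".

```python
import io

GGUF_MAGIC = b"GGUF"  # the four ASCII bytes at the start of a GGUF file


def has_gguf_magic(model):
    # Raw byte comparison: nothing is decoded as an integer, so the
    # file's byte order cannot affect the result.
    return model.read(4) == GGUF_MAGIC


def read_fixed(model, *, length, encoding="utf-8"):
    # Hypothetical helper: `length` is keyword-only, so a call site cannot
    # pass it positionally and have it mistaken for another argument.
    raw = model.read(length)
    if len(raw) < length:
        raise EOFError(f"wanted {length} bytes, got {len(raw)}")
    return raw.decode(encoding)


print(has_gguf_magic(io.BytesIO(b"GGUF\x00\x00\x00\x03")))  # True
print(read_fixed(io.BytesIO(b"GGUF"), length=4))            # GGUF
```

With the keyword-only form, a call such as read_fixed(model, 4) fails immediately with a TypeError instead of silently binding 4 to the wrong parameter.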
Here's what I looked at during the review
  • 🟡 General issues: 2 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


@taronaeo taronaeo force-pushed the fix/gguf-parser-string-endian branch from 6c85ca5 to b6cc604 Compare May 29, 2025 19:15
@rhatdan
Member

rhatdan commented May 29, 2025

LGTM

Signed-off-by: Aaron Teo <[email protected]>

fix(gguf_parser): missed some calls

Signed-off-by: Aaron Teo <[email protected]>

fix(gguf_parser): typo `return` vs `raise`

Signed-off-by: Aaron Teo <[email protected]>

fix(gguf_parser): missing staticmethod declarations

Signed-off-by: Aaron Teo <[email protected]>
@taronaeo taronaeo force-pushed the fix/gguf-parser-string-endian branch from b6cc604 to 1b32a09 Compare May 30, 2025 02:05
@taronaeo
Collaborator Author

Just tested on IBM z15 mainframe and it works as intended. If all CI passes and no other changes are required, feel free to merge into main :)

Member

@engelmi engelmi left a comment


LGTM

@rhatdan rhatdan merged commit 27ec0d0 into containers:main May 30, 2025
15 checks passed