Skip to content

Conversation

@aras-p
Copy link
Contributor

@aras-p aras-p commented Sep 11, 2025

(alternative to #2125 that does away with tables completely -- this one just initializes them on first use)

B44 compression is probably very rarely used, and even when it is, the tables are only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default.

Instead of having a megabyte of source code for the tables, that take up 256 kilobytes of library size, initialize the tables once upon first use of B44 compression. The initialization itself takes under a millisecond (Ryzen 5950X, Windows, build without F16C enabled even), and is done in a thread-safe way using InitOnceExecuteOnce / pthread_once depending on the platform.

OpenEXRCore-4_0.dll size goes 868KB -> 614KB.

B44 compression is probably very rarely used, and even when it is, the
tables are only ever used on channels marked as "linear"
(EXR_PERCEPTUALLY_LINEAR), which is not the default.

Instead of having a megabyte of source code for the tables, that take
up 256 kilobytes of library size, initialize the tables once upon
first use of B44 compression. The initialization itself takes under
a millisecond (Ryzen 5950X, Windows, build without F16C enabled even),
and is done in a thread-safe way using InitOnceExecuteOnce / pthread_once
depending on the platform.

OpenEXRCore-4_0.dll size goes 868KB -> 614KB.

Signed-off-by: Aras Pranckevicius <[email protected]>
Symbol visibility is slightly different once a C file contains
not only data, but also a function. So the tables and their
initialization need to be split into separate source files.

Signed-off-by: Aras Pranckevicius <[email protected]>
The ILMTHREAD_THREADING_ENABLED is defined to 0 or 1! Which means
some already existing code within the whole library is already
wrong

Signed-off-by: Aras Pranckevicius <[email protected]>
Copy link

@ThiagoIze ThiagoIze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea and the code seems good to me.

While the table memory is always allocated even if never used, it means there's no memory leak when the library is closed, which is good. What's more, this memory cost is insignificant because it's uncommitted memory so not yet backed by actual system RAM. (edit, I forgot static objects are zero initialized). This PR seems robust and accomplishes its goals of saving space in RAM, on disk, and in source code.

I'll leave it to someone else more familiar with C and openexr to decide if it's good to merge.

Copy link
Contributor

@meshula meshula left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this a lot. Reducing the size of the code and removing easily computed tables is one of my goals as well.

@meshula
Copy link
Contributor

meshula commented Sep 18, 2025

Note, I personally prefer the math-only variant to retaining a table based approach.

@ThiagoIze
Copy link

Note, I personally prefer the math-only variant to retaining a table based approach.

I'm curious, what are the advantages of math-only to on-demad table that make you prefer it? The math version is significantly slower and if you add simd, it's much more code. So far the only argument I can come up with against tables is that it has OS specific code so might not work on some esoteric OS.

@kthurston
Copy link

@aras-p I believe this will just crash when actually enacted? I do not see allocations of the tables anywhere... (it is possible we don't test log encoded b44, rather only the linear path where I completely eliminated the table lookup previously).

I think we should NOT merge this, and instead wait for a moment and use the per compression context we're working on for openjph in b44 as well - we can then clean up memory without use of a static destructor function or anything.

Copy link

@kthurston kthurston left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's switch to use compression contexts when that is merged instead of adding this

@ThiagoIze
Copy link

ThiagoIze commented Sep 19, 2025

The allocation happens in internal_b44_table.c

static uint16_t exrcore_expTable_data[65536];
static uint16_t exrcore_logTable_data[65536];

There's no need to manually clean up memory. It will happen automatically on dlclose.

@kthurston
Copy link

ah, sorry! The web view wasn't showing me the diff properly because of the number of lines removed - it appeared like the whole file was removed. But I still think we can avoid the vrss entirely which may help on small / embedded systems (i.e. a camera) at a potential slight pessimisation - will have a play

@aras-p
Copy link
Contributor Author

aras-p commented Sep 20, 2025

But I still think we can avoid the vrss entirely which may help on small / embedded systems (i.e. a camera) at a potential slight pessimisation

This PR does not increase memory usage compared to what is in main branch; just the tables are not taking up executable space. The other PR (#2125) does away with tables completely, but doing the math is slower than just the table lookup; detailed timings in that PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants