-
Notifications
You must be signed in to change notification settings - Fork 656
B44: initialize exp/log tables at runtime #2126
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
B44: initialize exp/log tables at runtime #2126
Conversation
B44 compression is probably very rarely used, and even when it is, the tables are only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default. Instead of having a megabyte of source code for the tables, that take up 256 kilobytes of library size, initialize the tables once upon first use of B44 compression. The initialization itself takes under a millisecond (Ryzen 5950X, Windows, build without F16C enabled even), and is done in a thread-safe way using InitOnceExecuteOnce / pthread_once depending on the platform. OpenEXRCore-4_0.dll size goes 868KB -> 614KB. Signed-off-by: Aras Pranckevicius <[email protected]>
Symbol visibility is slightly different once a C file contains not only data, but also a function. So the tables and their initialization need to be split into separate source files. Signed-off-by: Aras Pranckevicius <[email protected]>
The ILMTHREAD_THREADING_ENABLED is defined to 0 or 1! Which means some already existing code within the whole library is already wrong Signed-off-by: Aras Pranckevicius <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea and the code seems good to me.
While the table memory is always allocated even if never used, it means there's no memory leak when the library is closed, which is good. What's more, this memory cost is insignificant because it's uncommitted memory so not yet backed by actual system RAM. (edit, I forgot static objects are zero initialized). This PR seems robust and accomplishes its goals of saving space in RAM, on disk, and in source code.
I'll leave it to someone else more familiar with C and openexr to decide if it's good to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this a lot. Reducing the size of the code and removing easily computed tables is one of my goals as well.
|
Note, I personally prefer the math-only variant to retaining a table based approach. |
I'm curious, what are the advantages of math-only to on-demad table that make you prefer it? The math version is significantly slower and if you add simd, it's much more code. So far the only argument I can come up with against tables is that it has OS specific code so might not work on some esoteric OS. |
|
@aras-p I believe this will just crash when actually enacted? I do not see allocations of the tables anywhere... (it is possible we don't test log encoded b44, rather only the linear path where I completely eliminated the table lookup previously). I think we should NOT merge this, and instead wait for a moment and use the per compression context we're working on for openjph in b44 as well - we can then clean up memory without use of a static destructor function or anything. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's switch to use compression contexts when that is merged instead of adding this
|
The allocation happens in internal_b44_table.c There's no need to manually clean up memory. It will happen automatically on dlclose. |
|
ah, sorry! The web view wasn't showing me the diff properly because of the number of lines removed - it appeared like the whole file was removed. But I still think we can avoid the vrss entirely which may help on small / embedded systems (i.e. a camera) at a potential slight pessimisation - will have a play |
This PR does not increase memory usage compared to what is in main branch; just the tables are not taking up executable space. The other PR (#2125) does away with tables completely, but doing the math is slower than just the table lookup; detailed timings in that PR. |
(alternative to #2125 that does away with tables completely -- this one just initializes them on first use)
B44 compression is probably very rarely used, and even when it is, the tables are only ever used on channels marked as "linear" (EXR_PERCEPTUALLY_LINEAR), which is not the default.
Instead of having a megabyte of source code for the tables, that take up 256 kilobytes of library size, initialize the tables once upon first use of B44 compression. The initialization itself takes under a millisecond (Ryzen 5950X, Windows, build without F16C enabled even), and is done in a thread-safe way using InitOnceExecuteOnce / pthread_once depending on the platform.
OpenEXRCore-4_0.dll size goes 868KB -> 614KB.