Float Table Finder #8431
PennRobotics started this conversation in Ideas
Would a floating-point data table finder be worth developing?

In the minimal case, here are values I often encounter and their mappings:

| Hex word | Float value |
| --- | --- |
| `3f800000` | 1.0 |
| `41200000` | 10.0 |
| `447a0000` | 1000.0 |
| `3eaaaaab` | 0.33333334 (1/3) |
| `3a83126f` | 0.001 |
| `472c4400` | 44100.0 |
| `40490fdb` | 3.1415927 (π) |
| `4b371b00` | 12000000.0 |
| `4bb71b00` | 24000000.0 |
| `4c371b00` | 48000000.0 |
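Decoding these words is mechanical; here's a quick sketch in Python (a production analyzer would presumably be Java, given Ghidra's API, but the arithmetic is identical):

```python
import struct

def hex_to_float(word: str) -> float:
    """Reinterpret a big-endian 32-bit hex word as an IEEE-754 single."""
    return struct.unpack(">f", bytes.fromhex(word))[0]

# A few of the frequently encountered constants from above
for w in ["3f800000", "41200000", "3eaaaaab", "472c4400", "4b371b00"]:
    print(w, "->", hex_to_float(w))
```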
Sure, it's possible there are instructions or another data type referred to by `472c4400` (it could even be the string `G,D` plus a NUL byte), but if the next value is `473b8000` (as was the case in the last three audio interfaces, all from different manufacturers, whose firmware I've disassembled) then I know I'm looking at a table of sampling rates that is likely to also contain 88200, 96000, and sometimes 192000 or 32000.

I see how it would be easy enough to create a new AbstractAnalyzer and look through a list of pairs (or even just a list of ints), but I don't know enough about Ghidra's data types, Java, or who this would even serve to decide to do it. (It also isn't a showstopper for me. I can usually spot an obvious float table in the hex dump, or I load the binary into Trace32, open a Draw window for a memory region with the %Float specifier, and zoom in and out scanning for anything that looks like a decay, impulse, sine table, or other identifiable sequence. That covers 95% of cases.)
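The sampling-rate case could be sketched as a tiny membership heuristic. Everything here (the rate set, the run-length threshold, the function names) is an illustrative assumption, not a proposed design:

```python
import struct

# Assumed set of audio sampling rates (Hz) worth flagging
KNOWN_RATES = {8000.0, 11025.0, 16000.0, 22050.0, 32000.0, 44100.0,
               48000.0, 88200.0, 96000.0, 176400.0, 192000.0}

def decode_f32(word: int) -> float:
    """Reinterpret a 32-bit big-endian word as an IEEE-754 single."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]

def looks_like_rate_table(words, min_run=2):
    """True if min_run consecutive words decode to known sampling rates."""
    run = best = 0
    for w in words:
        run = run + 1 if decode_f32(w) in KNOWN_RATES else 0
        best = max(best, run)
    return best >= min_run

# 44100.0 followed by 48000.0, as in the firmware described above
print(looks_like_rate_table([0x472C4400, 0x473B8000]))
```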
Here are discussion points I propose for a hypothetical Float Table Finder Analyzer:
Depending on the architecture, doing float analysis before code analysis avoids trouble, e.g. on the 8051, where nearly any arbitrary byte sequence forms a valid opcode and many of those opcodes decompile into plausible-looking C. Running Aggressive Instruction Finder before identifying float tables can produce blocks of code that almost look like they make sense, until you are lucky enough to hit a jump into the middle of another mnemonic, get an error, and click back to find that the source of the error is a misidentified float table.
Also, if a float table is already identified before code analysis, it is more likely (at least on some architectures) that a function will have the appropriate argument types defined and will then display call arguments correctly in the Decompiler window.
To illustrate the heuristic question of whether a value converted from ulong to float should be accepted as a valid float: should 0.216 or 0.0000216 be accepted as readily? What about 1/216000000? 1/216? 1/360? Why? Why not? Most importantly, how??
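One crude way to operationalize such an acceptance test; the magnitude bounds here are arbitrary placeholders for whatever heuristic the analyzer would actually use:

```python
import math
import struct

def is_plausible_float(word: int, min_mag=1e-9, max_mag=1e12) -> bool:
    """Reject NaN/infinity and magnitudes outside an assumed
    'plausible engineering value' range; zero passes unconditionally."""
    f = struct.unpack(">f", word.to_bytes(4, "big"))[0]
    if f == 0.0:
        return True
    if not math.isfinite(f):
        return False
    return min_mag <= abs(f) <= max_mag

print(is_plausible_float(0x3F800000))  # 1.0: plausible
print(is_plausible_float(0xFF800000))  # -infinity: rejected
```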
As far as pattern matching, consider you have `3f800000` and then `3f792a30`, `3f728241`, `3f6c01a3`, ... and on the 256th entry, `3a83126f`:
If the next value starts with anything other than `3a`, the table size is a "predictable" 0x100.
This would be the outcome of an exponential decay starting at 1.0 at f[0] and ending at 0.001 at f[255]. The downside is that you wouldn't be able to predict/tabulate the decay rate for every imaginable sequence. A non-exponential (e.g. polynomial) decay can still start at 1.0 and end at 0.001. And what if the user wanted to end at a different value (e.g. 5 time constants) or choose a start value such that all the values collectively sum to 1.0?
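For concreteness, a table with exactly those endpoints is easy to generate; `0.001 ** (i / 255)` is just one of many curves that fit, per the caveat above, so the intermediate words will differ slightly if the original table used a different formula or rounding:

```python
import struct

# 256-entry exponential decay: f[0] = 1.0, f[255] = 0.001
table = [struct.pack(">f", 0.001 ** (i / 255)).hex() for i in range(256)]

print(table[0], table[255])  # endpoints of the hypothetical table
```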
You can see that the first byte of each successive value changes monotonically and rarely: starting from `3f` and eventually reaching `3a`. The second byte is generally within a few counts of its neighbor and usually decreasing (except when the first byte changes), while the entropy of the third and fourth bytes is all over the place.
Because the mantissa isn't byte-aligned, this obviously isn't a perfect method, but you could analyze the exponent and mantissa separately and identify patterns in how much they change and in which direction.
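A sketch of that exponent/mantissa separation; the "steps down by at most one" tolerance is an assumed threshold for illustration, not a tuned rule:

```python
def split_fields(word: int):
    """Split a 32-bit IEEE-754 word into (sign, biased exponent, mantissa)."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF

def exponent_decays_smoothly(words) -> bool:
    """True if the biased exponent never increases and never drops
    by more than one between neighbouring entries."""
    exps = [split_fields(w)[1] for w in words]
    return all(0 <= a - b <= 1 for a, b in zip(exps, exps[1:]))

# The example sequence from above: 1.0 followed by slowly decaying values
print(exponent_decays_smoothly([0x3F800000, 0x3F792A30,
                                0x3F728241, 0x3F6C01A3]))
```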
Visually, you'd immediately recognize a decay lookup table if a large selection of the data/code containing the table were graphed as floats, because our visual system does excellent pattern matching, and patterns are just regions of low entropy. In the hex dump/memory viewer, you could also spot the table if you knew what to look for and grouped by 4 bytes. As a list of ?? bytes in the Listing window, it's easy to overlook. If the lookup table is already defined as something else (especially code), it gets harder to realize what it actually is, and it might take a while before you realize something is wrong.
The beauty here is that this method should also pick up things like sine tables, or even a large enough table of clock frequencies (which I've seen from time to time in CMSIS/vendor microcontroller code), because they will show the same low-entropy changes as a decay table.
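The low-entropy claim can be checked cheaply. One proxy is how many distinct leading bytes a window contains; the function name and the contrast with random noise are illustrative assumptions:

```python
import math
import os
import struct

def leading_byte_diversity(words) -> float:
    """Fraction of distinct first bytes in a window of 32-bit words.
    Float tables cluster into a few sign/exponent bytes; code and
    random data spread across many more."""
    return len({(w >> 24) & 0xFF for w in words}) / len(words)

# Quarter-wave sine table (a classic lookup-table shape) vs random noise
sine = [struct.unpack(">I", struct.pack(">f", math.sin(math.pi / 2 * i / 255)))[0]
        for i in range(256)]
noise = [int.from_bytes(os.urandom(4), "big") for _ in range(256)]

print(leading_byte_diversity(sine), leading_byte_diversity(noise))
```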
Then… if such a thing can be successfully implemented and is well received… we do it all again for doubles!