Description
It might live in another package, but we can't completely count on waldo, just as we couldn't completely count on rlang: these packages are not trying to be 100% accurate, and we need to be.
We need a better system than the current proxy system. It could be a similar function, but it would take both inputs together rather than editing each input independently, and it should be applied after all differences are recorded, so the intact diff data is still available.
We also need to be able to customize very precisely. waldo does some things by default, on named environments for instance; we want finer granularity, and we want to be able to expose it to users.
I would really like a better way to explore the diffs too. I tried something with woof (https://github.com/moodymudskipper/woof), but it sacrificed robustness for convenience, and we can't do that here either.
I'm not sure the user needs to navigate the diff, but I want to be able to say that all differences were due to environment cloning, srcrefs, pointers, NSE artifacts, etc. This would allow us to have a more comforting output when we reproduce all meaningful elements properly.
This in turn would allow us to say more when we're not accurate, so for instance we would always show a message when generating equivalent environments or data.table objects.
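To make the idea of labelling already recorded differences more concrete, here is a minimal sketch. Everything in it is hypothetical: the `path`/`old`/`new` columns, the `label_env_clone()` rule and `apply_rules()` are names made up for illustration, not an existing API. The point is only that a rule receives both versions of a node at once and annotates the recorded diff instead of editing the inputs.

```r
# Hypothetical rule: it sees BOTH versions of a node and returns a label,
# or NA when it does not apply. Here: two distinct environments holding
# the same bindings are labelled as a clone.
label_env_clone <- function(old, new, path) {
  same_content <- is.environment(old) && is.environment(new) &&
    identical(
      as.list(old, all.names = TRUE, sorted = TRUE),
      as.list(new, all.names = TRUE, sorted = TRUE)
    )
  if (same_content) "environment cloning" else NA_character_
}

# Apply rules AFTER the diffs are recorded; `old` and `new` stay untouched.
apply_rules <- function(diffs, rules) {
  diffs$label <- NA_character_
  for (i in seq_len(nrow(diffs))) {
    for (rule in rules) {
      label <- rule(diffs$old[[i]], diffs$new[[i]], diffs$path[[i]])
      if (!is.na(label)) {
        diffs$label[[i]] <- label
        break
      }
    }
  }
  diffs
}

e1 <- new.env(); e1$a <- 1
e2 <- new.env(); e2$a <- 1

# Made-up recorded diffs: one row per differing node, list columns for values
diffs <- data.frame(
  path = c(".subset2(1)", ".subset2(2)"),
  stringsAsFactors = FALSE
)
diffs$old <- list(e1, "a")
diffs$new <- list(e2, "b")

labelled <- apply_rules(diffs, list(label_env_clone))
all_explained <- !anyNA(labelled$label)
```

With such labels, the output could be reassuring whenever `all_explained` is `TRUE` and show the remaining unlabelled rows otherwise.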
We could have a tree structure for the diff like in woof but with unambiguous names, for instance diff$`.subset2(10)`$`attr("b")`. We'd maintain a table with the leaves of the tree and their individual diffs (or absence on either side), along with some metadata like class, type, or special labels for srcrefs etc.
The diff object would be such a table, with id and parent_id columns too. A method for $ would allow navigating and subsetting the table, and to print the table we would just unclass it.
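A rough sketch of how such a table and its `$` method could fit together, under assumptions of my own: the class name `diff_tbl`, the `step` column, the `root` attribute and using `0L` as the parent of top-level nodes are all invented for illustration.

```r
# Hypothetical constructor: a data frame of nodes plus the id of the node
# we are currently "standing on" (0L means the top of the tree).
new_diff <- function(tbl, root = 0L) {
  structure(tbl, root = root, class = c("diff_tbl", "data.frame"))
}

# `$` navigates to a child of the current root and keeps its whole subtree.
`$.diff_tbl` <- function(x, name) {
  root <- attr(x, "root", exact = TRUE)
  tbl <- x
  class(tbl) <- "data.frame" # avoid re-entering this method
  child <- tbl[["id"]][tbl[["parent_id"]] == root & tbl[["step"]] == name]
  if (length(child) != 1L) stop("no unique child named `", name, "`")
  keep <- child
  frontier <- child
  while (length(frontier) > 0L) {
    frontier <- tbl[["id"]][tbl[["parent_id"]] %in% frontier]
    keep <- c(keep, frontier)
  }
  new_diff(tbl[tbl[["id"]] %in% keep, , drop = FALSE], root = child)
}

# Printing drops our class and falls back to the data frame method.
print.diff_tbl <- function(x, ...) {
  class(x) <- "data.frame"
  print(x, ...)
  invisible(x)
}

# Made-up diff table for illustration
d <- new_diff(data.frame(
  id = 1:4,
  parent_id = c(0L, 1L, 1L, 0L),
  step = c(".subset2(10)", 'attr("b")', 'attr("names")', ".subset2(11)"),
  type = c("list", "character", "character", "double"),
  stringsAsFactors = FALSE
))

sub <- d$`.subset2(10)`
leaf <- d$`.subset2(10)`$`attr("b")`
```

Here `sub` keeps rows 1 to 3 (the node and its two attributes) and `leaf` keeps row 2 only. Because `$` no longer extracts columns on such an object, columns are reached with `[[` or after `unclass()`.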
We can use the same path system as in waldo, except that we don't use $ and @, since those are S3 method dependent.
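A small illustration of why `$` is unreliable as a path step, using a made-up class `shouty`: once a class defines its own `$` method, `x$a` no longer tells us what is stored in the object, while `.subset2()` never dispatches.

```r
# Made-up class whose `$` method ignores the stored data
x <- structure(list(a = 1), class = "shouty")
`$.shouty` <- function(x, name) "overridden"

via_dollar  <- x$a               # goes through the S3 method
via_subset2 <- .subset2(x, "a")  # reads the underlying list element
```

`via_dollar` is `"overridden"` while `via_subset2` is the stored `1`, which is the value a diff needs to see.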
We use identical() with the strictest options to build the object; what we show might then depend on options.
We could even use serialize() to be extremely accurate in the comparison, but only optionally, because it'd be too slow and mostly unnecessary.
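For reference, a sketch of what "strictest options" could mean in practice. The wrapper name `strict_identical()` is hypothetical, and the `extptr.as.ref` argument assumes R >= 4.2.0.

```r
# Hypothetical wrapper: every relaxation offered by identical() is turned off
strict_identical <- function(x, y) {
  identical(
    x, y,
    num.eq = FALSE,          # bitwise comparison of doubles (0 vs -0 differ)
    single.NA = FALSE,       # NA/NaN compared bitwise as well
    attrib.as.set = FALSE,   # attribute order matters
    ignore.bytecode = FALSE, # compiled vs non compiled closures differ
    ignore.env = FALSE,      # closure environments are compared
    ignore.srcref = FALSE,   # srcrefs are compared
    extptr.as.ref = TRUE     # external pointers compared as references
  )
}

a <- structure(1, foo = 1, bar = 2)
b <- structure(1, bar = 2, foo = 1)

default_zero <- identical(0, -0)
strict_zero  <- strict_identical(0, -0)
default_attr <- identical(a, b)
strict_attr  <- strict_identical(a, b)
```

With the defaults both comparisons return `TRUE`; with the strict wrapper the signed zero and the attribute order are both reported as differences.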
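The serialize-based check could be as small as the sketch below. The helper name `serialize_identical()` is hypothetical. It compares the raw vectors returned by `serialize(x, NULL)`, so it may also report differences that are not observable from R code, which is one more reason to keep it optional.

```r
# Hypothetical helper: compare the serialized bytes of both objects
serialize_identical <- function(x, y) {
  identical(serialize(x, NULL), serialize(y, NULL))
}

same <- serialize_identical(list(a = 1), list(a = 1))
diff <- serialize_identical(list(a = 1), list(a = 2))
```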
waldo shows a double diff for data frames containing atomics, and maybe for other classes too. It's very convenient, but I think we don't need this; we're good with showing diffs only for nodes. This means we could also really just have a nested list for diffs, and nodes could keep both versions in attributes (both internal nodes, i.e. lists, and leaves, e.g. character vectors).
An attribute can't be NULL so to differentiate absent and NULL we'll have a single data attribute, which is a list of 1 or 2 named components.
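A minimal sketch of that single attribute, with made-up names (`new_diff_node()`, `has_side()`, and the component names `old` and `new`). Setting an attribute to `NULL` removes it, but a list component can hold `NULL`, so presence on either side is read from the names of the list.

```r
# Hypothetical node constructor: `children` is the nested list of sub-nodes,
# and each side is recorded only if it was supplied, even when it is NULL.
new_diff_node <- function(old, new, children = list()) {
  data <- list()
  # `data["old"] <- list(old)` keeps a NULL value as a component,
  # whereas `data$old <- NULL` would drop it
  if (!missing(old)) data["old"] <- list(old)
  if (!missing(new)) data["new"] <- list(new)
  stopifnot(length(data) >= 1L)
  structure(children, data = data)
}

# Is the node present on the given side ("old" or "new")?
has_side <- function(node, side) {
  side %in% names(attr(node, "data", exact = TRUE))
}

only_old <- new_diff_node(old = NULL)          # NULL on one side, absent on the other
both     <- new_diff_node(old = NULL, new = 1) # NULL replaced by 1
```

`has_side(only_old, "new")` is `FALSE`, while `has_side(only_old, "old")` is `TRUE` even though the stored value is `NULL`.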
As a first step we can still use waldo for the diffs of leaves.