-
Notifications
You must be signed in to change notification settings - Fork 477
Refactor, fix, and optimizer filters/rules #1794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks @stevenyoungs for looking this over and the feedback. I'm hoping that @Nick-Hall is open to these three ideas (fixes, using JSON data, and the optimizer) because all three make the filter system so much faster while keeping the API almost the same. Many places in Gramps use the raw data for speed, including tools, views, displayers, and importers. It would be a shame if filters couldn't do the same. |
No problem. I'm also keen to see the benefits of more standard storage and see what opportunities it unlocks. The recent discussion of GQL is an interesting option. |
Your feedback is very useful. It always helps for another person to review the code. I very much appreciate your contributions. |
With the new JSON format, I think that we can regard the raw data as providing lightweight objects. The format mirrors the Gramps objects and is therefore much easier to understand and more resilient to updates. I've been doing some research today to see if we can make the dict format more object-like. For example, I am open to using JSON data in more core code, including filters. We should probably discuss where the raw format is acceptable. Third-party code should be careful when using the raw format. It may change between feature releases, not just major releases, but his has always been the case. An optimizer seems like a good idea. I haven't reviewed your code yet though. |
How about just wrapping our raw data with class DataDict(dict):
def __getattr__(self, key):
value = self[key]
if isinstance(value, dict):
return DataDict(value)
elif isinstance(value, list):
return DataList(value)
else:
return value
class DataList(list):
def __getitem__(self, position):
value = super().__getitem__(position)
if isinstance(value, dict):
return DataDict(value)
elif isinstance(value, list):
return DataList(value)
else:
return value It allows exactly what you describe, is low-cost, and doesn't require any more code than that above.
Agreed. The rule has seemed to be: use where it is needed for speed or easy representation. |
The only downside is that we leak the internal representation of the objects into the higher layers of code. But since those layers, in part, already use the raw format, this PR does not make a material difference. |
For many attributes of an object, it isn't so much "internal" now that we are switching to JSON. For example, this is the Person object API to get the parent family list: person.parent_family_list # object attribute
person.get_parent_family_handle_list() # function call With the above person.parent_family_list # JSON access
person.get_parent_family_handle_list() # function call, instantiates object The new version would hide the fact that the function call creates the object (but just once). So because the JSON data mirrors the actual attribute names, it doesn't have to change at all. |
Created a PR #1824 to explore attribute access of the JSON dict. |
I'll update this PR to use new DataDict attribute access to dict. Which will actually revert most of the data access to what it is in master. |
@Nick-Hall, I want to refactor this PR based on the DataDict from #1824. Should I change the base branch to #1824, or wait until it is merged? |
I have reviewed and merged PR #1824. Normally we would wait longer to give people longer to comment, but in this case the PR was related to existing work so I made an exception. |
Thank you! |
Thanks @cdhorn for the reviews! I'm not going to change the if/else constructs as I believe they are more clear and explicit. |
You're welcome. I always tend to fix these out of habit since pylint and a Sonarcube scan will call them out for refactoring. Is nice seeing all the great work you, Steven and Nick have been doing. |
Maybe this is something that should incorporated into the pipeline. |
Please no. There are lots of styles, but not this one. |
@dsblank I tried to fix a couple of conflicts, but now the unit tests are failing. It looks like the problem is due to the new type checking. Feel free to check or revert my merge. |
Ok, I'll take a look |
@Nick-Hall, types now adjusted. And a bug found via typing checks (a method was spelled wrong). I had to use a few This probably going to be difficult to merge in a few places. |
@dsblank Is this ready for me to do a final review? |
Yes, all ready! |
Do you want me to resolve conflicts? |
Yes, please. I will review after they have been fixed. |
@Nick-Hall, I corrected the diffs in the webUI and now everything passes tests, but now it wants to do a rebase via command line. But |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR looks good. I clearly have to thank previous reviewers for that.
Just a couple of things I noticed:
- The
po/POTFILES.skip
file needs updating.
- Add
gramps/gen/filters/optimizer.py
- Remove
gramps/gen/filters/rules/family/_memberbase.py
- The typing modules in
gramps/gen/filters/rules
are imported using absolute imports when they should be using relative imports.
The webUI creates branch merges which we don't want in our commit history. The workflow that I recommend is to rebase from the command line on a regular basis and force push back to the branch. However, your workflow doesn't actually cause a problem. To merge, I would do a If you wanted to have several commits in a PR then, you would need to avoid the branch merges. With multiple commits, I still rebase, but then I commit with the We also need to remember that the commit messages are used to create the ChangeLog and the first and last lines (and possibly more) become part of the NEWS file and release notes. I often have to remind developers about this. Finally, I was going to mention that we normally work in our own forks now rather than creating branches in origin. Again, I doesn't actually cause a problem, so I didn't think it was worth mentioning earlier. |
Thanks, I would appreciate that!
Absolutely. And they aren't too numerous to name: @cdhorn, @stevenyoungs, and @fxtmtrsine. |
And thanks to @emyoulation and others for emoji encouragement and positive feedback :) |
@Nick-Hall, you can take it from here? Let me know if there is still resolving to be done. |
This change does four things: 1. Unrolls the recursive call of filter.apply(), and splits out single checks into filter.apply_to_one(). 2. Uses data attributes (`person.gender`) rather than accessor functions (`person.get_gender()`) where possible. 3. Adds an optimizer based on `rule.selected_handles` sets. 4. Adds typing hints to make sure the right objects are passed into methods. Final comparison of finding those related to home person in 40k person Family Tree, between Gramps 5.2 and this change (Gramps 6.0), in seconds (smaller is better): Version | Prepare Time | Apply Time | Total Time --------| -------------:|-----------:|----------: Gramps 5.2 | 4.5 | 27.7 | 32.2 Gramps 6.0 | 8.0 | 0.5 | 8.5 The above uses the optimizer. Here is a test finding all people with a tag (5 people match): Version | Prepare Time | Apply Time | Total Time --------| -------------:|-----------:|----------: Gramps 5.2 | 0.0 | 5.0 | 5.0 Gramps 6.0 | 0.0 | 1.6 | 1.6 Recall that converting from JSON to objects is a little slower than converting from array BLOBS to objects, so this is a large improvement. Co-authored-by: Christopher Horn <[email protected]> Co-authored-by: stevenyoungs <[email protected]>
71b3a16
to
1280aa4
Compare
Squashed and rebased. |
|
This PR does four things:
person.gender
) rather than accessor functions (person.get_gender()
) where possiblerule.selected_handles
setsFinal comparison of finding those related to home person in 40k person Family Tree, between Gramps 5.2 and master + this PR (Gramps 6.0), in seconds (smaller is better):
The above uses the optimizer. Here is a test finding all people with a tag (5 people match):
Recall that converting from JSON to objects is a little slower than converting from array BLOBS to objects, so this is a large improvement.