-
Notifications
You must be signed in to change notification settings - Fork 474
Switch from pickled blobs to JSON data #1786
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If we are moving from BLOBs to JSON then we should really use the new format. See PR #800. The new format uses the The main benefit of the new format is that it is easier maintain and debug. Instead of lists we use dictionaries. So, for example, we refer to the field "parent_family_list" instead of field number 9. Upgrades are no problem. We just read and write the raw data. When I have more time I'll update you on discussion whilst you have been away. |
Oh, that sounds like a great idea! I'll take a look at the JSON format and switch to that. Should work even better with the SQL JSON_EXTRACT(). |
There are a few places where the new format is used, so we will get some bonus performance improvements. Feel free to make changes to my existing code if you see a benefit. You may also want to have a quick look at how we serialize |
Making some progress. Turns out, the serialized format had leaked into many other places, probably for speed. Probably good candidates for business logic. |
I added a |
@Nick-Hall , I will probably need your assistance regarding the complete save/load of the to_json and from_json functions. I looked at your PR but as it touches 590 files, there is a lot there. In this PR, I can now upgrade a database, and load the people views (except for name functions which I have to figure out). |
Thanks @Nick-Hall, that was very useful. I think that I will cherry pick some of the changes (like attribute name changes, elimination of private attributes). You'll see that I did many of the same changes you made. But, one thing I found is that if we want to allow upgrades from previous versions, then we need to be able to read in blob_data, and write out json_data. I think my version has that covered. I'll continue to make progress. |
@dsblank Why are you removing the properties? The validation in the setters will no longer be called. |
@Nick-Hall , I thought that was what @prculley did for optimization, and I thought was needed. I can put those back :) |
Perhaps we could consider a solution similar to that provided by the pickle A A I expect that only a handful of classes would need to override the default methods. |
CC: @Nick-Hall |
@Nick-Hall, you have any estimate on possible review on this PR (and the #1794 filter fixes)? I have some available time coming up, and would like to start work on checking the addons for this next version (6.0 or 5.3). |
@dsblank I'll make time this weekend, but may be able to start sooner. |
@Nick-Hall, if you'd like to meet over Google Meet or Zoom so that I can walk you (and others) through proposed changes, I'd be glad to. |
This PR is looking good now. I agree with you that the remaining serialize/unserialize code in the upgrade path should be left to another PR. Changes to the upgrades always require extra testing. Would it be useful to log database upgrades? Perhaps a version table that could store the dates of database creation and any upgrades. I'm not suggesting adding to this PR though. I'll convert my changes to add a create timestamp field to the primary objects so that they work with the new raw JSON format. Upgrades to the schema should be easier after this PR is merged. I'm not sure if we'll want to include it in the next release. We can discuss this later. |
Sounds good! |
Shall we merge this PR then? |
I'm just about to do some final checks, followed by a rebase and merge now. It was getting late last night. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good from my point of view
Thank you all for the reviews and comments! |
@dsblank Please don't merge PRs until I have done a final review. I was about to merge this, but noticed that the new In your merge a "Co-authored-by: stevenyoungs [email protected]" credit seems to have been lost. Was this intentional? Otherwise, I appreciate that you squashed the commits and rebased to maintain a linear history according to our committing policies. |
Oh, sorry... how do you want to fix? I did not mean to lose Steve's credit. |
I've created PR #1823 with the changes I was going to include. Unfortunately, we can't go back and add the credit to the commit now. We can add a copyright line in a file if it hasn't already been done. I'll make sure that I mention Steve in the release announcement. |
This PR made the following changes: * Database format 21: add JSON, remove pickle * Rename new column to json_data * Added to_dict, from_dict * Refactor for upgrade uses * Refactor serializers to classes * Updated libgedcom * Apply suggestions from code review * Fixed broken test: couldn't replicate, so went with new results * Migrated metadata to JSON * Refine BSDDB * Regular bug fix: citation date error * Added logging to serialize * A manual test script for validating conversion
This PR made the following changes: * Database format 21: add JSON, remove pickle * Rename new column to json_data * Added to_dict, from_dict * Refactor for upgrade uses * Refactor serializers to classes * Updated libgedcom * Apply suggestions from code review * Fixed broken test: couldn't replicate, so went with new results * Migrated metadata to JSON * Refine BSDDB * Regular bug fix: citation date error * Added logging to serialize * A manual test script for validating conversion
This PR converts the database interface to use JSON data rather than the pickled blobs used since the early days.
db.serializer
a. abstracts data column name
b. contains serialize/unserialize functions
a. It does this by switching between serializers