Switch from pickled blobs to JSON data #1786

dsblank · 2024-10-10T21:56:00Z

This PR converts the database interface to use JSON data rather than the pickled blobs used since the early days.

1. Uses a new abstraction in the database: db.serializer
   a. abstracts data column name
   b. contains serialize/unserialize functions
2. Updates database format to 21
3. The conversion from 20 to 21 reads pickled blobs, and writes JSON data.
   a. It does this by switching between serializers
4. New databases do not contain pickled blobs
5. Converted databases contain both fields
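
As a rough illustration of the serializer idea described above, here is a minimal sketch. The class names `BlobSerializer` and `JSONSerializer`, the `data_field` attribute, and the `convert_row` helper are assumptions made for this example, not the actual Gramps code; only the column names `blob_data` and `json_data` come from this PR.

```python
import json
import pickle


class BlobSerializer:
    """Hypothetical serializer for the legacy pickled column."""

    data_field = "blob_data"  # name of the column this serializer uses

    @staticmethod
    def serialize(data):
        return pickle.dumps(data)

    @staticmethod
    def unserialize(raw):
        return pickle.loads(raw)


class JSONSerializer:
    """Hypothetical serializer for the new JSON column."""

    data_field = "json_data"

    @staticmethod
    def serialize(data):
        return json.dumps(data)

    @staticmethod
    def unserialize(raw):
        return json.loads(raw)


def convert_row(row):
    """Read with the old serializer, write with the new one.

    The pickled field is kept, so a converted row contains both fields.
    """
    data = BlobSerializer.unserialize(row[BlobSerializer.data_field])
    converted = dict(row)
    converted[JSONSerializer.data_field] = JSONSerializer.serialize(data)
    return converted


# Illustrative row only; real records hold far more than one key.
old_row = {"handle": "abc123", "blob_data": pickle.dumps({"gramps_id": "I0001"})}
new_row = convert_row(old_row)
```

Because each serializer carries its own column name, the calling code does not need to know which format it is reading, which is what makes switching serializers during an upgrade straightforward.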

Nick-Hall · 2024-10-10T23:04:14Z

If we are moving from BLOBs to JSON then we should really use the new format. See PR #800.

The new format uses the to_json and from_json methods in the serialize module to build the JSON from the underlying classes. It comes with get_schema class methods, which provide a JSON Schema that allows the validation we already use in our unit tests.

The main benefit of the new format is that it is easier to maintain and debug. Instead of lists we use dictionaries. So, for example, we refer to the field "parent_family_list" instead of field number 9.
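
To make the difference concrete, here is a small sketch. Both records are invented and heavily trimmed for illustration; only the field name "parent_family_list" and its position 9 come from the comment above.

```python
# Hypothetical, trimmed person record in the old positional layout.
# Only index 9 (the parent family list) matters for this example.
person_as_list = ["abc123", "I0001", 1, None, [], None, [], [], [], ["F0001"]]

# The same data in a dictionary layout with named fields.
person_as_dict = {
    "handle": "abc123",
    "gramps_id": "I0001",
    "gender": 1,
    "parent_family_list": ["F0001"],
}

# Old style: the reader has to know what position 9 means.
families_old = person_as_list[9]

# New style: the field name documents itself.
families_new = person_as_dict["parent_family_list"]
```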

Upgrades are no problem. We just read and write the raw data.

When I have more time I'll update you on the discussion that took place whilst you have been away.

dsblank · 2024-10-11T01:14:12Z

Oh, that sounds like a great idea! I'll take a look at the JSON format and switch to that. Should work even better with the SQL JSON_EXTRACT().
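
For example, something along these lines should become possible. This is only a sketch using SQLite's `json_extract` through Python's `sqlite3` module; the `person` table layout and the record contents are assumptions for illustration, and it relies on the SQLite build including the JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table layout: a handle plus the JSON text column.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, json_data TEXT)")
conn.execute(
    "INSERT INTO person VALUES (?, ?)",
    ("abc123", json.dumps({"gramps_id": "I0001", "gender": 1})),
)

# Select and filter on fields inside the JSON document by name.
rows = conn.execute(
    "SELECT handle, json_extract(json_data, '$.gramps_id') FROM person "
    "WHERE json_extract(json_data, '$.gender') = ?",
    (1,),
).fetchall()
```

With dictionaries the JSON paths use field names such as `$.gramps_id`, rather than array positions.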

Nick-Hall · 2024-10-11T15:57:22Z

There are a few places where the new format is used, so we will get some bonus performance improvements.

Feel free to make changes to my existing code if you see a benefit.

You may also want to have a quick look at how we serialize GrampsType. Enough information is stored so that we can recreate the object, but I don't think that I chose to store all fields.

dsblank · 2024-10-12T23:49:21Z

Making some progress. Turns out, the serialized format had leaked into many other places, probably for speed. Probably good candidates for business logic.

dsblank · 2024-10-13T02:07:32Z

I added a to_dict() and from_dict() based on the to_json() and from_json(). I didn't know about the object hooks. Brilliant! That saves so much code.
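
For anyone else unfamiliar with the hooks, here is a minimal sketch of the mechanism using the standard `json` module. The `Name` and `Person` classes, the `CLASSES` registry, and the `"_class"` marker key are all assumptions for this example and are not taken from the Gramps code.

```python
import json


class Name:
    def __init__(self, first_name="", surname=""):
        self.first_name = first_name
        self.surname = surname


class Person:
    def __init__(self, gramps_id="", primary_name=None):
        self.gramps_id = gramps_id
        self.primary_name = primary_name


# Hypothetical registry mapping the marker value back to a class.
CLASSES = {"Name": Name, "Person": Person}


def to_dict(obj):
    """default= hook: json.dumps calls this for objects it cannot encode."""
    data = {"_class": type(obj).__name__}
    data.update(vars(obj))
    return data


def from_dict(data):
    """object_hook: json.loads calls this for every decoded JSON object.

    Nested objects are decoded first, so children are already rebuilt
    by the time their parent is handled.
    """
    cls = CLASSES.get(data.get("_class"))
    if cls is None:
        return data  # plain dictionary, leave it alone
    kwargs = {key: value for key, value in data.items() if key != "_class"}
    return cls(**kwargs)


person = Person("I0001", Name("Ada", "Lovelace"))
text = json.dumps(person, default=to_dict)
restored = json.loads(text, object_hook=from_dict)
```

The hooks take care of the recursion, so no per-class traversal code is needed.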

dsblank · 2024-10-13T16:07:30Z

@Nick-Hall , I will probably need your assistance regarding the complete save/load of the to_json and from_json functions. I looked at your PR but as it touches 590 files, there is a lot there.

In this PR, I can now upgrade a database, and load the people views (except for name functions which I have to figure out).

Nick-Hall · 2024-10-13T17:01:29Z

@dsblank I have rebased PR #800 on the gramps51 branch. Only 25 files were actually changed.

You can also see the changes suggested by @prculley resulting from his testing and performance benchmarks.

dsblank · 2024-10-13T19:04:34Z

Thanks @Nick-Hall, that was very useful. I think that I will cherry pick some of the changes (like attribute name changes, elimination of private attributes).

You'll see that I did many of the same changes you made. But, one thing I found is that if we want to allow upgrades from previous versions, then we need to be able to read in blob_data, and write out json_data. I think my version has that covered.
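
The upgrade requirement can be sketched roughly as follows. This is not the code in this PR: the `person` table layout and the pickled dictionary are simplified assumptions, and only the column names blob_data and json_data are taken from the discussion.

```python
import json
import pickle
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical old-format table holding only the pickled column.
conn.execute("CREATE TABLE person (handle TEXT PRIMARY KEY, blob_data BLOB)")
conn.execute(
    "INSERT INTO person VALUES (?, ?)",
    ("abc123", pickle.dumps({"gramps_id": "I0001"})),
)

# Upgrade step: add the new column, then read blob_data and write json_data.
conn.execute("ALTER TABLE person ADD COLUMN json_data TEXT")
old_rows = conn.execute("SELECT handle, blob_data FROM person").fetchall()
for handle, blob in old_rows:
    data = pickle.loads(blob)
    conn.execute(
        "UPDATE person SET json_data = ? WHERE handle = ?",
        (json.dumps(data), handle),
    )
conn.commit()

# After the upgrade the row carries both fields.
handle, blob, text = conn.execute(
    "SELECT handle, blob_data, json_data FROM person"
).fetchone()
```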

I'll continue to make progress.

Nick-Hall · 2024-10-13T23:04:02Z

@dsblank Why are you removing the properties? The validation in the setters will no longer be called.

dsblank · 2024-10-14T02:47:46Z

@Nick-Hall, I thought that was what @prculley did for optimization, and I thought it was needed. I can put those back :)

This reverts commit a9da731.

Nick-Hall · 2024-10-14T14:00:44Z

Perhaps we could consider a solution similar to that provided by the pickle __getstate__ and __setstate__ methods.

A get_state method in a base class could return a dictionary of public attributes by default. This could be overridden to add properties if required.

A set_state method could write the values back. In the case of properties we could just set the corresponding private variable rather than calling the setter. The list to tuple conversion could also be done in this method.

I expect that only a handful of classes would need to override the default methods.
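
A minimal sketch of what this could look like, assuming a hypothetical `StateBase` base class and an invented `Marker` class with one property and one tuple attribute. None of these names come from the Gramps code base.

```python
import json


class StateBase:
    """Hypothetical base class providing the default behaviour."""

    def get_state(self):
        # By default, return every public attribute.
        return {k: v for k, v in vars(self).items() if not k.startswith("_")}

    def set_state(self, state):
        for key, value in state.items():
            setattr(self, key, value)


class Marker(StateBase):
    """Invented example class that overrides the defaults."""

    def __init__(self):
        self.description = ""
        self.coords = (0.0, 0.0)
        self._year = 0

    @property
    def year(self):
        return self._year

    @year.setter
    def year(self, value):
        if value < 0:
            raise ValueError("year must not be negative")
        self._year = value

    def get_state(self):
        state = super().get_state()
        state["year"] = self._year  # add the property to the state
        return state

    def set_state(self, state):
        state = dict(state)
        # Set the private variable directly instead of calling the setter.
        self._year = state.pop("year")
        # JSON has no tuple type, so convert the list back here.
        state["coords"] = tuple(state["coords"])
        super().set_state(state)


original = Marker()
original.year = 1850
original.coords = (1.5, 2.5)

# Round trip through JSON text, as a database load would do.
restored = Marker()
restored.set_state(json.loads(json.dumps(original.get_state())))
```

Classes without properties or tuples would inherit both methods unchanged.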

dsblank · 2024-11-29T17:35:14Z

CC: @Nick-Hall

dsblank · 2024-12-04T15:40:34Z

@Nick-Hall, do you have any estimate for a possible review of this PR (and the #1794 filter fixes)?

I have some available time coming up, and would like to start work on checking the addons for this next version (6.0 or 5.3).

Nick-Hall · 2024-12-04T23:00:36Z

@dsblank I'll make time this weekend, but may be able to start sooner.

dsblank · 2024-12-07T20:40:05Z

@Nick-Hall, if you'd like to meet over Google Meet or Zoom so that I can walk you (and others) through proposed changes, I'd be glad to.

Nick-Hall · 2024-12-07T23:56:01Z

This PR is looking good now.

I agree with you that the remaining serialize/unserialize code in the upgrade path should be left to another PR. Changes to the upgrades always require extra testing.

Would it be useful to log database upgrades? Perhaps a version table that could store the dates of database creation and any upgrades. I'm not suggesting adding to this PR though.

I'll convert my changes to add a create timestamp field to the primary objects so that they work with the new raw JSON format. Upgrades to the schema should be easier after this PR is merged. I'm not sure if we'll want to include it in the next release. We can discuss this later.

dsblank · 2024-12-08T00:33:57Z

Sounds good!

dsblank · 2024-12-08T14:15:52Z

Shall we merge this PR then?

Nick-Hall · 2024-12-08T14:20:15Z

I'm just about to do some final checks, followed by a rebase and merge now. It was getting late last night.

stevenyoungs

Good from my point of view

dsblank · 2024-12-08T15:23:16Z

Thank you all for the reviews and comments!

Nick-Hall · 2024-12-08T15:48:30Z

@dsblank Please don't merge PRs until I have done a final review. I was about to merge this, but noticed that the new gen.db.conversion_tools package was not listed in the setup.py and the two files it contains are not in the POTFILES.skip file.

In your merge a "Co-authored-by: stevenyoungs [email protected]" credit seems to have been lost. Was this intentional?

Otherwise, I appreciate that you squashed the commits and rebased to maintain a linear history according to our committing policies.

dsblank · 2024-12-08T15:56:28Z

Oh, sorry... how do you want to fix? I did not mean to lose Steve's credit.

Nick-Hall · 2024-12-08T16:31:47Z

I've created PR #1823 with the changes I was going to include.

Unfortunately, we can't go back and add the credit to the commit now. We can add a copyright line in a file if it hasn't already been done. I'll make sure that I mention Steve in the release announcement.

This PR made the following changes:

* Database format 21: add JSON, remove pickle
* Rename new column to json_data
* Added to_dict, from_dict
* Refactor for upgrade uses
* Refactor serializers to classes
* Updated libgedcom
* Apply suggestions from code review
* Fixed broken test: couldn't replicate, so went with new results
* Migrated metadata to JSON
* Refine BSDDB
* Regular bug fix: citation date error
* Added logging to serialize
* A manual test script for validating conversion

Database format 21: add JSON, remove pickle

a05fe5a

dsblank requested a review from Nick-Hall October 10, 2024 21:56

dsblank added 3 commits October 10, 2024 18:10

Rename new column to json_data

a8ef265

Read prev version

97d3388

Load old version

7497abd

Added to_dict, from_dict

76d622e

dsblank added 2 commits October 12, 2024 14:43

Refactor for upgrade uses

c014c15

Peoplemodel mostly working

43ea2b2

dsblank added 3 commits October 12, 2024 22:32

Save new db 21 with JSON data field

1350f5c

Docstrings

e677833

Generic needs to handle both blob and json during upgrades

7ac4f7b

dsblank added 4 commits October 13, 2024 15:23

name fixes

08869eb

black linting

99b3d2b

Removed unneeded properties on primary objects

a9da731

Use a version of Nick's to/from json funcs

bc5ac5b

dsblank added 4 commits October 13, 2024 22:49

Revert "Removed unneeded properties on primary objects"

33928a9

This reverts commit a9da731.

linting

3785e46

WIP: eventmodel, and familymodel

b211640

Use column position in model

21caaa2

dsblank requested review from Nick-Hall and stevenyoungs December 8, 2024 14:16

Nick-Hall approved these changes Dec 8, 2024

stevenyoungs approved these changes Dec 8, 2024

dsblank merged commit 81d1e01 into master Dec 8, 2024
3 checks passed

Nick-Hall added a commit to Nick-Hall/gramps that referenced this pull request Dec 8, 2024

Add gen.db.conversion_tools from PR gramps-project#1786

e8ed7e3

stevenyoungs mentioned this pull request Dec 12, 2024

Teach GrampsType.set() to work with a dict #1825

Merged

dsblank pushed a commit that referenced this pull request Dec 22, 2024

Add gen.db.conversion_tools from PR #1786

5647610

DavidMStraub mentioned this pull request Jan 1, 2025

Upgrade path for Gramps 6.0/database schema version 21 gramps-project/gramps-web-api#596

Closed

SNoiraud pushed a commit to SNoiraud/gramps that referenced this pull request Jan 26, 2025

Add gen.db.conversion_tools from PR gramps-project#1786

a84643b

kulath mentioned this pull request Jan 27, 2025

Added db.select_from_TABLE methods #1828

Open

DavidMStraub added a commit to DavidMStraub/addons-source that referenced this pull request Feb 5, 2025

Implement changes of gramps-project/gramps#1786

c8174b9

Nick-Hall deleted the dsb/depickle branch February 13, 2025 15:32

DavidMStraub added a commit to DavidMStraub/addons-source that referenced this pull request Mar 16, 2025

Implement changes of gramps-project/gramps#1786

6918eed

GaryGriffin pushed a commit to gramps-project/addons-source that referenced this pull request Mar 19, 2025

Implement changes of gramps-project/gramps#1786

f6bb888

ForeverFloating pushed a commit to ForeverFloating/gramps that referenced this pull request Mar 21, 2025

Add gen.db.conversion_tools from PR gramps-project#1786

5d1015b
