-
Notifications
You must be signed in to change notification settings - Fork 17
Publishing dependency tracking and side-effect calculation + dependency state summary for published and draft #369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
72688d7 to
51511f6
Compare
|
@bradenmacdonald, @kdmccormick: Based on your feedback today, I reworked the admin page entries to look like this:
It's not quite obvious, but the version information for the draft is italicized if it's different from the published version + dependency state. |
0b452cf to
9323705
Compare
Create the PublishSideEffect model as the publishing analog for the DraftSideEffect model, and create these side effects for parent containers when one of the children is published. This allows us to efficiently query when any subtree of content was affected by a publish. Also introduces PublishableEntityVersionDependency as a way of tracking unpinned dependencies of a version, e.g. the children of a version of a Unit. Also introduces a new dependencies_hash_digest field to Draft and Published models to track dependency state. This PR adds Django admin functionality for this new data, as well as optimizing some existing calls that used to have to do more complex container -> entity list traversal.
9323705 to
2e9cc4c
Compare
|
@ormsbee Cool, I think that's a bit more useful and clear. |
|
@kdmccormick, @bradenmacdonald: I still have to write more tests, but I need to put this down for a day or two to attend to some long-overdue reviews. The code should be in a reviewable state, minus some union-related type annotation stuff that I have to look into later. Please review if you have time. |
|
Annoying late night thought: It sounds like we may at some point elevate the state summary hash to something that external clients would use, e.g. for history comparison or caching purposes. But it's not stored in the history, since it's calculated on the Draft and Published tables. But it doesn't have to be--since any change to a Draft or a Published (even side-effects) has to be represented in the change logs, that means that we could put those hash summaries on the DraftChangeLogRecord and PublishChangeLogRecord instead. That way, you could see where a particular state was in the history of that thing, and what caused it to become that way. |
|
@ormsbee I think it's a great idea |
|
Before locking ourselves into UUIDs, I'm still curious if there are any other ways to generate a dependency hash. A bad idea (just writing it down for posterity)What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package? For example, in pseudocode: I think this would work OK with libraries as they are today. But, they would not capture any data that is hung off of the LC models in the future. It would break down as soon as someone, for example, made a custom Container subclass with some critical metadata on the ContainerVersion subclass... updating that metadata would fail to modify the dependency hash. Maybe this could be resolved by allowing PE subclasses to register custom hashing functions? Probably not worth it. A half-baked ideaWhat about just hashing just the PEV by its key and version number? We already guarantee that (key, version number) uniquely identifies a PEV in a package, so it should be sufficient input for a hash whose job is to capture whether any changes have happened to PE's dependencies (which are in that same package). (key, version_num) only breaks down across Open edX instances, but I don't think we need to worry about that for the dependency hash, right? |
I know you state this is a bad idea, but I do want to explicitly highlight the tradeoffs we're making here for other people to read later, because in many ways this sort of hashing would be the gold standard for how to do this. As you point out, it would require anyone extending the data models to properly calculate a hash for their extended data. Not only would we need to create API hooks for that, it's asking a lot of people to get that hashing right. Instead of this, both the UUID approach in this PR and the (identifier + version num) approach suggested below try to reduce the burden on developers by taking the shortcut that any Going to reply to the (key, version) comment in a bit... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still reviewing, just a couple quick thoughts so far.
| version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT) | ||
| entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[low-importance -- I think this only matters for django admin niceness and if/when we begin pruning PEVs]
The RESTRICT on entity makes sense to me. If V depends on E, then we wouldn't want to delete E as long as the relationship (V depends on E) exists.
But RESTRICT on version might be overly cautious? If V depends on E, and we're deleting V, then we should be reasonable to cascade-delete the relationship (V depends on E), no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you're right. I'll make the version CASCADE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I did this because if it's CASCADE, then entity deletion would retroactively change all the versions that referenced it, leaving no history of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I did this because if it's CASCADE, then entity deletion would retroactively change all the versions that referenced it, leaving no history of it.
I think I'm not understanding? Deletion of the referenced entity is disallowed unless deletion of referring versions has already happened.
| version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT) | ||
| entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think at least one of these field names should change so that it's clear what the direction of the dependency is. One idea:
| version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT) | |
| entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) | |
| version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT) | |
| dependent_entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about this one, just because when I see dependent_entity, I think "the entity that depends on this thing", not the other way around. Like how a kid is a dependent, you know? dependency_entity maybe? But at that point we'd have code like dep.dependency_entity...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you're right, i had the semantics backwards. The version is the dependent. Concretely, in our system parents are the dependents of their children 😂
OK, let's avoid the word "dependent". dependency_entity is not great either. Maybe affected_version? Idk, I'm not in love with that. Feel free to keep it as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] One more idea I had:
referent_versionreferenced_entity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I like that one. I'll do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the bright side, most of the time in the code this relation is used via the M2M relation names, which are more intuitive:
entity.affects.all()
entity_version.dependencies.all()There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
final option thought -- referring_version might be slightly clearer than referent_version. Both are techinically correct, afaict.
|
What's left to do before we can merge this PR? We need the full descendant publishing to fix some libraries bugs - see #421 |
openedx_learning/apps/authoring/publishing/models/publishable_entity.py
Outdated
Show resolved
Hide resolved
|
@bradenmacdonald: I did a major shift over of where I'm storing the hashes (onto the log records), but I'm still chasing down some transaction related bugs with some very unhelpful traces. |
|
I've finished the refactoring to put the hash digests into the log records. Most of that work is in publishing/api.py, particularly I still need to consolidate the migrations and then do some work on my data migration. I had a hacky one working for the old way of doing things, now trying to bring that over to the new refactored hash tracking. It looks like I've got failing tests in MySQL though, but not locally on sqlite. I'll look into that in the morning. |
|
Ugh. I can sort of reproduce the test failures locally, but it looks like it's some kind of intermittent race condition. |
|
Okay, I think I fixed the issue. Back to data migration and cleanup. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple more comments; will continue reviewing this afternoon
| The idea behind dependencies is that a PublishableEntityVersion may be | ||
| declared to reference unpinned PublishableEntities. Changes to those | ||
| referenced PublishableEntities still affect the draft or published state of | ||
| the PublisahbleEntity, even if the version of the PublishableEntity is not | ||
| incremented. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| The idea behind dependencies is that a PublishableEntityVersion may be | |
| declared to reference unpinned PublishableEntities. Changes to those | |
| referenced PublishableEntities still affect the draft or published state of | |
| the PublisahbleEntity, even if the version of the PublishableEntity is not | |
| incremented. | |
| The idea behind dependencies is that a PublishableEntity's Versions may | |
| be declared to reference unpinned PublishableEntities. Changes to those | |
| referenced PublishableEntities still affect the draft or published state of | |
| the referent PublishableEntity, even if its version is not incremented. |
[optional] "Changes to those referenced PublishableEntities still affect the draft or published state of the PublisahbleEntity" is kinda confusing, even though I knew what you meant. Here's a suggestion that might make it clearer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took your suggestion and added a little more text.
| version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT) | ||
| entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[optional] One more idea I had:
referent_versionreferenced_entity
|
mypy suceeds locally but fails on this PR build. 😞 |
|
Okay, data migration is in there now. |
|
The mypy issue is fixed now (in a hacky way, but I'm really not interested in debugging mypy on the night before a release cut. 😛) |

Status: Data models and fundamental logic are implemented. I need to clean up a lot and add tests and some admin debug screens. I also need to do the data migration for existing containers data.
This PR introduces a publishing app concept of dependency tracking, as well as adding a
PublishingSideEffectanalog of the existingDraftSideEffect. The high-level goal is to be able to efficiently do operations like, "publish this entity and all its children" or answer queries like "when was the last time the published state of this entity changed (including its children)?".The
publish_from_draftsfunction has been modified to publish all dependencies.Follow-on to #328, broadly implements what was discussed in #317, though I found it simpler to model all dependencies at the PublishableEntity level, rather than splitting them out into separate Draft and Published dependencies. That being said, the
dependencies_hash_digestmust be computed separately, since the published and draft versions of any given dependency can be different at any given time.