Skip to content

Conversation

@ormsbee
Copy link
Contributor

@ormsbee ormsbee commented Aug 25, 2025

Status: Data models and fundamental logic are implemented. I need to clean up a lot and add tests and some admin debug screens. I also need to do the data migration for existing containers data.

This PR introduces a publishing app concept of dependency tracking, as well as adding a PublishingSideEffect analog of the existing DraftSideEffect. The high-level goal is to be able to efficiently do operations like, "publish this entity and all its children" or answer queries like "when was the last time the published state of this entity changed (including its children)?".

The publish_from_drafts function has been modified to publish all dependencies.

Follow-on to #328, broadly implements what was discussed in #317, though I found it simpler to model all dependencies at the PublishableEntity level, rather than splitting them out into separate Draft and Published dependencies. That being said, the dependencies_hash_digest must be computed separately, since the published and draft versions of any given dependency can be different at any given time.

@ormsbee ormsbee marked this pull request as draft August 25, 2025 22:42
@ormsbee ormsbee changed the title Simpler side effects Publishing dependency tracking and side-effect calculation for Draft and Published Aug 25, 2025
@ormsbee ormsbee changed the title Publishing dependency tracking and side-effect calculation for Draft and Published Publishing dependency tracking and side-effect calculation + dependency state summary for published and draft Aug 27, 2025
@ormsbee ormsbee force-pushed the simpler-side-effects branch from 72688d7 to 51511f6 Compare August 28, 2025 15:49
@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 3, 2025

@bradenmacdonald, @kdmccormick: Based on your feedback today, I reworked the admin page entries to look like this:

image

It's not quite obvious, but the version information for the draft is italicized if it's different from the published version + dependency state.

@ormsbee ormsbee force-pushed the simpler-side-effects branch from 0b452cf to 9323705 Compare September 3, 2025 14:41
Create the PublishSideEffect model as the publishing analog for
the DraftSideEffect model, and create these side effects for
parent containers when one of the children is published. This
allows us to efficiently query when any subtree of content was
affected by a publish.

Also introduces PublishableEntityVersionDependency as a way of
tracking unpinned dependencies of a version, e.g. the children
of a version of a Unit.

Also introduces a new dependencies_hash_digest field to Draft
and Published models to track dependency state.

This PR adds Django admin functionality for this new data, as
well as optimizing some existing calls that used to have to do
more complex container -> entity list traversal.
@ormsbee ormsbee force-pushed the simpler-side-effects branch from 9323705 to 2e9cc4c Compare September 3, 2025 14:42
@bradenmacdonald
Copy link
Contributor

@ormsbee Cool, I think that's a bit more useful and clear.

@ormsbee ormsbee marked this pull request as ready for review September 3, 2025 17:20
@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 3, 2025

@kdmccormick, @bradenmacdonald: I still have to write more tests, but I need to put this down for a day or two to attend to some long-overdue reviews. The code should be in a reviewable state, minus some union-related type annotation stuff that I have to look into later. Please review if you have time.

@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 11, 2025

Annoying late night thought: It sounds like we may at some point elevate the state summary hash to something that external clients would use, e.g. for history comparison or caching purposes. But it's not stored in the history, since it's calculated on the Draft and Published tables. But it doesn't have to be--since any change to a Draft or a Published (even side-effects) has to be represented in the change logs, that means that we could put those hash summaries on the DraftChangeLogRecord and PublishChangeLogRecord instead. That way, you could see where a particular state was in the history of that thing, and what caused it to become that way.

@kdmccormick
Copy link
Member

@ormsbee I think it's a great idea

@kdmccormick
Copy link
Member

Before locking ourselves into UUIDs, I'm still curious if there are any other ways to generate a dependency hash.

A bad idea (just writing it down for posterity)

What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package? For example, in pseudocode:

def hash(pev) = combine_hashes(
    hash(pev.publishable_entity.key),
    hash(pev.title),
    *pev.component_version.contents.hashes if pev.component_version,
    *hash(child for child in pev.container_version.entity_list.children) if pev.container,
)

I think this would work OK with libraries as they are today. But, they would not capture any data that is hung off of the LC models in the future. It would break down as soon as someone, for example, made a custom Container subclass with some critical metadata on the ContainerVersion subclass... updating that metadata would fail to modify the dependency hash.

Maybe this could be resolved by allowing PE subclasses to register custom hashing functions? Probably not worth it.

A half-baked idea

What about just hashing just the PEV by its key and version number? We already guarantee that (key, version number) uniquely identifies a PEV in a package, so it should be sufficient input for a hash whose job is to capture whether any changes have happened to PE's dependencies (which are in that same package).

(key, version_num) only breaks down across Open edX instances, but I don't think we need to worry about that for the dependency hash, right?

@ormsbee
Copy link
Contributor Author

ormsbee commented Sep 11, 2025

What about generating the hash from the intrinsic data of the PE, so that a hash fully captures that state of some entity in a package?

I know you state this is a bad idea, but I do want to explicitly highlight the tradeoffs we're making here for other people to read later, because in many ways this sort of hashing would be the gold standard for how to do this. As you point out, it would require anyone extending the data models to properly calculate a hash for their extended data. Not only would we need to create API hooks for that, it's asking a lot of people to get that hashing right.

Instead of this, both the UUID approach in this PR and the (identifier + version num) approach suggested below try to reduce the burden on developers by taking the shortcut that any PublishableEntityVersion and all the models that may hang off of it are intended to be immutable. Therefore, whatever the actual content is, we can use any shorthand identifier for it, so long as those identifiers won't conflict with each other. This means that we aren't quite as accurate as hashing in the intrinsic content. If we add a piece of text to a problem in v2 and remove it again in v3, there is nothing in the system that marks v1 and v3 as being equivalent to each other. In exchange for this loss of accuracy, we get a simpler API and fast, predictable hash generation.

Going to reply to the (key, version) comment in a bit...

@kdmccormick kdmccormick self-requested a review October 24, 2025 14:40
Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still reviewing, just a couple quick thoughts so far.

Comment on lines 304 to 305
version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[low-importance -- I think this only matters for django admin niceness and if/when we begin pruning PEVs]

The RESTRICT on entity makes sense to me. If V depends on E, then we wouldn't want to delete E as long as the relationship (V depends on E) exists.

But RESTRICT on version might be overly cautious? If V depends on E, and we're deleting V, then we should be reasonable to cascade-delete the relationship (V depends on E), no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. I'll make the version CASCADE.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I did this because if it's CASCADE, then entity deletion would retroactively change all the versions that referenced it, leaving no history of it.

Copy link
Member

@kdmccormick kdmccormick Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait, I did this because if it's CASCADE, then entity deletion would retroactively change all the versions that referenced it, leaving no history of it.

I think I'm not understanding? Deletion of the referenced entity is disallowed unless deletion of referring versions has already happened.

Comment on lines 304 to 305
version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think at least one of these field names should change so that it's clear what the direction of the dependency is. One idea:

Suggested change
version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
dependent_entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)

Copy link
Contributor Author

@ormsbee ormsbee Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this one, just because when I see dependent_entity, I think "the entity that depends on this thing", not the other way around. Like how a kid is a dependent, you know? dependency_entity maybe? But at that point we'd have code like dep.dependency_entity...?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, you're right, i had the semantics backwards. The version is the dependent. Concretely, in our system parents are the dependents of their children 😂

OK, let's avoid the word "dependent". dependency_entity is not great either. Maybe affected_version? Idk, I'm not in love with that. Feel free to keep it as it is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] One more idea I had:

  • referent_version
  • referenced_entity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I like that one. I'll do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the bright side, most of the time in the code this relation is used via the M2M relation names, which are more intuitive:

entity.affects.all()
entity_version.dependencies.all()

Copy link
Member

@kdmccormick kdmccormick Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

final option thought -- referring_version might be slightly clearer than referent_version. Both are techinically correct, afaict.

@bradenmacdonald
Copy link
Contributor

What's left to do before we can merge this PR? We need the full descendant publishing to fix some libraries bugs - see #421

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 28, 2025

@bradenmacdonald: I did a major shift over of where I'm storing the hashes (onto the log records), but I'm still chasing down some transaction related bugs with some very unhelpful traces.

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 29, 2025

I've finished the refactoring to put the hash digests into the log records. Most of that work is in publishing/api.py, particularly _update_branch_dependencies_hash_digests() and hash_for_log_record().

I still need to consolidate the migrations and then do some work on my data migration. I had a hacky one working for the old way of doing things, now trying to bring that over to the new refactored hash tracking.

It looks like I've got failing tests in MySQL though, but not locally on sqlite. I'll look into that in the morning.

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 29, 2025

Ugh. I can sort of reproduce the test failures locally, but it looks like it's some kind of intermittent race condition.

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 29, 2025

Okay, I think I fixed the issue. Back to data migration and cleanup.

Copy link
Member

@kdmccormick kdmccormick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple more comments; will continue reviewing this afternoon

Comment on lines 261 to 265
The idea behind dependencies is that a PublishableEntityVersion may be
declared to reference unpinned PublishableEntities. Changes to those
referenced PublishableEntities still affect the draft or published state of
the PublisahbleEntity, even if the version of the PublishableEntity is not
incremented.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The idea behind dependencies is that a PublishableEntityVersion may be
declared to reference unpinned PublishableEntities. Changes to those
referenced PublishableEntities still affect the draft or published state of
the PublisahbleEntity, even if the version of the PublishableEntity is not
incremented.
The idea behind dependencies is that a PublishableEntity's Versions may
be declared to reference unpinned PublishableEntities. Changes to those
referenced PublishableEntities still affect the draft or published state of
the referent PublishableEntity, even if its version is not incremented.

[optional] "Changes to those referenced PublishableEntities still affect the draft or published state of the PublisahbleEntity" is kinda confusing, even though I knew what you meant. Here's a suggestion that might make it clearer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took your suggestion and added a little more text.

Comment on lines 304 to 305
version = models.ForeignKey(PublishableEntityVersion, on_delete=models.RESTRICT)
entity = models.ForeignKey(PublishableEntity, on_delete=models.RESTRICT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[optional] One more idea I had:

  • referent_version
  • referenced_entity

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 29, 2025

mypy suceeds locally but fails on this PR build. 😞

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 30, 2025

Okay, data migration is in there now.

@ormsbee
Copy link
Contributor Author

ormsbee commented Oct 30, 2025

The mypy issue is fixed now (in a hacky way, but I'm really not interested in debugging mypy on the night before a release cut. 😛)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants