Skip to content

Conversation

@bradenmacdonald
Copy link
Contributor

@bradenmacdonald bradenmacdonald commented Feb 14, 2025

This PR builds on #254 and implements new containers and units apps in Learning Core - one for abstract containers in general and one for units in particular. The recent commits are mine, but much of the implementation and conceptual work was done by @mariajgrimaldi (and @ormsbee) (Thanks!).

This is not a complete backend implementation of "units" on its own - I've sketched out the remaining work in this issue.

Changes in this PR

Compared to the prototype in #254 :

  1. The containers models/API is now part of the publishing app (NEW)
  2. Publishing a container will always publish its children, even if you didn't specify them in the publish request. (NEW)
  3. Expanded the unit test cases significantly.
  4. Combined defined_list, initial_list, and frozen_list into a single entity_list. Removed functions related to defined_list/initial_list/frozen_list from the api for now. These are more of an internal implementation detail and we don't want the test cases tied too heavily to the internal details.
  5. Simplified EntitityListRow: combined draft_version and published_version into just entity_version.
  6. Added new high-level APIs:
    • get_components_in_draft_unit and get_components_in_published_unit
    • contains_unpublished_changes to check if a unit's unpinned children have unpublished changes, even if the unit itself is unchanged.
    • get_components_in_published_unit_as_of that can get the state of the unit at any previous point in the learning package's publication history.
    • get_containers_with_entity that can find which containers contain the given PublishableEntity, e.g. which units contain a component that we're about to delete.

Old Questions (mostly answered now - ignore these)
  1. UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion

    • I implemented this for now.
  2. Do I understand correctly then that the version history of units should NOT be thought of as a series of snapshots of the unit (as I had been thinking) but rather a changelog of edits made to the unit itself (but ignoring most changes made its the children)?

    • If so, To get a snapshot of the unit at a point in time, it's necessary to pass in a timestamp or a PublishLog PK. Should we have an API to get an old version of a unit by passing in a PublishLog PK (or timestamp)?
    • I have implemented a proof of concept as get_components_in_published_unit_as_of
  3. Is it correct that units allow you to pin the version of a component, which means you always get exactly that component, but sequences/subsections will not really support pinning in the same way, because even if you pin a unit to a specific version, the unpinned components in that unit can still be updated?

  • If so, is there any value in even implementing "pinning" for containers larger than units?
  1. Do we have any plans to implement pinned/unpinned components in the UI or is everything unpinned?

  2. If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.

  3. Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.

  4. Note to self: if soft deleting a component doesn't mark the unit as changed, we need to update the UI to show soft deleted draft components and allow publishing the deletion. Or maybe this is handled by a container_has_unpublished_changes and recursivelying pseudo-publishing the container (no changes to container but force publish the children including soft deletes).

# This value is mutable if and only if there are unpinned references in
# defined_list. In that case, frozen_list should start as None, and be
# updated to pin references when another version of this Container becomes
# the Draft version. But if this version ever becomes the Draft again
# (e.g. the user hits "discard changes" or some kind of revert happens),
# then we need to clear this back to None.

  1. Based on the above, frozen list doesn't make a lot of sense, because the versions of the container don't capture many of the changes to the contents. e.g. container version 1 had component version 1 through 50, and container version 2 had component versions 51 through 100, what is the point of tracking whether container version 1 "should" show v1 or v50 when they're wildly different?

Private ref: FAL-4051

@openedx-webhooks
Copy link

Thanks for the pull request, @bradenmacdonald!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Feb 14, 2025
@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty". Similarly, I think we'll want to have one that tracks "this publish affected these ancestors, even if they haven't otherwise changed". I haven't worked out any details beyond the vague thoughts that:

  • We need to keep it really lightweight.
  • It might be useful to have a major/minor version numbering of that for containers, in the sense that major number tracks changes to the container's metadata, ordering, and membership; and the minor number tracks changes to the contents. So 1.0, 1.1, 1.7, 2.0, 2.1, etc.
  • I used to think that we had to track all containers that used a particular entity and then auto-generate new draft versions of those containers with the entity missing if the entity was soft-deleted... and then also auto-publish those containers when te entity's deletion was published. But now I'm thinking that we do that where we know it makes sense (like in the originating Unit where we're editing it), but we otherwise are tolerant of when unpinned children disappear like that.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. Why do we specify draft_version_pks and published_version_pks when creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.

Yeah, I feel like we had this discussion at some point and came to the conclusion that it was because my original sketch for EntityListRow had draft_version and published_version, but really it should have just had a single version field.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

  1. If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference initial_list nor frozen_list.

I had originally thought that these would be useful for history purposes, but I think it's not necessary if we have minor version tracking.

I do wonder if the relationship between containers and versioning is basic enough where containers should be folded directly into the publishing app (with specific container-types still having their own apps).

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

Though I guess one very confusing part about having a major and minor version for containers is that it's possible to publish any particular subset of the children, so it's not like there's really a published "1.12" version–"1.12" just means "twelve changes have happened to some descendent". So it'd be more like we'd be running two minor counters for children–one for draft changes, one for publishes.

@ormsbee
Copy link
Contributor

ormsbee commented Feb 14, 2025

Also, I guess we should double-check the actual cost of encoding snapshots of the child versions given the current representation. Trying to sketch some of this out now...

@bradenmacdonald
Copy link
Contributor Author

bradenmacdonald commented Feb 14, 2025

I think we'll want some bookkeeping models in the containers app that track when a new child version is created and recursively marks its ancestors "dirty".

For now, I implemented a contains_unpublished_changes() that recursively scans a container for changes; it uses a fixed 3 queries to determine if a unit has changes, regardless of how many components are in the unit. This should be enough to write the test cases for now and confirm correctness, but doesn't scale above unit size so will need to be replaced by a tracking-based approach or something like a closure table for all the containers in each learning package. (Presumably only for drafts or possibly separate closure tables for draft and published trees.) I kind of like the closure table idea more than a tracking-based approach, as long as its performant enough - it would let us optimize so many different things.

@bradenmacdonald bradenmacdonald force-pushed the braden/containers-units branch 2 times, most recently from cac79e1 to 3545527 Compare February 15, 2025 00:39
@bradenmacdonald bradenmacdonald changed the title Understanding containers/units: High level APIs, high level test cases, open questions Containers/units: High level APIs and expanded test cases Feb 15, 2025

Args:
unit_version_pk: The unit version ID.
def get_components_in_draft_unit(
Copy link
Contributor Author

@bradenmacdonald bradenmacdonald Feb 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit suboptimal to me that we have to have separate APIs for get_components_in_draft_unit and get_components_in_published_unit. This is because when things are unpinned, we need to know whether to use the latest draft or latest published version.

I don't think this is a big deal, but it does "feel" to me like it goes against the spirit of the publishing app, where you have a series of versions and draft/published are just pointers to a particular version. With components (and presumably most PublishableEntities in general other than our new containers), if you ask for "version 37", you get a specific thing regardless of whether it is/was a draft or published. Whereas in this new case, you could try to request a particular version of a unit, but unless we know whether you want it as a draft or you want it as published, we usually can't fully describe that particular version's contents.


That is also why my proposed get_components_in_published_unit API doesn't accept a version number; it can't really get anything other than the current published version. (The unit version is insufficient to specify a snapshot of the unit when things are unpinned.)

That said, there likely will still be a way to get snapshots of any historic published version you want, analogous to requesting a specific component version, but it will require asking for the state of the published unit when the entire LearningPackage was at a specific version, which is given by the PublishLog ID. Edit: I implemented an example of this in the PR as get_components_in_published_unit_as_of()

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for a Container that holds both pinned and unpinned Entries? I can see why we'd want them all pinned or all unpinned, but not the mixed case.

But regardless, it's possible to combine both get_entities_in_draft_container and get_entities_in_published_container into a single method with a published parameter, which looks a little awkward in the code, but would make the components and unit APIs simpler.

def get_entities_in_container(
    container: Container | ContainerMixin,
    published: bool = False,
) -> list[ContainerEntityListEntry] | None:
    """
    [ 🛑 UNSTABLE ]
    Get the list of entities and their versions in the draft or published
    version of the given container.
    """
    if isinstance(container, ContainerMixin):
        container = container.container
    if isinstance(container, Container):
        cv = container.versioning.published if published else container.versioning.draft
    else:
        raise TypeError(f"Expected Container or ContainerMixin; got {type(container)}")

    if published and cv is None:
        return None  # There is no published version of this container. Should this be an exception?

    assert isinstance(cv, ContainerVersion)
    entity_list = []
    for row in cv.entity_list.entitylistrow_set.order_by("order_num"):
        entity_version = row.entity_version or (
            row.entity.published.version if published else row.entity.draft.version
        )
        if entity_version is not None:  # As long as this hasn't been soft-deleted:
            entity_list.append(ContainerEntityListEntry(
                entity_version=entity_version,
                pinned=row.entity_version is not None,
        ))
        # else should we indicate somehow a deleted item was here?

    return entity_list

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the use case for a Container that holds both pinned and unpinned Entries? I can see why we'd want them all pinned or all unpinned, but not the mixed case.

We don't actually have a use case for pinned entities at all yet, so I'm a bit worried that we won't need it at all in the future, and this is needlessly complicated. But there is a strong possibility that we'll need pinned entities to support CCX-style use cases and/or dynamic containers (problem banks, split test, randomization, adaptive learning). We haven't really got a good handle yet on how either will work, which is why all these APIs are being marked "unstable".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I'm open to combining these into one function, but I think we should always require an explicit published=True or published=False in that case, because Learning Core will be used "equally"? for CMS and LMS use cases and it's not correct to assume that either draft or published is the default.

So I'd like to hear what others think, and also about whether container: Container | ContainerMixin, is a nice way to handle the parameter or if that needs to be changed to container_id: int for consistency with other Learning Core APIs. The other authoring APIs only accept integer IDs or QuerySets as their parameters.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should always require an explicit published=True or published=False in that case, because Learning Core will be used "equally"? for CMS and LMS use cases and it's not correct to assume that either draft or published is the default.

I think it's worth combining these methods into one. 👍 on explicit published param.

whether container: Container | ContainerMixin, is a nice way to handle the parameter or if that needs to be changed to container_id: int for consistency with other Learning Core APIs.

I think it's fine to leave these parameters as objects. Thinking the perspective of the platform REST API: we'll get a UsageKey for a Unit from the user, but since oel doesn't know about usage keys, we'll then need to fetch the Unit (aka ContainerMixin) for that UsageKey anyway. So we might as well pass that Unit and prevent re-fetching it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's worth combining these methods into one. 👍 on explicit published param.

OK, I pushed a change to consolidate these into one function in both the containers and units APIs.

@bradenmacdonald
Copy link
Contributor Author

@ormsbee @kdmccormick CC @mariajgrimaldi @pomegranited Could I please get time from you next week to review this PR? The earlier in the week the better. Or let me know if there's some other way you'd like to land this prototype - it doesn't have to be through this big PR.

@bradenmacdonald bradenmacdonald changed the title Containers/units: High level APIs and expanded test cases Containers + Units in Learning Core Feb 26, 2025
@bradenmacdonald bradenmacdonald changed the base branch from braden/containers-units-ref to main February 26, 2025 23:36
@bradenmacdonald
Copy link
Contributor Author

I'd also like to squash this whole thing down to one or two commits by the end of this week, before y'all start reviewing. But I could also open a new PR if you want me to keep the entire history.

@mariajgrimaldi
Copy link
Member

@bradenmacdonald: thanks for tagging me! Count on my review next week :)

Comment on lines 143 to 151
# In the case of multi-table inheritance and mixins, this can get tricky.
# Example:
# content_version_model_cls is UnitVersion, which is a subclass of ContainerVersion
# This versioning helper can be accessed via unit_version.versioning (should return UnitVersion) or
# via container_version.versioning (should return ContainerVersion)
intermediate_model = field_to_pev.model # example: ContainerVersion
# This is the field on the subclass (e.g. UnitVersion) that gets
# the intermediate (e.g. ContainerVersion). Example: "UnitVersion.container_version" (1:1 foreign key)
field_to_intermediate = self.content_version_model_cls._meta.get_ancestor_link(intermediate_model)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really appreciate these comments, btw. Thank you.

Comment on lines +30 to +34
entity_list = models.ForeignKey(EntityList, on_delete=models.CASCADE)

# This ordering should be treated as immutable–if the ordering needs to
# change, we create a new EntityList and copy things over.
order_num = models.PositiveIntegerField()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing a unique constraint on (entity_list, order_num), so we don't get duplicates in race conditions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking it doesn't really matter if order_num is not unique, but you're right - if it's not unique, it likely indicates a race condition and we should raise an exception rather than ignoring.

Added in e6334fb

container: Container,
*,
published: bool,
) -> list[ContainerEntityListEntry] | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional, not required for merge, but a future thought): Would it be more natural for folks to query the child entities via attributes on the models instead of having an API function for it?

Copy link
Contributor

@ormsbee ormsbee Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So someone could write something like:

published_unit = unit.versioning.published
if published_unit:
    for component_version in published_unit.children():
        # do something...

The children call could return a queryset.

Copy link
Contributor Author

@bradenmacdonald bradenmacdonald Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, and I think it would be a nice API. My only hesitation is that in general, it's unclear what you're "allowed" to do vs. not "allowed" to do when Django models are involved in Learning Core's API. We don't want people modifying fields on the models and calling .save() for example, but nothing distinguishes that from a method like .children(). So I guess I prefer to have API consumers treat the models as opaque things, and stick to a functional API in .api wherever possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

published_unit = unit.versioning.published
if published_unit:
    for component_version in published_unit.children():
        # do something...

Oh, also this example reminds me: in this case, I think that published_unit is just a UnitVersion and it doesn't "know" if it's the published or draft or some other version of the unit. And even if it did know that, it doesn't know how to properly retrieve unpinned children. Say that UnitVersion 12 is both the latest published and draft version, but some of its draft children have unpublished changes. when you call unit.versioning.published.children(), it doesn't "know" whether to return the latest draft unpinned children or the latest published ones. That's why any API to retrieve children needs an explicit published=True/False parameter, and also another reason I think it doesn't make sense to have a .children() method, unless it just returns the entity list without specifying what version the children are at.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense to me, and I think it would be a nice API. My only hesitation is that in general, it's unclear what you're "allowed" to do vs. not "allowed" to do when Django models are involved in Learning Core's API. We don't want people modifying fields on the models and calling .save() for example, but nothing distinguishes that from a method like .children(). So I guess I prefer to have API consumers treat the models as opaque things, and stick to a functional API in .api wherever possible.

The guideline I had in mind was to allow folks to query relationships but strictly draw the line at any sort of mutation/write operations.

Oh, also this example reminds me: in this case, I think that published_unit is just a UnitVersion and it doesn't "know" if it's the published or draft or some other version of the unit. And even if it did know that, it doesn't know how to properly retrieve unpinned children.

Oof. Right. So no children call, but maybe draft_children and published_children calls later on.

Comment on lines +34 to +35
ContainerModel = TypeVar('ContainerModel', bound=Container)
ContainerVersionModel = TypeVar('ContainerVersionModel', bound=ContainerVersion)
Copy link
Contributor

@ormsbee ormsbee Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a comment on why this is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

created: datetime,
created_by: int | None,
# The types on the following line are correct, but mypy will complain - https://github.com/python/mypy/issues/3737
container_model: type[ContainerModel] = Container, # type: ignore[assignment]
Copy link
Contributor

@ormsbee ormsbee Mar 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit (optional, bikeshedding): container_class, container_cls or container_type might be less ambiguous than container_model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I went with container_cls instead.

Comment on lines +763 to +764
publishable_entities_pks: list[int] | None,
entity_version_pks: list[int | None] | None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think it's slightly more natural to send tuples than have these separate, but again, non-blocking commentary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to do that, and I'd like to also introduce some sort of verification that the entity version objects actually match the corresponding entity objects, but I think I'll leave that to another PR or get someone from my team to follow up.

if published:
container_version = container.versioning.published
if container_version is None:
return None # There is no published version of this container (yet). Should this be an exception?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'd raise an error here for consistency for now, but if we eventually switch over to an API where they use the relation on the model, this shouldn't be something that we have to worry about.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, updated.

Copy link
Contributor

@ormsbee ormsbee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits, but nothing blocking.

@bradenmacdonald
Copy link
Contributor Author

@kdmccormick did you want to do another pass on this? GitHub doesn't want me to merge until you clear your "request changes" review.

@bradenmacdonald bradenmacdonald merged commit 0f63879 into openedx:main Mar 14, 2025
9 checks passed
@github-project-automation github-project-automation bot moved this from In Eng Review to Done in Contributions Mar 14, 2025
@bradenmacdonald bradenmacdonald deleted the braden/containers-units branch March 14, 2025 16:40
ChrisChV pushed a commit that referenced this pull request Apr 11, 2025
#300)

This PR addresses in #278 (comment) by creating a dataclass that encapsulates a PublishableEntity.pk with an optional PublishableEntityVersion.pk, so that the container API methods don't have to pass in two separate lists when specifying pinned versions for a container's children.

This change only affects the (unstable) publishing app's container-related API. The (also unstable) Units API, which is used by edx-platform, remains unchanged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

6 participants