-
Notifications
You must be signed in to change notification settings - Fork 17
Containers + Units in Learning Core #278
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Containers + Units in Learning Core #278
Conversation
|
Thanks for the pull request, @bradenmacdonald! This repository is currently maintained by Once you've gone through the following steps feel free to tag them in a comment and let them know that your changes are ready for engineering review. 🔘 Get product approvalIf you haven't already, check this list to see if your contribution needs to go through the product review process.
🔘 Provide contextTo help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:
🔘 Get a green buildIf one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green. Where can I find more information?If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources: When can I expect my changes to be merged?Our goal is to get community contributions seen and reviewed as efficiently as possible. However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:
💡 As a result it may take up to several weeks or months to complete a review and merge your PR. |
I think we'll want some bookkeeping models in the
|
Yeah, I feel like we had this discussion at some point and came to the conclusion that it was because my original sketch for |
I had originally thought that these would be useful for history purposes, but I think it's not necessary if we have minor version tracking. I do wonder if the relationship between containers and versioning is basic enough where |
|
Though I guess one very confusing part about having a major and minor version for containers is that it's possible to publish any particular subset of the children, so it's not like there's really a published "1.12" version–"1.12" just means "twelve changes have happened to some descendent". So it'd be more like we'd be running two minor counters for children–one for draft changes, one for publishes. |
|
Also, I guess we should double-check the actual cost of encoding snapshots of the child versions given the current representation. Trying to sketch some of this out now... |
For now, I implemented a |
cac79e1 to
3545527
Compare
|
|
||
| Args: | ||
| unit_version_pk: The unit version ID. | ||
| def get_components_in_draft_unit( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit suboptimal to me that we have to have separate APIs for get_components_in_draft_unit and get_components_in_published_unit. This is because when things are unpinned, we need to know whether to use the latest draft or latest published version.
I don't think this is a big deal, but it does "feel" to me like it goes against the spirit of the publishing app, where you have a series of versions and draft/published are just pointers to a particular version. With components (and presumably most PublishableEntities in general other than our new containers), if you ask for "version 37", you get a specific thing regardless of whether it is/was a draft or published. Whereas in this new case, you could try to request a particular version of a unit, but unless we know whether you want it as a draft or you want it as published, we usually can't fully describe that particular version's contents.
That is also why my proposed get_components_in_published_unit API doesn't accept a version number; it can't really get anything other than the current published version. (The unit version is insufficient to specify a snapshot of the unit when things are unpinned.)
That said, there likely will still be a way to get snapshots of any historic published version you want, analogous to requesting a specific component version, but it will require asking for the state of the published unit when the entire LearningPackage was at a specific version, which is given by the PublishLog ID. Edit: I implemented an example of this in the PR as get_components_in_published_unit_as_of()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for a Container that holds both pinned and unpinned Entries? I can see why we'd want them all pinned or all unpinned, but not the mixed case.
But regardless, it's possible to combine both get_entities_in_draft_container and get_entities_in_published_container into a single method with a published parameter, which looks a little awkward in the code, but would make the components and unit APIs simpler.
def get_entities_in_container(
container: Container | ContainerMixin,
published: bool = False,
) -> list[ContainerEntityListEntry] | None:
"""
[ 🛑 UNSTABLE ]
Get the list of entities and their versions in the draft or published
version of the given container.
"""
if isinstance(container, ContainerMixin):
container = container.container
if isinstance(container, Container):
cv = container.versioning.published if published else container.versioning.draft
else:
raise TypeError(f"Expected Container or ContainerMixin; got {type(container)}")
if published and cv is None:
return None # There is no published version of this container. Should this be an exception?
assert isinstance(cv, ContainerVersion)
entity_list = []
for row in cv.entity_list.entitylistrow_set.order_by("order_num"):
entity_version = row.entity_version or (
row.entity.published.version if published else row.entity.draft.version
)
if entity_version is not None: # As long as this hasn't been soft-deleted:
entity_list.append(ContainerEntityListEntry(
entity_version=entity_version,
pinned=row.entity_version is not None,
))
# else should we indicate somehow a deleted item was here?
return entity_listThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the use case for a Container that holds both pinned and unpinned Entries? I can see why we'd want them all pinned or all unpinned, but not the mixed case.
We don't actually have a use case for pinned entities at all yet, so I'm a bit worried that we won't need it at all in the future, and this is needlessly complicated. But there is a strong possibility that we'll need pinned entities to support CCX-style use cases and/or dynamic containers (problem banks, split test, randomization, adaptive learning). We haven't really got a good handle yet on how either will work, which is why all these APIs are being marked "unstable".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I'm open to combining these into one function, but I think we should always require an explicit published=True or published=False in that case, because Learning Core will be used "equally"? for CMS and LMS use cases and it's not correct to assume that either draft or published is the default.
So I'd like to hear what others think, and also about whether container: Container | ContainerMixin, is a nice way to handle the parameter or if that needs to be changed to container_id: int for consistency with other Learning Core APIs. The other authoring APIs only accept integer IDs or QuerySets as their parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should always require an explicit published=True or published=False in that case, because Learning Core will be used "equally"? for CMS and LMS use cases and it's not correct to assume that either draft or published is the default.
I think it's worth combining these methods into one. 👍 on explicit published param.
whether container: Container | ContainerMixin, is a nice way to handle the parameter or if that needs to be changed to container_id: int for consistency with other Learning Core APIs.
I think it's fine to leave these parameters as objects. Thinking the perspective of the platform REST API: we'll get a UsageKey for a Unit from the user, but since oel doesn't know about usage keys, we'll then need to fetch the Unit (aka ContainerMixin) for that UsageKey anyway. So we might as well pass that Unit and prevent re-fetching it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's worth combining these methods into one. 👍 on explicit published param.
OK, I pushed a change to consolidate these into one function in both the containers and units APIs.
ed88f06 to
f31c1de
Compare
443fe4b to
12c6d3a
Compare
|
@ormsbee @kdmccormick CC @mariajgrimaldi @pomegranited Could I please get time from you next week to review this PR? The earlier in the week the better. Or let me know if there's some other way you'd like to land this prototype - it doesn't have to be through this big PR. |
|
I'd also like to squash this whole thing down to one or two commits by the end of this week, before y'all start reviewing. But I could also open a new PR if you want me to keep the entire history. |
|
@bradenmacdonald: thanks for tagging me! Count on my review next week :) |
openedx_learning/apps/authoring/publishing/model_mixins/publishable_entity.py
Outdated
Show resolved
Hide resolved
| # In the case of multi-table inheritance and mixins, this can get tricky. | ||
| # Example: | ||
| # content_version_model_cls is UnitVersion, which is a subclass of ContainerVersion | ||
| # This versioning helper can be accessed via unit_version.versioning (should return UnitVersion) or | ||
| # via container_version.versioning (should return ContainerVersion) | ||
| intermediate_model = field_to_pev.model # example: ContainerVersion | ||
| # This is the field on the subclass (e.g. UnitVersion) that gets | ||
| # the intermediate (e.g. ContainerVersion). Example: "UnitVersion.container_version" (1:1 foreign key) | ||
| field_to_intermediate = self.content_version_model_cls._meta.get_ancestor_link(intermediate_model) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really appreciate these comments, btw. Thank you.
| entity_list = models.ForeignKey(EntityList, on_delete=models.CASCADE) | ||
|
|
||
| # This ordering should be treated as immutable–if the ordering needs to | ||
| # change, we create a new EntityList and copy things over. | ||
| order_num = models.PositiveIntegerField() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is missing a unique constraint on (entity_list, order_num), so we don't get duplicates in race conditions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking it doesn't really matter if order_num is not unique, but you're right - if it's not unique, it likely indicates a race condition and we should raise an exception rather than ignoring.
Added in e6334fb
| container: Container, | ||
| *, | ||
| published: bool, | ||
| ) -> list[ContainerEntityListEntry] | None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(optional, not required for merge, but a future thought): Would it be more natural for folks to query the child entities via attributes on the models instead of having an API function for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So someone could write something like:
published_unit = unit.versioning.published
if published_unit:
for component_version in published_unit.children():
# do something...The children call could return a queryset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me, and I think it would be a nice API. My only hesitation is that in general, it's unclear what you're "allowed" to do vs. not "allowed" to do when Django models are involved in Learning Core's API. We don't want people modifying fields on the models and calling .save() for example, but nothing distinguishes that from a method like .children(). So I guess I prefer to have API consumers treat the models as opaque things, and stick to a functional API in .api wherever possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
published_unit = unit.versioning.published
if published_unit:
for component_version in published_unit.children():
# do something...Oh, also this example reminds me: in this case, I think that published_unit is just a UnitVersion and it doesn't "know" if it's the published or draft or some other version of the unit. And even if it did know that, it doesn't know how to properly retrieve unpinned children. Say that UnitVersion 12 is both the latest published and draft version, but some of its draft children have unpublished changes. when you call unit.versioning.published.children(), it doesn't "know" whether to return the latest draft unpinned children or the latest published ones. That's why any API to retrieve children needs an explicit published=True/False parameter, and also another reason I think it doesn't make sense to have a .children() method, unless it just returns the entity list without specifying what version the children are at.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense to me, and I think it would be a nice API. My only hesitation is that in general, it's unclear what you're "allowed" to do vs. not "allowed" to do when Django models are involved in Learning Core's API. We don't want people modifying fields on the models and calling
.save()for example, but nothing distinguishes that from a method like.children(). So I guess I prefer to have API consumers treat the models as opaque things, and stick to a functional API in.apiwherever possible.
The guideline I had in mind was to allow folks to query relationships but strictly draw the line at any sort of mutation/write operations.
Oh, also this example reminds me: in this case, I think that
published_unitis just aUnitVersionand it doesn't "know" if it's the published or draft or some other version of the unit. And even if it did know that, it doesn't know how to properly retrieve unpinned children.
Oof. Right. So no children call, but maybe draft_children and published_children calls later on.
| ContainerModel = TypeVar('ContainerModel', bound=Container) | ||
| ContainerVersionModel = TypeVar('ContainerVersionModel', bound=ContainerVersion) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a comment on why this is needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added.
| created: datetime, | ||
| created_by: int | None, | ||
| # The types on the following line are correct, but mypy will complain - https://github.com/python/mypy/issues/3737 | ||
| container_model: type[ContainerModel] = Container, # type: ignore[assignment] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit (optional, bikeshedding): container_class, container_cls or container_type might be less ambiguous than container_model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I went with container_cls instead.
| publishable_entities_pks: list[int] | None, | ||
| entity_version_pks: list[int | None] | None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think it's slightly more natural to send tuples than have these separate, but again, non-blocking commentary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to do that, and I'd like to also introduce some sort of verification that the entity version objects actually match the corresponding entity objects, but I think I'll leave that to another PR or get someone from my team to follow up.
| if published: | ||
| container_version = container.versioning.published | ||
| if container_version is None: | ||
| return None # There is no published version of this container (yet). Should this be an exception? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I'd raise an error here for consistency for now, but if we eventually switch over to an API where they use the relation on the model, this shouldn't be something that we have to worry about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, updated.
ormsbee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor nits, but nothing blocking.
|
@kdmccormick did you want to do another pass on this? GitHub doesn't want me to merge until you clear your "request changes" review. |
#300) This PR addresses in #278 (comment) by creating a dataclass that encapsulates a PublishableEntity.pk with an optional PublishableEntityVersion.pk, so that the container API methods don't have to pass in two separate lists when specifying pinned versions for a container's children. This change only affects the (unstable) publishing app's container-related API. The (also unstable) Units API, which is used by edx-platform, remains unchanged.
This PR builds on #254 and implements new
containersandunitsapps in Learning Core - one for abstract containers in general and one for units in particular. The recent commits are mine, but much of the implementation and conceptual work was done by @mariajgrimaldi (and @ormsbee) (Thanks!).This is not a complete backend implementation of "units" on its own - I've sketched out the remaining work in this issue.
Changes in this PR
Compared to the prototype in #254 :
containersmodels/API is now part of thepublishingapp (NEW)defined_list,initial_list, andfrozen_listinto a singleentity_list. Removed functions related todefined_list/initial_list/frozen_listfrom theapifor now. These are more of an internal implementation detail and we don't want the test cases tied too heavily to the internal details.EntitityListRow: combineddraft_versionandpublished_versioninto justentity_version.get_components_in_draft_unitandget_components_in_published_unitcontains_unpublished_changesto check if a unit's unpinned children have unpublished changes, even if the unit itself is unchanged.get_components_in_published_unit_as_ofthat can get the state of the unit at any previous point in the learning package's publication history.get_containers_with_entitythat can find which containers contain the given PublishableEntity, e.g. which units contain a component that we're about to delete.Old Questions (mostly answered now - ignore these)
UX specs say modifying a child will mark the unit as "having unpublished changes" but ADR says otherwise. Does it make sense to implement a container_has_unpublished_changes() helper function that recursively checks for unpublished changes in a container (and its child containers, and so on...), so we can display this in the UI? See slack discussion
Do I understand correctly then that the version history of units should NOT be thought of as a series of snapshots of the unit (as I had been thinking) but rather a changelog of edits made to the unit itself (but ignoring most changes made its the children)?
PublishLogPK. Should we have an API to get an old version of a unit by passing in a PublishLog PK (or timestamp)?get_components_in_published_unit_as_ofIs it correct that units allow you to pin the version of a component, which means you always get exactly that component, but sequences/subsections will not really support pinning in the same way, because even if you pin a unit to a specific version, the unpinned components in that unit can still be updated?
Do we have any plans to implement pinned/unpinned components in the UI or is everything unpinned?
If we have unpinned published versions, why do we need the various defined_list, initial_list, frozen_list etc? So far in the test cases I've been writing I haven't found a need to even reference
initial_listnorfrozen_list.Why do we specifydraft_version_pksandpublished_version_pkswhen creating a new draft version of a unit? Seems wrong to say anything at all about the published version when it doesn't yet exist and may never exist.Note to self: if soft deleting a component doesn't mark the unit as changed, we need to update the UI to show soft deleted draft components and allow publishing the deletion. Or maybe this is handled by a
container_has_unpublished_changesand recursivelying pseudo-publishing the container (no changes to container but force publish the children including soft deletes).Private ref: FAL-4051