Contributor

@dwong2708 dwong2708 commented Oct 4, 2025

Resolves: #386

Description

This PR introduces improvements to the validation process for restoring learning packages, along with minor adjustments to the backup process.

Changes

Backup

  • Removed uuid field
  • Removed [entity.container] section from component TOML files

Restore

  • Improved validation logic
  • Added a learning package serializer to align with the pattern of extract → validate → save
  • Introduced a structured response for the load API to provide clearer information to the restore endpoint
  • Added a preliminary dump-file validation that checks the file structure before restoring, so validation utilities can run early, before the async task is invoked
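The extract → validate → save pattern described above can be illustrated with a minimal plain-Python sketch. It is not the PR's learning package serializer. The names run_restore_pipeline and StepResult are hypothetical, and the "extract" step is assumed to have already produced a list of record dicts (for example, parsed from the TOML files). The key property shown is that every record is validated before anything is saved.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class StepResult:
    """Outcome of a hypothetical extract -> validate -> save run."""
    saved: list[dict[str, Any]] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def run_restore_pipeline(
    extracted: list[dict[str, Any]],
    validate: Callable[[dict[str, Any]], list[str]],
    save: Callable[[dict[str, Any]], None],
) -> StepResult:
    result = StepResult()
    valid_records = []
    # Validate every extracted record first and collect all problems.
    for index, record in enumerate(extracted):
        problems = validate(record)
        if problems:
            result.errors.extend(f"record {index}: {p}" for p in problems)
        else:
            valid_records.append(record)
    # Save nothing if any record failed validation.
    if result.errors:
        return result
    for record in valid_records:
        save(record)
        result.saved.append(record)
    return result
```

Collecting all errors before saving lets a single run report every problem in the archive instead of stopping at the first one.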


openedx-webhooks commented Oct 4, 2025

Thanks for the pull request, @dwong2708!

This repository is currently maintained by @axim-engineering.

Once you've gone through the following steps, feel free to tag them in a comment and let them know that your changes are ready for engineering review.

🔘 Get product approval

If you haven't already, check this list to see if your contribution needs to go through the product review process.

  • If it does, you'll need to submit a product proposal for your contribution, and have it reviewed by the Product Working Group.
    • This process (including the steps you'll need to take) is documented here.
  • If it doesn't, simply proceed with the next step.
🔘 Provide context

To help your reviewers and other members of the community understand the purpose and larger context of your changes, feel free to add as much of the following information to the PR description as you can:

  • Dependencies

    This PR must be merged before / after / at the same time as ...

  • Blockers

    This PR is waiting for OEP-1234 to be accepted.

  • Timeline information

    This PR must be merged by XX date because ...

  • Partner information

    This is for a course on edx.org.

  • Supporting documentation
  • Relevant Open edX discussion forum threads
🔘 Get a green build

If one or more checks are failing, continue working on your changes until this is no longer the case and your build turns green.


Where can I find more information?

If you'd like to get more details on all aspects of the review process for open source pull requests (OSPRs), check out the following resources:

When can I expect my changes to be merged?

Our goal is to get community contributions seen and reviewed as efficiently as possible.

However, the amount of time that it takes to review and merge a PR can vary significantly based on factors such as:

  • The size and impact of the changes that it introduces
  • The need for product review
  • Maintenance status of the parent repository

💡 As a result, it may take up to several weeks or months to complete a review and merge your PR.

@openedx-webhooks openedx-webhooks added the open-source-contribution PR author is not from Axim or 2U label Oct 4, 2025
@github-project-automation github-project-automation bot moved this to Needs Triage in Contributions Oct 4, 2025
@dwong2708 dwong2708 marked this pull request as ready for review October 5, 2025 19:01
@dwong2708 dwong2708 requested a review from ormsbee October 5, 2025 19:01
@mphilbrick211 mphilbrick211 added the mao-onboarding Reviewing this will help onboard devs from an Axim mission-aligned organization (MAO). label Oct 6, 2025
@mphilbrick211 mphilbrick211 moved this from Needs Triage to Ready for Review in Contributions Oct 6, 2025
load_dump_zip_file(file_name)
message = f'{file_name} loaded successfully'
start_time = time.time()
response = load_dump_zip_file(file_name)
Contributor

"response" might be misinterpreted as being related to an HTTP request/response. If there's no better, more specific name for this, we can call it result for now. But load_dump_zip_file is also a confusing function name, because "load" and "dump" are often opposites of each other (e.g. json.load(), json.dump()).

Contributor Author

Right. I've adjusted it. Thanks

Comment on lines 437 to 441
return {
"status": "error",
"log_file_error": self._write_errors(), # return a StringIO with the errors
"general_info": None
}
Contributor

It would be useful to define an actual data structure here to return, like a dataclass.

It's also not clear what "general_info" means in this context.
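A dataclass in place of the ad hoc dict could look like the minimal sketch below. The field names (status, error_log, summary) and the LearningPackageSummary type are hypothetical. They are a guess at giving "general_info" a clearer meaning, not the structure the PR ended up with.

```python
from dataclasses import dataclass
from io import StringIO
from typing import Optional


@dataclass(frozen=True)
class LearningPackageSummary:
    # Hypothetical, explicitly named replacement for the vague "general_info" dict.
    key: str
    title: str


@dataclass(frozen=True)
class LoadResult:
    status: str  # "success" or "error"
    error_log: Optional[StringIO] = None  # populated only on failure
    summary: Optional[LearningPackageSummary] = None  # populated only on success

    @property
    def ok(self) -> bool:
        return self.status == "success"

    @classmethod
    def error(cls, log: StringIO) -> "LoadResult":
        return cls(status="error", error_log=log)

    @classmethod
    def success(cls, summary: LearningPackageSummary) -> "LoadResult":
        return cls(status="success", summary=summary)
```

The two constructors keep callers from building inconsistent combinations, such as an "error" status that carries a summary.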

Contributor Author

I agree. I've added it. Thanks

# --------------------------

@transaction.atomic
def load(self) -> dict[str, Any]:
Contributor

We're going to want an arg here that allows the target key of the LearningPackage to be passed in.

Contributor

Though that can be a separate PR if you want.

Contributor

Might also be an optional param to __init__() instead.

Contributor Author

I'll try to do this before this PR is merged. Otherwise, I can do it in a separate PR.

Contributor Author

Per this comment, the staged key generation logic has been added.

"general_info": {
"learning_package_key": learning_package.key,
"learning_package_title": learning_package.title,
"containers": num_containers,
Contributor

We're probably going to want the breakdown of sections/subsections/units here, not just total containers. I think you can just leave this out until we get UX guidance.

}
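If the breakdown the reviewer mentions is added once UX guidance arrives, it could be as small as a per-type count. This is a minimal sketch that assumes each restored container exposes a type string. The type names "section", "subsection", and "unit" and the function container_breakdown are hypothetical.

```python
from collections import Counter
from typing import Iterable

# Hypothetical container type names; the real model may label them differently.
CONTAINER_TYPES = ("section", "subsection", "unit")


def container_breakdown(container_types: Iterable[str]) -> dict[str, int]:
    """Count restored containers per type instead of reporting one total."""
    counts = Counter(container_types)
    # Report every level, including levels that have zero containers.
    return {kind: counts[kind] for kind in CONTAINER_TYPES}
```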

def preliminary_check(self) -> Tuple[list[dict[str, Any]], dict[str, Any]]:
"""Performs a preliminary check of the zip file structure and mandatory files."""
Contributor

If this function is only doing a check for the existence of the package file, then please rename it accordingly. The term "preliminary check" is overly vague.
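A narrowly named check, as the reviewer suggests, might look like the minimal sketch below. It uses only the standard zipfile module. The required file name "package.toml" and the function name check_package_file_exists are assumptions for illustration, not the archive's confirmed layout. Returning a list of error messages lets the caller report problems before queuing the async restore task.

```python
import zipfile

# Hypothetical name of the mandatory file; the real archive layout may differ.
REQUIRED_PACKAGE_FILE = "package.toml"


def check_package_file_exists(zip_path: str) -> list[str]:
    """Return error messages; an empty list means the check passed."""
    try:
        with zipfile.ZipFile(zip_path) as archive:
            names = set(archive.namelist())
    except (zipfile.BadZipFile, FileNotFoundError) as exc:
        return [f"Could not open archive {zip_path!r}: {exc}"]
    if REQUIRED_PACKAGE_FILE not in names:
        return [f"Missing required file: {REQUIRED_PACKAGE_FILE}"]
    return []
```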

Contributor Author

Done

for valid_draft in containers.get("unit_drafts", []):
entity_key = valid_draft.pop("entity_key")
version_num = valid_draft["version_num"] # Should exist, validated earlier
entity_version_identifier = f"{entity_key}__v{version_num}"
Contributor

Please use tuples instead, so that other code later doesn't have to try to parse it back out. Tuples work fine as dict keys.
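This minimal sketch shows what the tuple-key suggestion looks like. The helper name index_drafts and the sample entity keys are hypothetical. The loop mirrors the quoted code but builds (entity_key, version_num) tuples instead of f"{entity_key}__v{version_num}" strings, so later code can unpack the parts directly. An entity key that happens to contain "__v" can also no longer be mis-split.

```python
from typing import Any

# (entity_key, version_num) pair used as a dict key.
VersionId = tuple[str, int]


def index_drafts(drafts: list[dict[str, Any]]) -> dict[VersionId, dict[str, Any]]:
    indexed: dict[VersionId, dict[str, Any]] = {}
    for draft in drafts:
        # Structured key instead of an f-string that would need parsing later.
        version_id = (draft.pop("entity_key"), draft["version_num"])
        indexed[version_id] = draft
    return indexed
```

Consumers can then write `for entity_key, version_num in indexed:` with no string parsing.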

Contributor Author

@dwong2708 dwong2708 Oct 10, 2025

Changes have been applied. Thanks

entity_key = valid_draft.pop("entity_key")
version_num = valid_draft["version_num"] # Should exist, validated earlier
entity_version_identifier = f"{entity_key}__v{version_num}"
if entity_version_identifier in self.all_published_entities_versions:
Contributor

You could make a helper method here and avoid having to comment about the same pattern multiple times, e.g.

Suggested change
if entity_version_identifier in self.all_published_entities_versions:
if self.version_already_exists(entity_version_identifier): # then the comment goes in the helper
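The suggested helper might look like the minimal sketch below. It assumes the identifiers are the (entity_key, version_num) tuples requested in the earlier review comment. The class PublishedVersionTracker and its record method are hypothetical stand-ins for wherever all_published_entities_versions actually lives.

```python
class PublishedVersionTracker:
    """Hypothetical holder for the set referenced in the suggested change."""

    def __init__(self) -> None:
        self.all_published_entities_versions: set[tuple[str, int]] = set()

    def record(self, entity_key: str, version_num: int) -> None:
        self.all_published_entities_versions.add((entity_key, version_num))

    def version_already_exists(self, entity_version_identifier: tuple[str, int]) -> bool:
        # The explanation of why this check matters lives here, once,
        # instead of being repeated as a comment at every call site.
        return entity_version_identifier in self.all_published_entities_versions
```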

Contributor Author

Done

@ormsbee ormsbee moved this from Ready for Review to In Eng Review in Contributions Oct 9, 2025
@dwong2708 dwong2708 requested a review from ormsbee October 10, 2025 17:06
@dwong2708 dwong2708 force-pushed the dwong2708/backup_restore_lib_adjustments branch from b25c186 to 0931130 Compare October 10, 2025 17:48
Contributor

@ormsbee ormsbee left a comment

A few minor requests to adjust the interface and try to reduce edge cases.

Thank you!



def load_dump_zip_file(path: str) -> None:
def load_library_from_zip(path: str, user: UserType | None = None, use_staged_lp_key: bool = False) -> dict:
Contributor

A few things on this:

  1. This is a LearningPackage, not a Library. The distinction isn't important now, but it will be once we start storing courses in LC in the coming months. load_learning_package() is fine.
  2. Instead of making it take a use_staged_lp_key argument, please have it take a key arg and generate one if key is None. The default behavior of this function should not trust the key that was dumped in the archive itself, and it will be a potential security issue if people do trust that later on.

Contributor

So maybe something like:

Suggested change
def load_library_from_zip(path: str, user: UserType | None = None, use_staged_lp_key: bool = False) -> dict:
def load_learning_package(path: str, key: str | None = None, user: UserType | None = None) -> dict:

Contributor Author

With this update, we’ll have two cases:

  • Key provided: The user doesn’t need to generate the staged key.
  • Key not provided: The user must generate the staged key.

Changes applied.

The timestamp at the end ensures the key is unique.
"""
username = user.username if user else DEFAULT_USERNAME
Contributor

Don't use a DEFAULT_USERNAME because "command" (or anything like that) is a valid username that someone could plausibly pick. It may be that nothing you have here is a security concern, but at some point, someone is likely to add something that assumes that a user has access to things that are under their username and won't check this edge case. It's also likely to not get cleaned up properly.

Honestly, I'd just force the user to put in a username when invoking the management command--it will reduce edge cases, and if someone has shell access to run management commands, they basically have access to everything anyway.
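Making the username mandatory comes down to marking the argument as required. This minimal sketch uses plain argparse. In a Django management command, the same add_argument call would go in add_arguments(self, parser), because Django passes an argparse-based parser there. The positional file_name argument and the --username flag name are assumptions, not the command's confirmed interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="lp_load")
    # Assumed positional argument for the archive path.
    parser.add_argument("file_name", help="Path to the learning package archive")
    # No default username: the operator must state who is performing the restore.
    parser.add_argument("--username", required=True, help="User performing the restore")
    return parser
```

Omitting --username makes argparse print a usage error and exit with status 2, so no fallback account such as a shared "command" user is ever created.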

Contributor Author

Done

Generate a staged learning package key based on the given base key.
Arguments:
lp_key (str): The base key of the learning package.
Contributor

We should specify that this is the archive's LearningPackage key. Maybe even call it archive_lp_key to make sure it's not confused.

Contributor Author

Done


_, org_slug, lp_slug = parts[:3]
timestamp = int(time.time() * 1000) # Current time in milliseconds
return f"lib-restore:{username}:{org_slug}:{lp_slug}:{timestamp}"
Contributor

Since this temp name is coming from LC now instead of from libraries code (when I initially sketched it out), please make a prefix that's not "lib", so something like lp-restore.
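The staged-key helper quoted above can be combined with the lp-restore prefix, the archive_lp_key rename, and the "take a key arg and generate if it is None" rule from the load_learning_package discussion. This is a minimal sketch under the assumption that archive keys look like "<prefix>:<org>:<slug>", which is what the quoted parts[:3] unpacking implies. The length check and the resolve_target_key helper are hypothetical additions.

```python
import time
from typing import Optional


def generate_staged_lp_key(archive_lp_key: str, username: str) -> str:
    """Build a temporary key from the archive's LearningPackage key."""
    parts = archive_lp_key.split(":")
    if len(parts) < 3:
        raise ValueError(f"Unexpected LearningPackage key format: {archive_lp_key!r}")
    _, org_slug, lp_slug = parts[:3]
    timestamp = int(time.time() * 1000)  # current time in milliseconds
    return f"lp-restore:{username}:{org_slug}:{lp_slug}:{timestamp}"


def resolve_target_key(key: Optional[str], archive_lp_key: str, username: str) -> str:
    # Never reuse the key stored in the archive: use the caller's key if one
    # was given, otherwise generate a fresh staged key.
    if key is not None:
        return key
    return generate_staged_lp_key(archive_lp_key, username)
```

The millisecond timestamp makes collisions unlikely but not impossible. Two restores of the same archive by the same user within the same millisecond would still produce the same key.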

Contributor Author

Applied. Thanks

@dwong2708 dwong2708 requested a review from ormsbee October 16, 2025 05:34
@ormsbee ormsbee merged commit 8e71a58 into openedx:main Oct 16, 2025
11 checks passed
@github-project-automation github-project-automation bot moved this from In Eng Review to Done in Contributions Oct 16, 2025

Labels

mao-onboarding Reviewing this will help onboard devs from an Axim mission-aligned organization (MAO). open-source-contribution PR author is not from Axim or 2U

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

End-To-End Testing and Adjustments for lp_load command

4 participants