-
Notifications
You must be signed in to change notification settings - Fork 245
Fix modelstore deleting logic when multiple reference refer to the same blob/snapshot #1620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Reviewer's GuideImplements reference-counted deletion of blobs and snapshots, adds download caching and safe symlink creation to prevent redundant operations, and includes new bats tests covering Hugging Face URIs and quant tags. Sequence diagram for reference-counted snapshot/blob deletionsequenceDiagram
actor User
participant ModelStore
participant RefFile
participant FileSystem
User->>ModelStore: remove_snapshot(model_tag)
ModelStore->>RefFile: get_ref_file(model_tag)
ModelStore->>ModelStore: _get_refcounts(ref_file.hash)
ModelStore->>FileSystem: Check blob refcounts
alt blob refcount <= 1
ModelStore->>FileSystem: Remove blob file(s)
else blob refcount > 1
ModelStore->>FileSystem: Keep blob file(s)
end
alt snapshot refcount <= 1
ModelStore->>FileSystem: Remove snapshot directory
else snapshot refcount > 1
ModelStore->>FileSystem: Keep snapshot directory
end
ModelStore->>FileSystem: Remove ref file
Sequence diagram for improved download and symlink logicsequenceDiagram
participant ModelStore
participant FileSystem
participant Downloader
ModelStore->>FileSystem: Check if blob_file_path exists
alt blob does not exist
ModelStore->>Downloader: download_file(url, headers, dest_path)
else blob exists
ModelStore->>FileSystem: Use cached blob
end
ModelStore->>FileSystem: Check if symlink exists
alt symlink does not exist
ModelStore->>FileSystem: Create symlink
else symlink exists
ModelStore->>FileSystem: Skip symlink creation
end
Class diagram for updated ModelStore deletion logicclassDiagram
class ModelStore {
+remove_snapshot(model_tag: str)
+_get_refcounts(snapshot_hash: str) tuple[int, Counter[str]]
-_remove_blob_file(snapshot_file_path: str)
...
}
ModelStore --> RefFile : uses
class RefFile {
+from_path(path: str) RefFile
+filenames: List[str]
+hash: str
...
}
File-Level Changes
Possibly linked issues
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @olliewalsh, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the robustness of the model store's deletion mechanism. It addresses a critical issue where deleting a model reference could inadvertently remove shared underlying data, even if other references still depended on it. By introducing a comprehensive reference counting system for both blobs and snapshots, the changes ensure that data persistence aligns with its usage across multiple model tags. Additionally, improvements to the download and symlink creation process contribute to a more efficient and resilient model management system.
Highlights
- Model Deletion Logic: Implemented reference counting for blobs and snapshots within the model store. This ensures that underlying data (blobs and snapshot directories) are only deleted when there are no remaining references to them, preventing accidental data loss when multiple model tags point to the same content.
- Caching and Efficiency Improvements: Added checks to prevent re-downloading blob files if they already exist locally and to avoid re-creating symbolic links if they are already present. This improves the efficiency and robustness of the model download and snapshot creation process.
- New System Tests: Introduced new
bats
system tests specifically designed to validate the multi-reference deletion logic for Hugging Face models, ensuring that shared blobs and snapshots are handled correctly during removal operations.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @olliewalsh - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `ramalama/model_store.py:54` </location>
<code_context>
- dest_path=blob_file_path,
- show_progress=self.should_show_progress,
- )
+ if not os.path.exists(blob_file_path):
+ download_file(
+ url=self.url,
+ headers=self.header,
+ dest_path=blob_file_path,
+ show_progress=self.should_show_progress,
+ )
+ else:
+ logger.debug(f"Using cached blob for {self.name} ({os.path.basename(blob_file_path)})")
return os.path.relpath(blob_file_path, start=snapshot_dir)
</code_context>
<issue_to_address>
Potential race condition if multiple processes download the same blob concurrently.
Both processes could write to the file simultaneously, causing corruption. Use file locking or atomic operations to prevent this.
</issue_to_address>
### Comment 2
<location> `ramalama/model_store.py:618` </location>
<code_context>
- if ref_file is not None:
- for file in ref_file.filenames:
+ for file in ref_file.filenames:
+ blob_refcount = blob_refcounts.get(file, 0)
+ if blob_refcount <= 1:
self._remove_blob_file(self.get_snapshot_file_path(ref_file.hash, file))
self._remove_blob_file(self.get_partial_blob_file_path(ref_file.hash))
</code_context>
<issue_to_address>
Partial blob file removal is attempted for every file, which may be redundant.
Move the call to remove the partial blob file outside the loop or ensure it is only executed once per snapshot to avoid redundant operations.
</issue_to_address>
<suggested_fix>
<<<<<<< SEARCH
# Remove all blobs first
for file in ref_file.filenames:
blob_refcount = blob_refcounts.get(file, 0)
if blob_refcount <= 1:
self._remove_blob_file(self.get_snapshot_file_path(ref_file.hash, file))
self._remove_blob_file(self.get_partial_blob_file_path(ref_file.hash))
else:
logger.debug(f"Not removing blob {file} refcount={blob_refcount}")
=======
# Remove all blobs first
any_blob_removed = False
for file in ref_file.filenames:
blob_refcount = blob_refcounts.get(file, 0)
if blob_refcount <= 1:
self._remove_blob_file(self.get_snapshot_file_path(ref_file.hash, file))
any_blob_removed = True
else:
logger.debug(f"Not removing blob {file} refcount={blob_refcount}")
if any_blob_removed:
self._remove_blob_file(self.get_partial_blob_file_path(ref_file.hash))
>>>>>>> REPLACE
</suggested_fix>
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
The pull request fixes the deletion logic for models to correctly handle multiple references to the same blob or snapshot by implementing reference counting. It also adds caching for downloaded blobs and symlinks to avoid redundant operations. The changes are supported by new system tests that verify the reference-counted deletion and caching behavior. My review identified a couple of high-severity issues in the implementation: one related to checking for symlink existence which could lead to errors, and another regarding the incorrect logic for removing partial download files. I've provided suggestions to fix both.
Please consider the AI Reviews, Overall LGTM |
When deleting a reference, count the remaining references to the snapshot/blobs to determine if they should be deleted. Signed-off-by: Oliver Walsh <[email protected]>
Signed-off-by: Oliver Walsh <[email protected]>
Signed-off-by: Oliver Walsh <[email protected]>
28927bf
to
80fd6d9
Compare
Previously would always remove this partial blob file. Note: this assumes the blob hash equals the snapshot hash, which is only true for repos with a single model Signed-off-by: Oliver Walsh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Lets tackle the proper cleanup of .partials in a follow-up PR.
When deleting a reference, count the remaining references to the
snapshot/blobs to determine if they should be deleted.
Also add bats tests for
hf:/<org>/<model>:<quant>
urisSummary by Sourcery
Fix deletion logic to handle multiple references to the same blob or snapshot, enhance caching and symlink handling, and add system tests to verify pull and removal behavior for models with multiple tags.
Bug Fixes:
Enhancements:
Tests: