-
Notifications
You must be signed in to change notification settings - Fork 50
Generate one manifest per array #598
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
15be7df to
68f549a
Compare
3da62ea to
efb4a89
Compare
icechunk/src/change_set.rs
Outdated
| new_zarr_meta, | ||
| new_manifests.unwrap_or_default(), | ||
| ), | ||
| node_data: NodeData::Array(new_zarr_meta, vec![]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment here would be useful. Why are we ignoring existing manifests? And where do we apply the new ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm is this a bug? why aren't we finding it? Need to understand this better... great catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I don't understand it. We should have the old manifests in there AFAICT but also it should be covered by tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok it's not a bug because I'm getting the old manifest refs from the snapshot, but I'll change this. Getting from here is a more clear and efficient way to do it. Thank you Deepak!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because get_chunk_ref will get nothing from this node, and then it'll go back to the snapshot? If you can make it clearer, that would be great.
|
|
||
| let len = buffer.len() as u64; | ||
| let id = new_manifest.id.clone(); | ||
| // TODO: we should compress only when the manifest reaches a certain size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could also set level=-7 by default below a certain size threshold. This is apparently the lowest Zstd compression level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ouch, I think I currently giving the level attribute a u8, I thought levels where positive. We can change later.
| Ok(()) | ||
| } | ||
|
|
||
| async fn write_manifest_for_existing_node( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I got this correct: this is basically identical to the previous "new node" function, we just use a different chunk iterator to apply the changeset updates. In the future, the logic may change to not rewrite the whole manifest if we can. Did I get it right? A short comment would be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is slightly different in that in eagerly write the full manifest for the node (in the future we'll have to wait). It writes the manifest, looking at chunk changes, and also registers the new manifest in FlushProcess, so at the end, we can include those manifests in the snapshot
| from.0 = Vec::new(); | ||
| to.0 = Vec::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what's happening here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh god, this is awful code, awful .... I'll comment it profusely. There has to be better ways to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant just those two lines specifically.
The rest was fine!
This is not an end state but an intermediate step. We won't release this as a version, but once we have one manifest per array, it's easier to group those arrays and pack them. This commit also implements array extents tracking. So now, the `ManifestRef` in the snapshot identifies, for each manifest, what part of the array is contained there.
Co-authored-by: Deepak Cherian <[email protected]>
6cc27fc to
2d95f70
Compare
2d95f70 to
00a4e36
Compare
This is not an end state but an intermediate step. We won't release this as a version, but once we have one manifest per array, it's easier to group those arrays and pack them.
Tasks:
[ ] Handle metadata edit caseleft for later