Collaborator

@paraseba paraseba commented Jan 21, 2025

This is not an end state but an intermediate step. We won't release this as a version, but once we have one manifest per array, it's easier to group those arrays and pack them.

Tasks:

  • Cleanup
  • [ ] Handle metadata edit case (left for later)
  • Think of other edge cases
  • Performance, parallelization, async, etc.

@paraseba paraseba requested a review from dcherian January 21, 2025 01:39
@paraseba paraseba force-pushed the push-vvyyulupkwtx branch 3 times, most recently from 15be7df to 68f549a Compare January 22, 2025 03:02
@paraseba paraseba marked this pull request as ready for review January 22, 2025 03:02
@paraseba paraseba force-pushed the push-vvyyulupkwtx branch 2 times, most recently from 3da62ea to efb4a89 Compare January 22, 2025 03:08
new_zarr_meta,
new_manifests.unwrap_or_default(),
),
node_data: NodeData::Array(new_zarr_meta, vec![]),
Contributor

A comment here would be useful. Why are we ignoring existing manifests? And where do we apply the new ones?

Collaborator Author

hmm is this a bug? why aren't we finding it? Need to understand this better... great catch!

Contributor

yeah I don't understand it. We should have the old manifests in there AFAICT but also it should be covered by tests.

Collaborator Author

ok it's not a bug because I'm getting the old manifest refs from the snapshot, but I'll change this. Getting them from here is a clearer and more efficient way to do it. Thank you Deepak!

Contributor

because get_chunk_ref will get nothing from this node, and then it'll go back to the snapshot? If you can make it clearer, that would be great.
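If the reading in this thread is right, the lookup order can be sketched as follows. This is only an illustration under assumptions: `Manifest`, `find_in`, and the signature of `get_chunk_ref` shown here are hypothetical stand-ins, not the project's actual types or API.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a manifest: chunk coordinates -> chunk location.
type Manifest = HashMap<Vec<u32>, String>;

/// Return the first reference found for `coord` in a list of manifests.
fn find_in<'a>(manifests: &'a [Manifest], coord: &[u32]) -> Option<&'a String> {
    manifests.iter().find_map(|m| m.get(coord))
}

/// Look in the node's own manifests first; if nothing is found there
/// (for example because the node was created with `vec![]`), fall back
/// to the manifests recorded in the snapshot.
fn get_chunk_ref<'a>(
    node_manifests: &'a [Manifest],
    snapshot_manifests: &'a [Manifest],
    coord: &[u32],
) -> Option<&'a String> {
    find_in(node_manifests, coord).or_else(|| find_in(snapshot_manifests, coord))
}

fn main() {
    let mut old = Manifest::new();
    old.insert(vec![0, 0], "chunk-a".to_string());
    let snapshot = vec![old];
    let node: Vec<Manifest> = vec![]; // the node carries no manifests of its own

    // The node lookup finds nothing, so the snapshot answers.
    println!("{:?}", get_chunk_ref(&node, &snapshot, &[0, 0]));
}
```

The empty node list makes the fallback implicit, which is what makes the original line hard to read; passing the old manifests on the node itself would remove the second lookup.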


let len = buffer.len() as u64;
let id = new_manifest.id.clone();
// TODO: we should compress only when the manifest reaches a certain size
Contributor

we could also set level=-7 by default below a certain size threshold. This is apparently the lowest Zstd compression level.

Collaborator Author

ouch, I think I'm currently giving the level attribute a u8; I thought levels were positive. We can change it later.
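A minimal sketch of what the change discussed here could look like, assuming a signed level type and a size cutoff. The constant names, the 4 KiB threshold, and `choose_compression_level` are all hypothetical; -7 is simply the value suggested above, and 3 is Zstd's default level.

```rust
/// Hypothetical cutoff (in bytes) below which a manifest counts as "small".
const SMALL_MANIFEST_BYTES: usize = 4 * 1024;

/// Level suggested in the review for small manifests (an assumption, not a tuned value).
const FAST_LEVEL: i32 = -7;

/// Zstd's default compression level.
const DEFAULT_LEVEL: i32 = 3;

/// Pick a compression level from the serialized manifest size.
/// The return type is signed because Zstd accepts negative ("fast")
/// levels, which a `u8` cannot represent.
fn choose_compression_level(serialized_len: usize) -> i32 {
    if serialized_len < SMALL_MANIFEST_BYTES {
        FAST_LEVEL
    } else {
        DEFAULT_LEVEL
    }
}

fn main() {
    for len in [512, 4096, 1_000_000] {
        println!("{len} bytes -> level {}", choose_compression_level(len));
    }
}
```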

Ok(())
}

async fn write_manifest_for_existing_node(
Contributor

Just to make sure I got this correct: this is basically identical to the previous "new node" function, we just use a different chunk iterator to apply the changeset updates. In the future, the logic may change to not rewrite the whole manifest if we can. Did I get it right? A short comment would be useful.

Collaborator Author

This is slightly different in that it eagerly writes the full manifest for the node (in the future we'll have to wait). It writes the manifest, looking at chunk changes, and also registers the new manifest in FlushProcess, so that at the end we can include those manifests in the snapshot.
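The flow described in this reply can be sketched roughly as below. Everything here is a simplified assumption for illustration: `FlushSketch`, `write_full_manifest`, the `Option<String>` encoding of deletions, and the counter-based manifest ids are hypothetical and do not reflect the real `FlushProcess`.

```rust
use std::collections::BTreeMap;

type ChunkCoord = Vec<u32>;

/// Hypothetical flush state: remembers which manifest was written for which node.
#[derive(Default)]
struct FlushSketch {
    registered: Vec<(String, String)>, // (node path, manifest id)
}

/// Eagerly build the full manifest for an existing node by applying the
/// session's chunk changes on top of the existing chunk refs, then register
/// the result so it can be included in the snapshot at the end.
/// In `changes`, `Some(r)` sets a chunk ref and `None` deletes it.
fn write_full_manifest(
    flush: &mut FlushSketch,
    node_path: &str,
    existing: &BTreeMap<ChunkCoord, String>,
    changes: &BTreeMap<ChunkCoord, Option<String>>,
) -> BTreeMap<ChunkCoord, String> {
    let mut manifest = existing.clone();
    for (coord, change) in changes {
        match change {
            Some(new_ref) => {
                manifest.insert(coord.clone(), new_ref.clone());
            }
            None => {
                manifest.remove(coord);
            }
        }
    }
    // Hypothetical id scheme; a real implementation would use its own ids.
    let id = format!("manifest-{}", flush.registered.len());
    flush.registered.push((node_path.to_string(), id));
    manifest
}

fn main() {
    let mut existing = BTreeMap::new();
    existing.insert(vec![0u32], "a".to_string());
    existing.insert(vec![1u32], "b".to_string());

    let mut changes = BTreeMap::new();
    changes.insert(vec![1u32], None); // chunk deleted in this session
    changes.insert(vec![2u32], Some("c".to_string())); // chunk added

    let mut flush = FlushSketch::default();
    let manifest = write_full_manifest(&mut flush, "/array", &existing, &changes);
    println!("{manifest:?} registered={:?}", flush.registered);
}
```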

Comment on lines +1421 to +1445
from.0 = Vec::new();
to.0 = Vec::new();
Contributor

I don't understand what's happening here

Collaborator Author

oh god, this is awful code, awful .... I'll comment it profusely. There have to be better ways to do this.

Contributor

I meant just those two lines specifically.

The rest was fine!

paraseba and others added 2 commits January 23, 2025 16:59
This is not an end state but an intermediate step. We won't release this
as a version, but once we have one manifest per array, it's easier to
group those arrays and pack them.

This commit also implements array extents tracking. So now, the
`ManifestRef` in the snapshot identifies, for each manifest, what part of
the array is contained there.
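
As an illustration of the extents idea, one way to derive the covered region from a manifest's chunk coordinates is sketched below. This is an assumption about the general technique only: the `extents` function and the choice of half-open index ranges per dimension are hypothetical and may not match how `ManifestRef` stores extents.

```rust
use std::ops::Range;

/// Compute, for each dimension, the half-open range of chunk indices
/// covered by a set of chunk coordinates. Returns `None` for an empty set.
/// Assumes every coordinate has the same number of dimensions.
fn extents(coords: &[Vec<u32>]) -> Option<Vec<Range<u32>>> {
    let first = coords.first()?;
    // Start with a one-chunk-wide range around the first coordinate.
    let mut ext: Vec<Range<u32>> = first.iter().map(|&i| i..i + 1).collect();
    for coord in &coords[1..] {
        for (range, &idx) in ext.iter_mut().zip(coord) {
            range.start = range.start.min(idx);
            range.end = range.end.max(idx + 1);
        }
    }
    Some(ext)
}

fn main() {
    let coords = vec![vec![0, 2], vec![3, 1]];
    println!("{:?}", extents(&coords));
}
```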
@paraseba paraseba merged commit 26f0381 into main Jan 23, 2025
4 checks passed
@paraseba paraseba deleted the push-vvyyulupkwtx branch January 23, 2025 21:31
3 participants