Implementing ByteSlice and ByteSliceMut for Vec<u8> #1045

shashitnak · 2024-03-14T18:24:54Z

Addresses #992

google-cla · 2024-03-14T18:24:58Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

jswrenn · 2024-03-14T19:23:46Z

src/lib.rs

+unsafe impl ByteSlice for Vec<u8> {}
+
+#[cfg(any(feature = "alloc", test))]
+// SAFETY: This uses safe a method from stdlib.


The safety comment needs to prove that the safety conditions of SplitByteSlice are fully satisfied. Unfortunately, I believe those conditions are unsatisfiable in this case. The trait documentation asks:

In particular, given B: SplitByteSlice and b: B, if b.deref() returns a byte slice with address addr and length len, then if split <= len, b.split_at(split) will return (first, second) such that:

first's address is addr and its length is split

second's address is addr + split and its length is len - split

Your implementation does not satisfy the second condition, because the two Vecs will almost certainly not be adjacent in memory.
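To make the adjacency requirement concrete, here is a minimal sketch that checks both quoted conditions for two ways of splitting. The PR's actual split_at implementation is not shown in this thread, so the use of Vec::split_off below is an assumption, and the helper names are hypothetical.

```rust
/// Returns (address, length) of a byte slice, with the address as an integer.
fn addr_len(bytes: &[u8]) -> (usize, usize) {
    (bytes.as_ptr() as usize, bytes.len())
}

/// Splits a borrowed slice and reports whether each of the two quoted
/// conditions holds: (first is at addr with length split,
/// second is at addr + split with length len - split).
fn slice_split_conditions(bytes: &[u8], split: usize) -> (bool, bool) {
    let (addr, len) = addr_len(bytes);
    let (first, second) = bytes.split_at(split);
    (
        addr_len(first) == (addr, split),
        addr_len(second) == (addr + split, len - split),
    )
}

/// Same check for an owned Vec split with `Vec::split_off` (an assumed
/// stand-in for the PR's implementation). Call with 0 < split < len so that
/// the tail is a real, non-empty allocation.
fn vec_split_conditions(mut v: Vec<u8>, split: usize) -> (bool, bool) {
    let (addr, len) = addr_len(&v);
    // The tail is copied into a newly allocated Vec; `v` keeps its buffer.
    let second = v.split_off(split);
    (
        addr_len(&v) == (addr, split),
        addr_len(&second) == (addr + split, len - split),
    )
}
```

The slice version satisfies both conditions because both halves are views into the same buffer. In the Vec version the tail lives in a separate allocation, and since the original buffer is still alive, the tail cannot start at addr + split.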

To add a bit of extra detail here: Your implementations of ByteSlice and ByteSliceMut are fine so long as you remove the implementation of SplitByteSlice.

@jswrenn @joshlf Pushed a naive solution where I somehow managed to make it work. Am I at least on the right path? Or is it unacceptable?

I think if we give up the restriction of only allowing a pair of Self to be returned from SplitByteSlice::split_at, we can use a type similar to VecSlice, from my example, and split a Vec for free. We can then also implement SplitByteSlice for VecSlice and split a Vec however many times we want at zero cost. Is there a reason we would not want to remove this restriction?

IIUC this is still unsound - how do you prevent the CanDrop variant from being dropped before the DontDrop variant? If that happens, then the DontDrop variant points to freed memory. I could imagine solving this with lifetimes, but you might need to make SplitByteSlice even more complex to be able to carry those lifetimes (or maybe not? I'm genuinely not sure).

In the abstract I think this is an interesting idea, but it will take some work to rejigger the API to support it without adding a lot of complexity that other users (who aren't using Vec) will have to know about.

Is there a reason that your use case doesn't allow either just operating on slices (e.g. borrowing the Vec) or making a clone of the Vec into a RefCell<[u8]> (which we do support)?

Here's a new thing I tried that is probably better than the previous implementation. The VecSlice type now looks like the following

    pub struct VecSlice {
        slice: VirtualVec,
        ghost: Rc<GhostVec>,
    }

    struct VirtualVec {
        ptr: *mut u8,
        len: usize,
        cap: usize,
    }

    struct GhostVec(Vec<u8>);

With every split, new VirtualVecs are created but the GhostVec is shared by all the splits and will only drop when the last VecSlice is dropped.
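The shared-ownership idea can be sketched in safe Rust as follows. This is not the code from the PR: it replaces the raw pointer with an offset and length into an Rc<Vec<u8>>, it only supports read-only access, and the SharedSlice name and its methods are hypothetical.

```rust
use std::ops::Deref;
use std::rc::Rc;

/// Hypothetical read-only view into a shared, reference-counted buffer.
pub struct SharedSlice {
    owner: Rc<Vec<u8>>, // plays the role of the shared "ghost" buffer
    start: usize,
    len: usize,
}

impl SharedSlice {
    pub fn new(v: Vec<u8>) -> Self {
        let len = v.len();
        SharedSlice { owner: Rc::new(v), start: 0, len }
    }

    /// Splits without copying bytes; both halves keep the buffer alive.
    pub fn split_at(self, mid: usize) -> (SharedSlice, SharedSlice) {
        assert!(mid <= self.len, "split point out of bounds");
        let second = SharedSlice {
            owner: Rc::clone(&self.owner),
            start: self.start + mid,
            len: self.len - mid,
        };
        let first = SharedSlice { owner: self.owner, start: self.start, len: mid };
        (first, second)
    }

    /// Number of views currently sharing the buffer.
    pub fn owners(&self) -> usize {
        Rc::strong_count(&self.owner)
    }
}

impl Deref for SharedSlice {
    type Target = [u8];

    fn deref(&self) -> &[u8] {
        &self.owner[self.start..self.start + self.len]
    }
}
```

Because every view holds a strong reference, the buffer is freed only when the last view is dropped, regardless of drop order. Mutable access to disjoint halves is not expressible this way without unsafe code or interior mutability, which is where a raw-pointer design like the one above would need a careful safety argument.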

Although, now that I think about it, this is just a more sophisticated way of achieving what can easily be achieved by borrowing the Vec as a slice and calling split_at on that slice. This doesn't seem to provide much value, I guess.
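For comparison, the borrow-based approach mentioned here can be sketched with the standard slice method split_at_mut. The function name split_three and the three-way layout are hypothetical, chosen only to show repeated splitting.

```rust
/// Splits a borrowed buffer into three disjoint mutable parts of lengths
/// `a`, `b`, and whatever remains. No bytes are copied or allocated.
/// Panics if `a + b` exceeds the buffer length.
fn split_three(buf: &mut [u8], a: usize, b: usize) -> (&mut [u8], &mut [u8], &mut [u8]) {
    let (head, rest) = buf.split_at_mut(a);
    let (mid, tail) = rest.split_at_mut(b);
    (head, mid, tail)
}
```

A caller passes `&mut v` for some `v: Vec<u8>`. The returned parts borrow from the Vec, so the borrow checker rejects any attempt to drop or resize the Vec while they are still in use, which is the guarantee the owned design has to rebuild by hand.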

Regarding your question, I was just curious whether it was possible to split a Vec at zero cost.

@joshlf removed all the unnecessary code. It now only contains implementations of ByteSlice and ByteSliceMut

joshlf · 2024-03-27T20:06:22Z

src/lib.rs

    }
 }

+// TODO(#429): Add a "SAFETY" comment and remove this `allow`.


Can you add safety comments here and below? Our policy is not to introduce any new undocumented unsafe code while we burn down the existing TODOs.

jswrenn · 2024-03-28T16:44:41Z

src/lib.rs

    }
 }

+// TODO(#429): Add a "SAFETY" comment and remove this `allow`.


On further reflection, I don't believe the safety conditions of ByteSlice are satisfiable:

Safety

Implementations of ByteSlice must promise that their implementations of Deref and DerefMut are "stable". In particular, given B: ByteSlice and b: B, b must always dereference to a byte slice with the same address and length. This is true for both b.deref() and b.deref_mut(). If B: Copy or B: Clone, then the same is also true of copies or clones of b. For example, b.deref_mut() must return a byte slice with the same address and length as b.clone().deref().

For Vec, a growable buffer, this cannot be the case: the length changes as elements are added and removed, and the address changes as elements are added.
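A small sketch of the instability described here, using only standard Vec methods. The helper names are hypothetical. The code records what the Vec dereferences to before and after pushing elements.

```rust
/// Returns (address, length) of the byte slice a value dereferences to.
fn observe(bytes: &[u8]) -> (usize, usize) {
    (bytes.as_ptr() as usize, bytes.len())
}

/// Pushes `extra` zero bytes and returns the (address, length) pairs
/// observed through `Deref` before and after.
fn push_and_observe(v: &mut Vec<u8>, extra: usize) -> ((usize, usize), (usize, usize)) {
    let before = observe(v);
    for _ in 0..extra {
        v.push(0);
    }
    let after = observe(v);
    (before, after)
}
```

The length always differs after a push. The address may also differ once the capacity is exceeded, although the allocator is free to grow the buffer in place, so the sketch does not assume that it moves.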

jswrenn · 2024-03-28T17:38:50Z

src/lib.rs

+
 // TODO(#429): Add a "SAFETY" comment and remove this `allow`.
 #[allow(clippy::undocumented_unsafe_blocks)]
 unsafe impl<'a> ByteSliceMut for &'a mut [u8] {}


@joshlf, in light of the above comment, I'm not certain that this satisfies our documented safety condition either. It's trivial to define a function that changes both the length and address of a &mut [u8] between calls to deref:

fn change_add_and_length(mut slice: &mut [u8]) { slice = &mut slice[1..]; }

I think this indicates that our safety documentation on ByteSlice is missing some critical nuance.
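The effect of that reassignment can be observed directly. This is a minimal sketch with a hypothetical function name; it requires a non-empty input, since slicing from index 1 panics on an empty slice.

```rust
/// Rebinds `slice` to its own tail and returns the (address, length) pairs
/// the binding dereferences to before and after the reassignment.
fn observe_rebinding(mut slice: &mut [u8]) -> ((usize, usize), (usize, usize)) {
    let before = (slice.as_ptr() as usize, slice.len());
    slice = &mut slice[1..];
    let after = (slice.as_ptr() as usize, slice.len());
    (before, after)
}
```

After the reassignment the binding dereferences to an address one byte higher and a length one byte shorter, even though no method on the original reference changed what it pointed to. That distinction between mutating a value and overwriting a binding is the kind of nuance the safety documentation would need to spell out.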

joshlf · 2024-05-09T16:56:29Z

Okay, now that #1215 and #1218 have landed, this can be rebased and the soundness issues should be addressable. In particular, we can implement ByteSlice and ByteSliceMut for Vec<u8>, but we can't implement any of the other traits.

shashitnak force-pushed the byte-slice-vec branch from 9b595b1 to b588db3 (March 14, 2024 18:46)

jswrenn reviewed Mar 14, 2024

shashitnak force-pushed the byte-slice-vec branch 10 times, most recently from 645e425 to a5fc43b (March 20, 2024 05:10)

Implementing ByteSlice and ByteSliceMut for Vec<u8>

a4d7277

shashitnak force-pushed the byte-slice-vec branch from 1fffa6a to a4d7277 (March 23, 2024 14:43)

joshlf requested changes Mar 27, 2024

jswrenn reviewed Mar 28, 2024