Support `TryFromBytes` - conditional conversion analogous to `FromBytes`

*Co-authored with @jswrenn.*

## Overview

Add a `TryFromBytes` trait, which supports byte-to-type conversions for non-`FromBytes` types by performing runtime validation. Add a custom derive which generates this validation code automatically.

*Many thanks to @kupiakos and @djkoloski for providing invaluable feedback and input on this design.*

## Progress

- [x] Add `TryFromBytes` trait definition
  - https://github.com/google/zerocopy/pull/641
- [x] Implement `TryFromBytes` for existing `FromBytes` types
  - https://github.com/google/zerocopy/pull/644
- [x] Add `try_from_ref` method; impl for `bool`
  - https://github.com/google/zerocopy/pull/647
- [x] Implement derive for structs
  - https://github.com/google/zerocopy/pull/665
- [x] Implement for slices
  - https://github.com/google/zerocopy/pull/666
- [x] Implement for arrays
  - https://github.com/google/zerocopy/pull/799
- [x] Allow deriving on `repr(packed)` structs
  - https://github.com/google/zerocopy/pull/794
- [x] Allow deriving on unions
  - See for context: https://github.com/google/zerocopy/discussions/696
  - https://github.com/google/zerocopy/pull/800
- [x] Allow deriving on field-less enums with primitive reprs (`u8`, `i16`, etc)
  - https://github.com/google/zerocopy/pull/803
- [x] Allow deriving on field-less enums with `repr(C)` by treating the discriminant type as `[u8; size_of::<Self>()]`
  - https://github.com/google/zerocopy/pull/806
- [x] https://github.com/google/zerocopy/issues/873
- [x] Implement `TryFromBytes` for `fn()` and `extern "C" fn()` types
  - #904 
- [x] Implement `TryFromBytes` for `UnsafeCell<T>`
  - #905
- [x] Make `TryFromBytes` a super-trait of `FromZeros`
  - #962
- [x] Remove `#[doc(hidden)]` from all items which are intended to be public
  - https://github.com/google/zerocopy/pull/976
- [x] Add to `TryFromBytes` docs to explain that you can't always round trip `T -> [u8] -> T` (notably for pointer types), which could be confusing given that, for `TryFromBytes`, the failure would show up at runtime
  - #860
- [x] Rename methods consistent with #1095
  - https://github.com/google/zerocopy/pull/1119
- [x] `TryFromBtyes` doc comment currently incorrectly says: "zerocopy does not permit implementing `TryFromBytes` for any union type"
  - https://github.com/google/zerocopy/pull/1259
- [x] Consider [this comment](https://github.com/google/zerocopy/issues/5#issuecomment-2120952779)
- Non-breaking/blocking
  - [x] Allow deriving on data-full enums
    - https://github.com/google/zerocopy/pull/879
- Non-breaking/non-blocking
  - [x] Add `try_from_mut` and `try_read_from` methods
    - https://github.com/google/zerocopy/pull/892
    - https://github.com/google/zerocopy/pull/808
  - [x] Implement for unsized `UnsafeCell`
    - Consider that we may not need to require `T: Sized` (described in [#251](https://github.com/google/zerocopy/issues/251)) if we use the design in [#905](https://github.com/google/zerocopy/issues/905)
    - #1619
  - [x] Remove `Self: NoCell` bound from `try_read_from`
    - https://github.com/google/zerocopy/pull/970
  - [ ] Support deriving on unions without `Immutable` bound
    - #1832
  - [ ] `is_bit_valid` should promise not to mutate its argument's referent
    - #1831
  - [ ] #1330

## Motivation

Many use cases involve types whose layout is well-defined, but which cannot implement `FromBytes` because there exist bit patterns which are invalid (either they are unsound in terms of language semantics or they are unsafe in the sense of violating a library invariant).

Consider, for example, parsing an RPC message format. It would be desirable for performance reasons to be able to read a message into local memory, validate its structure, and if validation succeeds, treat that memory as containing a parsed message rather than needing to copy the message in order to transform it into a native Rust representation.

Here's a simple, hypothetical example of an RPC to request log messages from a process:

```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Level of log messages requested from `RequestLogs`.
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```

None of these types can be `FromBytes`. For `LogLevel`, only the `u8` values 0 through 4 correspond to enum variants, and constructing a `LogLevel` from any other `u8` would be unsound. For `LogTime`, any sequence of the appropriate number of bytes would constitute a valid instance of `LogTime` from Rust's perspective - it would not cause unsoundness - but some such sequences would violate the invariant that the `nsecs` field is in the range `[0, 10^9)`.

While these types can't be `FromBytes`, we'd still like to be able to conditionally reinterpret a sequence of bytes as a `RequestLogsArgs` - it's just that we need to perform runtime validation first. Ideally, we'd be able to write code like:

```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[derive(TryFromBytes)]
#[repr(C)]
struct RequestLogsArgs {
    max_stats: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "is_valid")]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

impl LogTime {
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }
}

/// Level of log messages requested from `RequestLogs`.
#[derive(TryFromBytes)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```

The `TryFromBytes` trait - the subject of this design - provides the ability to fallibly convert a byte sequence to a type, performing validation at runtime. At a minimum, the validation code simply ensures soundness - for example, in the case of `LogLevel`, validating that byte values are in the range [0, 4]. The custom derive also supports user-defined validation like the `LogTime::is_valid` method (note the `validator` annotation on `LogTime`), which can be used to enforce safety invariants that go above and beyond soundness.

Given these derives of `TryFromBytes`, an implementation of this RPC could be as simple as:

```rust
fn serve_request_logs_rpc<F: FnMut(&RequestLogsArgs)>(server: &mut RpcServer, f: F) -> Result<()> {
    loop {
        let bytes = [0u8; mem::size_of::<RequestLogsArgs>()];
        server.read_request(&mut bytes[..])?;
        let args = RequestLogsArgs::try_from_bytes(&bytes[..]).ok_or(ParseError)?;
        f(args);
    }
}
```

The design proposed in this issue implements this API.

## Design

TODO

This design builds on the following features:
- https://github.com/google/zerocopy/issues/29
- https://github.com/google/zerocopy/issues/196

```rust
/// A value which might or might not constitute a valid instance of `T`.
// Builds on the custom MaybeUninit type described in #29
pub struct MaybeValid<T: AsMaybeUninit + ?Sized>(MaybeUninit<T>);

// Allows us to use the `project!` macro for field projection (proposed in #196)
unsafe impl<T, F> Projectable<F, AlignedByteArray<F>> for AlignedByteArray<T> {
    type Inner = T;
}

impl<T> MaybeValid<T> {
    /// Converts this `MaybeValid<T>` to a `T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid(self) -> T { ... }

    /// Converts this `&MaybeValid<T>` to a `&T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid_ref(&self) -> &T { ... }

    /// Converts this `&mut MaybeValid<T>` to a `&mut T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub unsafe fn assume_valid_mut(&mut self) -> &mut T { ... }
}

/// # Safety
///
/// `is_bit_valid` is correct. If not, can cause UB.
pub unsafe trait TryFromBytes {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let maybe_valid = Ref::<_, MaybeValid<T>>::new(bytes)?.into_ref();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_from_mut(bytes: &mut [u8]) -> Option<&mut Self>
    where
        Self: AsBytes + Sized,
    {
        let maybe_valid = Ref::<_, MaybeValid<T>>::new(bytes)?.into_mut();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_read_from(bytes: &[u8]) -> Option<Self>
    where
        Self: Sized
    {
        let maybe_valid = <MaybeValid<T> as FromBytes>::read_from(bytes)?;
        if Self::is_bit_valid(&maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `T`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid`.
            Some(unsafe { maybe_valid.assume_valid() })
        } else {
            None
        }
    }
}
```

Here's an example usage:

```rust
/// A type without any safety invariants.
#[derive(TryFromBytes)]
#[repr(C)]
struct MySimpleType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MySimpleType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        TryFromBytes::is_bit_valid(b)
    }
}

/// A type with invariants encoded using `validate`.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "validate")]
#[repr(C)]
struct MyComplexType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MyComplexType {
    fn is_bit_valid(bytes: &AlignedByteArray<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        if !TryFromBytes::is_bit_valid(b) { return false; }
        // If there's no interior mutability, then we know this is sound because of preceding
        // validation. TODO: What to do about interior mutability?
        let slf: &MyComplexType = ...;
        MyComplexType::validate(slf)
    }
}

impl MyComplexType {
    fn validate(slf: &MyComplexType) -> bool { ... }
}
```

### Unions

See for a discussion of how to support unions in `TryFromBytes`: https://github.com/google/zerocopy/discussions/696

### Relationship with other traits

There are obvious relationships between `TryFromBytes` and the existing `FromZeroes` and `FromBytes` traits:
- If a type is `FromZeroes`, then it should probably be `TryFromBytes` (at a minimum, we must know something about the type's layout and bit validity to determine that it is genuinely `FromZeroes`)
  - This implies that we should change `FromZeroes` to be `FromZeroes: TryFromBytes`
- If a type is `FromBytes`, then it is trivially `TryFromBytes` (where `is_bit_valid` unconditionally returns `true`)
  - This implies that we should provide a blanket impl `impl<T: FromBytes> TryFromBytes for T`

Unfortunately, neither of these are possible today.

#### `FromZeroes: TryFromBytes`

The reason this bound doesn't work has to do with unsized types. As described in the previous section, working with unsized types is difficult. Luckily for `FromZeroes`, it doesn't have to _do anything_ with the types it's implemented for - it's just a marker trait. It can happily represent a claim about the bit validity of a type even if that type isn't constructible in practice (over time, `FromZeroes` will become more useful as more unsized types become constructible). By contrast, `TryFromBytes` is only useful if we can emit validation code (namely, `is_bit_valid`). For that reason, we require that `TryFromBytes: AsMaybeUninit` since that bound is required in order to support the `MaybeValid` type required by `is_bit_valid`.

This means that we have two options if we want `FromZeroes: TryFromBytes`:
- We can keep `TryFromBytes: AsMaybeUninit`. As a result, some types which are `FromZeroes` today can no longer be `FromZeroes`, and some blanket impls of `FromZeroes` would require more complex bounds (e.g., today we write `impl<T: FromZeroes> FromZeroes for Wrapping<T>`; under this system, we'd need to write `impl<T: FromZeroes> FromZeroes for Wrapping<T> where <T as AsMaybeUninit>::MaybeUninit: Sized`, or alternatively we'd need to write one impl for `T` and a different one for `[T]`).
- We could move the `AsMaybeUninit` bound out of definition of `TryFromBytes` and into `is_bit_valid` (and callers). As a result, we can keep existing impls of `FromZeroes`, but now `T: TryFromBytes` is essentially useless - to do anything useful, you need to specify `T: TryFromBytes + AsMaybeUninit`.

Neither option seems preferable to just omitting `FromZeroes: TryFromBytes`. Callers who require both can simply write `T: FromZeroes + TryFromBytes`.

(Note that the same points apply if we consider `FromBytes: TryFromBytes`)

#### `impl<T: FromBytes> TryFromBytes for T`

This conflicts with other blanket impls which we need for completeness:
- `impl<T: TryFromBytes> TryFromBytes for [T]`
- `impl<const N: usize, T: TryFromBytes> TryFromBytes for [T; N]`

As a result, we have to leave `TryFromBytes` and `FromBytes` as orthogonal. We may want to make it so that `derive(FromBytes)` automatically emits an impl of `TryFromBytes`, although in the general case that may require custom DST support.

## Open questions

- Is there any way to recover the blanket impl of `TryFromBytes` for `T: FromBytes`? Unlike `FromZeroes: TryFromBytes`, where you may need to perform runtime validation, if you know that `T: FromBytes`, then in principle you know that `is_bit_valid` can unconditionally return `true` without inspecting its argument, and so in principle it shouldn't matter whether you can construct a `MaybeValid<Self>`. Is there some way that we could allow `FromBytes` types to specify `<Self as AsMaybeUninit>::MaybeUninit = ()` or similar in order to bypass the "only sized types or slices can implement `AsMaybeUninit`" problem?
  - One approach is to wait until a [`KnownLayout` trait lands](https://github.com/google/zerocopy/issues/29). There's a good chance that, under that design, we'd end up with `FromZeroes: KnownLayout`. If `KnownLayout: AsMaybeUninit` (or just absorbs the current definition of `AsMaybeUninit` into itself), it'd solve this problem since all zerocopy traits would imply support for `MaybeValid`.
- In the first version of this feature, could we relax the `Self: Sized` bounds on `try_from_ref` and `try_from_mut` (without needing full custom-DST support)?
- Should `derive(FromBytes)` emit an impl of `TryFromBytes`? What about custom DSTs?
- What should the behavior for unions be? Should it validate that at least one variant is valid, or that all variants are valid? (This hinges somewhat on the outcome of [rust-lang/unsafe-code-guidelines#438](https://github.com/rust-lang/unsafe-code-guidelines/issues/438).)
- What bounds should we place on `T` when implementing `TryFromBytes` for `Unalign<T>` ([#320](https://github.com/google/zerocopy/issues/320))?

## Future directions

- In this design, we ban interior mutability entirely. For references, this is unavoidable - e.g., if we were to allow types containing `UnsafeCell` in `try_from_ref`, then the user could obtain an `&UnsafeCell` and a `&[u8]` view of the same memory, which is unsound (it's unsound to even exist under Stacked Borrows, and unsound to expose to safe code in all cases). For values (i.e., `try_read_from`), we'd like to be able to support this - as long as we have some way of performing validation, it should be fine to return an `UnsafeCell` by value even if its bytes were copied from a `&[u8]`. Actually supporting this in practice is complicated for a number of reasons, but perhaps a future extension could support it. Reasons it's complicated:
  - `is_bit_valid` operates on a `NonNull<Self>`, so interior mutability isn't inherently a problem. However, it needs to be able to call a user's custom validator, which instead operates on a `&Self`, which *is* a problem.
  - Even if we could solve the previous problem somehow, we'd need to have `is_bit_valid` require that it's argument not be either experiencing interior mutation or, under Stacked Borrows, contain any `UnsafeCell`s at all. When the `NonNull<Self>` is synthesized from a `&[u8]`, this isn't a problem, but if in the future we want to support type-to-type conditional transmutation, it might be a problem. If, in the future, merely *containing* `UnsafeCell`s is fine, then we could potentially design a wrapper type which "disables" interior mutation and supports field projection. This might allow us to solve this problem.
- https://github.com/google/zerocopy/issues/590

## Prior art

The bytemuck crate defines a [`CheckedBitPattern`](https://docs.rs/bytemuck/latest/bytemuck/checked/trait.CheckedBitPattern.html) trait which serves a similar role to the proposed `TryFromBytes`.

Unlike `TryFromBytes`, `CheckedBitPattern` introduces a separate associated `Bits` type which is a type with the same layout as `Self` except that all bit patterns are valid. This serves the same role as `MaybeValid<Self>` in our design. One advantage for the `Bits` type is that it may be more ergonomic to write validation code for it, which is important for manual implementations of `CheckedBitPattern`. However, our design expects that manual implementations of `TryFromBytes` will be very rare. Since `CheckedBitPattern`'s derive doesn't support custom validation, any type with safety invariants would need a manual implementation. By contrast, the `TryFromBytes` derive's support for a custom validation function means that, from a completeness standpoint, it should never be necessary to implement `TryFromBytes` manually. The only case in which a manual implementation might be warranted would be for performance reasons.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support `TryFromBytes` - conditional conversion analogous to `FromBytes` #5

Overview

Progress

Motivation

Design

Unions

Relationship with other traits

`FromZeroes: TryFromBytes`

`impl<T: FromBytes> TryFromBytes for T`

Open questions

Future directions

Prior art

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Support TryFromBytes - conditional conversion analogous to FromBytes #5

Description

Overview

Progress

Motivation

Design

Unions

Relationship with other traits

FromZeroes: TryFromBytes

impl<T: FromBytes> TryFromBytes for T

Open questions

Future directions

Prior art

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Support `TryFromBytes` - conditional conversion analogous to `FromBytes` #5

`FromZeroes: TryFromBytes`

`impl<T: FromBytes> TryFromBytes for T`