Description
Co-authored with @jswrenn.
Overview
Add a TryFromBytes trait, which supports byte-to-type conversions for non-FromBytes types by performing runtime validation. Add a custom derive which generates this validation code automatically.
Many thanks to @kupiakos and @djkoloski for providing invaluable feedback and input on this design.
Progress
- Add TryFromBytes trait definition
- Implement TryFromBytes for existing FromBytes types
- Add try_from_ref method; impl for bool
- Implement derive for structs
- Implement for slices
- Implement for arrays
- Allow deriving on repr(packed) structs
- Allow deriving on unions
- Allow deriving on field-less enums with primitive reprs (u8, i16, etc)
- Allow deriving on field-less enums with repr(C) by treating the discriminant type as [u8; size_of::<Self>()]
- Ptr type should reason about UnsafeCell overlap #873
- Implement TryFromBytes for fn() and extern "C" fn() types
- Implement TryFromBytes for UnsafeCell<T>
- Make TryFromBytes a super-trait of FromZeros
- Remove #[doc(hidden)] from all items which are intended to be public
- Add to TryFromBytes docs to explain that you can't always round trip T -> [u8] -> T (notably for pointer types), which could be confusing given that, for TryFromBytes, the failure would show up at runtime
- Rename methods consistent with Revising the (`Try`)`FromBytes` Conversion Methods in 0.8 #1095
- TryFromBytes doc comment currently incorrectly says: "zerocopy does not permit implementing TryFromBytes for any union type"
- Consider this comment
- Non-breaking/blocking
  - Allow deriving on data-full enums
- Non-breaking/non-blocking
  - Add try_from_mut and try_read_from methods
  - Implement for unsized UnsafeCell
    - Consider that we may not need to require T: Sized (described in #251) if we use the design in #905
    - Implement TryFromBytes for unsized UnsafeCell #1619
  - Remove Self: NoCell bound from try_read_from
  - Support deriving on unions without Immutable bound
  - is_bit_valid should promise not to mutate its argument's referent
  - Support custom validators for TryFromBytes #1330
Motivation
Many use cases involve types whose layout is well-defined, but which cannot implement FromBytes because there exist bit patterns which are invalid (either they are unsound in terms of language semantics or they are unsafe in the sense of violating a library invariant).
Consider, for example, parsing an RPC message format. It would be desirable for performance reasons to be able to read a message into local memory, validate its structure, and if validation succeeds, treat that memory as containing a parsed message rather than needing to copy the message in order to transform it into a native Rust representation.
Here's a simple, hypothetical example of an RPC to request log messages from a process:
```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Level of log messages requested from `RequestLogs`.
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```
None of these types can be FromBytes. For LogLevel, only the u8 values 0 through 4 correspond to enum variants, and constructing a LogLevel from any other u8 would be unsound. For LogTime, any sequence of the appropriate number of bytes would constitute a valid instance of LogTime from Rust's perspective - it would not cause unsoundness - but some such sequences would violate the invariant that the nsecs field is in the range [0, 10^9).
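To make the two failure modes concrete, here is a minimal sketch of the checks a caller would otherwise have to write by hand. The helper names `log_level_from_u8` and `nsecs_in_range` are hypothetical and are not part of zerocopy; they only illustrate the difference between a soundness check and a library-invariant check.

```rust
/// Level of log messages, as in the example above.
#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical helper: the soundness check for `LogLevel`.
/// Only the discriminants 0 through 4 name a variant, so every other
/// byte must be rejected before a `LogLevel` is produced.
fn log_level_from_u8(byte: u8) -> Option<LogLevel> {
    match byte {
        0 => Some(LogLevel::Trace),
        1 => Some(LogLevel::Debug),
        2 => Some(LogLevel::Info),
        3 => Some(LogLevel::Warn),
        4 => Some(LogLevel::Error),
        _ => None,
    }
}

/// Hypothetical helper: the library-invariant check for `LogTime::nsecs`.
/// Any `u32` is a sound value here; only values below 10^9 are meaningful.
fn nsecs_in_range(nsecs: u32) -> bool {
    nsecs < 1_000_000_000
}
```

The first check guards against undefined behavior, while the second only guards an invariant that the type's author chose to impose.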
While these types can't be FromBytes, we'd still like to be able to conditionally reinterpret a sequence of bytes as a RequestLogsArgs - it's just that we need to perform runtime validation first. Ideally, we'd be able to write code like:
```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[derive(TryFromBytes)]
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "is_valid")]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

impl LogTime {
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }
}

/// Level of log messages requested from `RequestLogs`.
#[derive(TryFromBytes)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```
The TryFromBytes trait - the subject of this design - provides the ability to fallibly convert a byte sequence to a type, performing validation at runtime. At a minimum, the validation code simply ensures soundness - for example, in the case of LogLevel, validating that byte values are in the range [0, 4]. The custom derive also supports user-defined validation like the LogTime::is_valid method (note the validator annotation on LogTime), which can be used to enforce safety invariants that go above and beyond soundness.
Given these derives of TryFromBytes, an implementation of this RPC could be as simple as:
```rust
fn serve_request_logs_rpc<F: FnMut(&RequestLogsArgs)>(server: &mut RpcServer, mut f: F) -> Result<()> {
    loop {
        // The buffer must be mutable so the server can fill it.
        let mut bytes = [0u8; mem::size_of::<RequestLogsArgs>()];
        server.read_request(&mut bytes[..])?;
        // Validate in place; no copy into a separate native representation.
        let args = RequestLogsArgs::try_from_ref(&bytes[..]).ok_or(ParseError)?;
        f(args);
    }
}
```
The design proposed in this issue implements this API.
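Since the RPC loop above depends on types that are not defined here, the validate-then-reinterpret step can be sketched on its own using only the standard library. `TryFromBytesSketch` is a hypothetical, simplified stand-in for the proposed trait: it validates raw bytes directly rather than going through the wrapper type introduced in the Design section, and its names and signatures are assumptions made for illustration.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical, simplified stand-in for the proposed trait.
///
/// # Safety
///
/// `is_bit_valid` must return `true` only if `bytes` holds a valid `Self`,
/// and `Self` must not contain interior mutability.
unsafe trait TryFromBytesSketch: Sized {
    fn is_bit_valid(bytes: &[u8]) -> bool;

    /// Reinterprets `bytes` as `&Self` without copying, or returns `None`
    /// if the size, alignment, or contents are unsuitable.
    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let size_ok = bytes.len() == size_of::<Self>();
        let align_ok = (bytes.as_ptr() as usize) % align_of::<Self>() == 0;
        if size_ok && align_ok && Self::is_bit_valid(bytes) {
            // SAFETY: The pointer is non-null, aligned, and covers exactly
            // `size_of::<Self>()` initialized bytes, and the implementor
            // promises that those bytes form a valid `Self`.
            Some(unsafe { &*bytes.as_ptr().cast::<Self>() })
        } else {
            None
        }
    }
}

#[derive(Debug, PartialEq)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

// SAFETY: `LogLevel` is `repr(u8)` with discriminants 0 through 4, so a
// single byte in that range is a valid `LogLevel`.
unsafe impl TryFromBytesSketch for LogLevel {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        matches!(bytes, [b] if *b <= 4)
    }
}
```

With this sketch, `LogLevel::try_from_ref(&[2u8])` yields a reference to `LogLevel::Info` that borrows the input buffer, while an out-of-range byte or a wrongly sized slice yields `None`.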
Design
TODO
This design builds on the following features:
- Support KnownLayout trait and custom DSTs #29
- Support field projection in any #[repr(transparent)] wrapper type #196
```rust
/// A value which might or might not constitute a valid instance of `T`.
// Builds on the custom MaybeUninit type described in #29
pub struct MaybeValid<T: AsMaybeUninit + ?Sized>(MaybeUninit<T>);

// Allows us to use the `project!` macro for field projection (proposed in #196)
unsafe impl<T, F> Projectable<F, MaybeValid<F>> for MaybeValid<T> {
    type Inner = T;
}

impl<T> MaybeValid<T> {
    /// Converts this `MaybeValid<T>` to a `T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid(self) -> T { ... }

    /// Converts this `&MaybeValid<T>` to a `&T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid_ref(&self) -> &T { ... }

    /// Converts this `&mut MaybeValid<T>` to a `&mut T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub unsafe fn assume_valid_mut(&mut self) -> &mut T { ... }
}

/// # Safety
///
/// `is_bit_valid` is correct. If not, can cause UB.
pub unsafe trait TryFromBytes {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_ref();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_from_mut(bytes: &mut [u8]) -> Option<&mut Self>
    where
        Self: AsBytes + Sized,
    {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_mut();
        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_mut`.
            Some(unsafe { maybe_valid.assume_valid_mut() })
        } else {
            None
        }
    }

    fn try_read_from(bytes: &[u8]) -> Option<Self>
    where
        Self: Sized
    {
        let maybe_valid = <MaybeValid<Self> as FromBytes>::read_from(bytes)?;
        if Self::is_bit_valid(&maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid`.
            Some(unsafe { maybe_valid.assume_valid() })
        } else {
            None
        }
    }
}
```
Here's an example usage:
```rust
/// A type without any safety invariants.
#[derive(TryFromBytes)]
#[repr(C)]
struct MySimpleType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MySimpleType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        TryFromBytes::is_bit_valid(b)
    }
}

/// A type with invariants encoded using `validate`.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "validate")]
#[repr(C)]
struct MyComplexType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MyComplexType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        if !TryFromBytes::is_bit_valid(b) { return false; }
        // If there's no interior mutability, then we know this is sound because of preceding
        // validation. TODO: What to do about interior mutability?
        let slf: &MyComplexType = ...;
        MyComplexType::validate(slf)
    }
}

impl MyComplexType {
    fn validate(slf: &MyComplexType) -> bool { ... }
}
```
Unions
See #696 for a discussion of how to support unions in TryFromBytes.
Relationship with other traits
There are obvious relationships between TryFromBytes and the existing FromZeroes and FromBytes traits:
- If a type is FromZeroes, then it should probably be TryFromBytes (at a minimum, we must know something about the type's layout and bit validity to determine that it is genuinely FromZeroes)
  - This implies that we should change FromZeroes to be FromZeroes: TryFromBytes
- If a type is FromBytes, then it is trivially TryFromBytes (where is_bit_valid unconditionally returns true)
  - This implies that we should provide a blanket impl impl<T: FromBytes> TryFromBytes for T
Unfortunately, neither of these is possible today.
FromZeroes: TryFromBytes
The reason this bound doesn't work has to do with unsized types. As described in the previous section, working with unsized types is difficult. Luckily for FromZeroes, it doesn't have to do anything with the types it's implemented for - it's just a marker trait. It can happily represent a claim about the bit validity of a type even if that type isn't constructible in practice (over time, FromZeroes will become more useful as more unsized types become constructible). By contrast, TryFromBytes is only useful if we can emit validation code (namely, is_bit_valid). For that reason, we require that TryFromBytes: AsMaybeUninit since that bound is required in order to support the MaybeValid type required by is_bit_valid.
This means that we have two options if we want FromZeroes: TryFromBytes:
- We can keep TryFromBytes: AsMaybeUninit. As a result, some types which are FromZeroes today can no longer be FromZeroes, and some blanket impls of FromZeroes would require more complex bounds (e.g., today we write impl<T: FromZeroes> FromZeroes for Wrapping<T>; under this system, we'd need to write impl<T: FromZeroes> FromZeroes for Wrapping<T> where <T as AsMaybeUninit>::MaybeUninit: Sized, or alternatively we'd need to write one impl for T and a different one for [T]).
- We could move the AsMaybeUninit bound out of the definition of TryFromBytes and into is_bit_valid (and callers). As a result, we can keep existing impls of FromZeroes, but now T: TryFromBytes is essentially useless - to do anything useful, you need to specify T: TryFromBytes + AsMaybeUninit.
Neither option seems preferable to just omitting FromZeroes: TryFromBytes. Callers who require both can simply write T: FromZeroes + TryFromBytes.
(Note that the same points apply if we consider FromBytes: TryFromBytes)
impl<T: FromBytes> TryFromBytes for T
This conflicts with other blanket impls which we need for completeness:
- impl<T: TryFromBytes> TryFromBytes for [T]
- impl<const N: usize, T: TryFromBytes> TryFromBytes for [T; N]
As a result, we have to leave TryFromBytes and FromBytes as orthogonal. We may want to make it so that derive(FromBytes) automatically emits an impl of TryFromBytes, although in the general case that may require custom DST support.
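The conflict, and the per-type workaround hinted at above, can be sketched with stand-in traits. `FromBytesSketch`, `TryFromBytesSketch`, and the `trivially_valid!` macro are hypothetical names invented for this illustration; validation operates on raw byte slices rather than on MaybeValid, which is a simplification.

```rust
/// Hypothetical stand-ins for the two traits (not zerocopy's definitions).
trait FromBytesSketch {}
trait TryFromBytesSketch {
    fn is_bit_valid(bytes: &[u8]) -> bool;
}

// Arrays of `FromBytes` elements are themselves `FromBytes`.
impl<T: FromBytesSketch, const N: usize> FromBytesSketch for [T; N] {}

// The array impl that the design needs for completeness.
impl<T: TryFromBytesSketch, const N: usize> TryFromBytesSketch for [T; N] {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        let elem = std::mem::size_of::<T>();
        if elem == 0 {
            // Zero-sized elements occupy no bytes at all.
            return bytes.is_empty();
        }
        bytes.len() == elem * N
            && bytes.chunks_exact(elem).all(|chunk| T::is_bit_valid(chunk))
    }
}

// Adding the blanket impl below as well is rejected with E0119 (conflicting
// implementations): `[u8; 4]` would match both it and the array impl above.
//
// impl<T: FromBytesSketch> TryFromBytesSketch for T { ... }

// One possible workaround: emit a trivial impl per concrete `FromBytes`
// type, much as a derive could do.
macro_rules! trivially_valid {
    ($($ty:ty),*) => {$(
        impl FromBytesSketch for $ty {}
        impl TryFromBytesSketch for $ty {
            fn is_bit_valid(bytes: &[u8]) -> bool {
                // Every bit pattern is valid; only the length can be wrong.
                bytes.len() == std::mem::size_of::<$ty>()
            }
        }
    )*};
}
trivially_valid!(u8, u16, u32, u64);

// A type that is `TryFromBytes` but not `FromBytes`.
impl TryFromBytesSketch for bool {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        matches!(bytes, [0] | [1])
    }
}
```

Under this arrangement `[u16; 2]` and `[bool; 3]` both get validation through the single array impl, at the cost of repeating the trivial impl for each FromBytes type.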
Open questions
- Is there any way to recover the blanket impl of TryFromBytes for T: FromBytes? Unlike FromZeroes: TryFromBytes, where you may need to perform runtime validation, if you know that T: FromBytes, then in principle you know that is_bit_valid can unconditionally return true without inspecting its argument, and so in principle it shouldn't matter whether you can construct a MaybeValid<Self>. Is there some way that we could allow FromBytes types to specify <Self as AsMaybeUninit>::MaybeUninit = () or similar in order to bypass the "only sized types or slices can implement AsMaybeUninit" problem?
  - One approach is to wait until a KnownLayout trait lands. There's a good chance that, under that design, we'd end up with FromZeroes: KnownLayout. If KnownLayout: AsMaybeUninit (or just absorbs the current definition of AsMaybeUninit into itself), it'd solve this problem since all zerocopy traits would imply support for MaybeValid.
- In the first version of this feature, could we relax the Self: Sized bounds on try_from_ref and try_from_mut (without needing full custom-DST support)?
- Should derive(FromBytes) emit an impl of TryFromBytes? What about custom DSTs?
- What should the behavior for unions be? Should it validate that at least one variant is valid, or that all variants are valid? (This hinges somewhat on the outcome of rust-lang/unsafe-code-guidelines#438.)
- What bounds should we place on T when implementing TryFromBytes for Unalign<T> (#320)?
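To illustrate the two candidate union policies from the question above, here is a small sketch for a one-byte union. The union `BoolOrLevel` and all function names are hypothetical; the sketch takes no position on which policy is correct.

```rust
#[derive(Clone, Copy)]
#[repr(u8)]
enum Level {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical one-byte union used only for illustration.
#[repr(C)]
union BoolOrLevel {
    flag: bool,
    level: Level,
}

/// Bit validity of each field, expressed on the raw byte.
fn flag_valid(byte: u8) -> bool {
    byte <= 1
}
fn level_valid(byte: u8) -> bool {
    byte <= 4
}

/// Policy A: accept the byte if at least one field would be valid.
fn valid_if_any_field(byte: u8) -> bool {
    flag_valid(byte) || level_valid(byte)
}

/// Policy B: accept the byte only if every field would be valid.
fn valid_if_all_fields(byte: u8) -> bool {
    flag_valid(byte) && level_valid(byte)
}
```

The byte 3 separates the two policies: it is accepted under policy A (it is a valid `Level`) but rejected under policy B (it is not a valid `bool`).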
Future directions
- In this design, we ban interior mutability entirely. For references, this is unavoidable - e.g., if we were to allow types containing UnsafeCell in try_from_ref, then the user could obtain an &UnsafeCell and a &[u8] view of the same memory, which is unsound (it's unsound to even exist under Stacked Borrows, and unsound to expose to safe code in all cases). For values (i.e., try_read_from), we'd like to be able to support this - as long as we have some way of performing validation, it should be fine to return an UnsafeCell by value even if its bytes were copied from a &[u8]. Actually supporting this in practice is complicated for a number of reasons, but perhaps a future extension could support it. Reasons it's complicated:
  - is_bit_valid operates on a NonNull<Self>, so interior mutability isn't inherently a problem. However, it needs to be able to call a user's custom validator, which instead operates on a &Self, which is a problem.
  - Even if we could solve the previous problem somehow, we'd need to have is_bit_valid require that its argument not be either experiencing interior mutation or, under Stacked Borrows, contain any UnsafeCells at all. When the NonNull<Self> is synthesized from a &[u8], this isn't a problem, but if in the future we want to support type-to-type conditional transmutation, it might be a problem. If, in the future, merely containing UnsafeCells is fine, then we could potentially design a wrapper type which "disables" interior mutation and supports field projection. This might allow us to solve this problem.
- Extend TryFromBytes to support validation context #590
Prior art
The bytemuck crate defines a CheckedBitPattern trait which serves a similar role to the proposed TryFromBytes.
Unlike TryFromBytes, CheckedBitPattern introduces a separate associated Bits type which is a type with the same layout as Self except that all bit patterns are valid. This serves the same role as MaybeValid<Self> in our design. One advantage for the Bits type is that it may be more ergonomic to write validation code for it, which is important for manual implementations of CheckedBitPattern. However, our design expects that manual implementations of TryFromBytes will be very rare. Since CheckedBitPattern's derive doesn't support custom validation, any type with safety invariants would need a manual implementation. By contrast, the TryFromBytes derive's support for a custom validation function means that, from a completeness standpoint, it should never be necessary to implement TryFromBytes manually. The only case in which a manual implementation might be warranted would be for performance reasons.
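The ergonomic argument for a separate Bits type can be sketched as follows. `CheckedSketch`, `LogTimeBits`, and `checked_cast` are hypothetical names that only mirror the shape described above; this is not bytemuck's actual API, and the layout equivalence between the two structs is an assumption that the implementor must uphold.

```rust
/// Hypothetical trait mirroring a `Bits`-based design.
///
/// # Safety
///
/// `Bits` must have the same size and layout as `Self`, and
/// `is_valid_bit_pattern` must return `true` only for valid values.
unsafe trait CheckedSketch: Copy {
    /// Same layout as `Self`, but every bit pattern is valid.
    type Bits: Copy;
    fn is_valid_bit_pattern(bits: &Self::Bits) -> bool;
}

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Plain-integer mirror of `LogTime`.
#[derive(Clone, Copy)]
#[repr(C)]
struct LogTimeBits {
    secs: u64,
    nsecs: u32,
}

// SAFETY: Both structs are `repr(C)` with identical field types in the
// same order, so they share a layout.
unsafe impl CheckedSketch for LogTime {
    type Bits = LogTimeBits;
    // Manual validation reads ordinary fields, with no projection needed.
    fn is_valid_bit_pattern(bits: &LogTimeBits) -> bool {
        bits.nsecs < 1_000_000_000
    }
}

/// Converts validated bits into the target type.
fn checked_cast<T: CheckedSketch>(bits: T::Bits) -> Option<T> {
    if T::is_valid_bit_pattern(&bits) {
        // SAFETY: The implementor promises matching layout, and the bits
        // were just validated.
        Some(unsafe { std::mem::transmute_copy::<T::Bits, T>(&bits) })
    } else {
        None
    }
}
```

The validation body is a plain field comparison, which is the ergonomic benefit; the cost is that a second mirror type must exist for every checked type, which a derive with custom validator support avoids.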