Conversation

@ollie-etl
Contributor

@ollie-etl commented Jan 19, 2023

Extends both registries to accept any iterator of IoBufMut items, instead of only Vec<u8> buffers.

This is motivated by my use case, which cannot safely provide its memory as a Vec and needs custom cleanup. Whilst I can unsafely create one, it would be invalid to call drop on the created Vec.

This implementation explicitly keeps the backing buffers present, rather than mem::forget-ing and recreating them, which means their Drop handling is preserved.

As a trivial example, this would now admit Box<[u8]> as a buffer.

I realize I could provide my own implementation of the pools, but it seems unnecessary: the generalized version serves both purposes. The overhead is an additional allocation on pool creation, although for the common use case this will be optimized away by in-place specialisation.
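As a sketch of what this enables (the generic constructor shape here is my reading of the change, and the Box<[u8]>: IoBufMut impl is assumed, so treat the exact API as illustrative):

use tokio_uring::buf::fixed::FixedBufRegistry;

// Construct a registry from an iterator of boxed slices, with no
// round-trip through Vec<u8>. Assumes Box<[u8]> implements IoBufMut.
fn make_registry() -> FixedBufRegistry<Box<[u8]>> {
    FixedBufRegistry::new(
        std::iter::repeat_with(|| vec![0u8; 4096].into_boxed_slice()).take(8),
    )
}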

@ollie-etl
Contributor Author

@mzabaluev and @FrankReh I'd appreciate your feedback.

@FrankReh
Collaborator

This looks good to me. The way drops are handled now, would it let me carve out the buffers from a single mmap'ed block if I wrote the iterator to unsafely step through the allocated memory? That's been my last sticking point as I think some kernel interfaces are more efficient with page aligned buffers or in some cases, even require page aligned buffers.

@mzabaluev
Contributor

The way drops are handled now, would it let me carve out the buffers from a single mmap'ed block

I think that's what the system allocator does in some cases, or maybe that's only if the allocation sizes are sufficiently big.

some kernel interfaces are more efficient with page aligned buffers or in some cases, even require page aligned buffers.

I believe this does not apply to io-uring: I/O operations and IORING_REGISTER_BUFFERS do not require page alignment. The latter, in fact, locates and maps the pages that contain the buffers, precisely to make operations with fixed buffers maximally efficient.

@FrankReh
Collaborator

I believe this does not apply to io-uring: I/O operations and IORING_REGISTER_BUFFERS do not require page alignment

I beg to differ. (But I'm often wrong.) The tests that come with the liburing C library itself use O_DIRECT in many places, as well as mmap for that very reason.

When I have an app that will use the same buffers countless times for I/O, I will certainly want them page aligned. For small use cases, I won't even go to the trouble of registering the buffers first.
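For reference, an anonymous mapping is one straightforward way to get a page-aligned block in Rust; a minimal sketch, assuming the memmap2 crate (this snippet is illustrative, not from the PR):

use memmap2::MmapMut;

fn main() -> std::io::Result<()> {
    // Anonymous mappings are returned page-aligned by the kernel.
    let map = MmapMut::map_anon(16 * 4096)?;
    assert_eq!(map.as_ptr() as usize % 4096, 0);
    Ok(())
}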

@ollie-etl
Contributor Author

This looks good to me. The way drops are handled now, would it let me carve out the buffers from a single mmap'ed block if I wrote the iterator to unsafely step through the allocated memory? That's been my last sticking point as I think some kernel interfaces are more efficient with page aligned buffers or in some cases, even require page aligned buffers.

@FrankReh Yes: that's what I do with it.

ollie-etl added a commit to ollie-etl/tokio-uring that referenced this pull request Jan 20, 2023
Collaborator

@FrankReh left a comment


I like it.

This and the related PR about supporting a next Future are making this feel like a very tight aspect of the crate.

@mzabaluev Would you be willing to give this a thumbs up?

@ollie-etl
Contributor Author

@FrankReh are all of your concerns addressed?

@ollie-etl
Contributor Author

@mzabaluev, is there anything you would like addressing?

@ollie-etl
Contributor Author

ollie-etl commented Jan 20, 2023

@FrankReh I wanted to reply to this.

This looks good to me. The way drops are handled now, would it let me carve out the buffers from a single mmap'ed block if I wrote the iterator to unsafely step through the allocated memory? That's been my last sticking point as I think some kernel interfaces are more efficient with page aligned buffers or in some cases, even require page aligned buffers.

You can do this in an unsafe way if that is your use case. Your iterator would produce something like Segment, and @mzabaluev is correct that normally you'd pay the (small) Rc penalty. They'd be something like:

struct Segment {
    backing_buffer: Rc<MmapRaw>, // keeps the mapping alive while any segment exists
    ptr: *mut u8,                // start of this segment within the mapping
    init: usize,                 // bytes of this segment known to be initialized
    len: usize,                  // capacity of this segment in bytes
}

However, because you know that if the pool is being dropped none of the segments are in use, and that all of them remain alive until the pool is dropped, you can replace backing_buffer with Option<MmapRaw>, with all but a single Segment being None, removing the Rc overhead. Yuk, I know.
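For concreteness, a sketch of the Rc-counted carving iterator under discussion, reusing the Segment struct above and assuming memmap2's MmapRaw for the mapping (the IoBufMut impl for Segment is omitted):

use std::rc::Rc;
use memmap2::MmapRaw;

// Carve `count` fixed-size segments out of one mapping. Each segment
// keeps the mapping alive through its Rc clone.
fn carve(map: MmapRaw, seg_len: usize, count: usize) -> impl Iterator<Item = Segment> {
    assert!(seg_len * count <= map.len());
    let map = Rc::new(map);
    (0..count).map(move |i| Segment {
        backing_buffer: Rc::clone(&map),
        // SAFETY: the assert above keeps every offset inside the mapping.
        ptr: unsafe { map.as_mut_ptr().add(i * seg_len) },
        init: 0,
        len: seg_len,
    })
}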

@FrankReh
Collaborator

You can do this in an unsafe way if that is your use case. Your iterator would produce something like Segment, and @mzabaluev is correct that normally you'd pay the (small) Rc penalty.

Thanks. I don't mind tapping into unsafe. I had hoped the iterator could be implemented programmatically, but really just to keep from polluting the heap with unnecessary stuff - not to save space or startup/tear-down time really. For me, setting up a pool would happen once at app or thread startup, and then the buffers get reused for hours or days before a reboot anyway.

Also for pools that I want to use, I don't see that tracking init per buf is important. If the compiler lets me, I would avoid tracking init and simply zero all the memory ahead of time.

Collaborator

@FrankReh left a comment


Looks good to me. I'll plan on merging tomorrow to give others time to still comment. Thanks!

@mzabaluev
Contributor

mzabaluev commented Jan 21, 2023

I still think this is the wrong place to extend, because it's an imperfect fit for either of the discussed use cases:

  1. It introduces redundant bookkeeping in the Vec<u8> case.
  2. It forces the "segments out of a large mmap" case into extra refcounting in the buffer handles.

I like generics, but not when they de-optimize their most used instantiations. And we already have an extension point in the FixedBuffers vtable, which has a runtime cost. So we might as well amortize the flexibility into that.

Also for pools that I want to use, I don't see that tracking init per buf is important. If the compiler lets me, I would avoid tracking init and simply zero all the memory ahead of time.

While this is a special case where it may not be important (I think zero-initializing is better than admitting UB into your program, though), the proposed change introduces a generic implementation that should work correctly for all buffer types.
Hence I'm suggesting addressing specialized cases at the FixedBuffers level, which the current design already allows (pending making the trait public).
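For readers without the source open, the extension point referred to is roughly the following shape; the trait is internal to tokio-uring and the method names below are illustrative, not the crate's exact definitions:

// Approximate shape of the internal FixedBuffers trait; illustrative only.
trait FixedBuffers {
    // The iovec table handed to IORING_REGISTER_BUFFERS.
    fn iovecs(&self) -> &[libc::iovec];
    // Record how many bytes of a checked-out buffer were initialized
    // when it is returned to the pool.
    fn check_in(&mut self, index: u16, init_len: usize);
}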

@ollie-etl
Contributor Author

ollie-etl commented Jan 21, 2023

  1. It introduces redundant bookkeeping in the Vec case.

I really don't follow this logic, sorry. I see no extra overhead.

@mzabaluev
Contributor

When I have an app that will use the same buffers countless times for I/O, I will certainly want them page aligned.

I believe you get fairly aligned buffers with something like iter::repeat_with(|| Vec::with_capacity(16_384)).take(N).
Also, I've learned that any speculative talk about performance is empty until demonstrated with an actual benchmark.
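In that spirit, the alignment claim at least is cheap to check empirically; a standalone sketch:

fn main() {
    // Large allocations often come back page-aligned from the system
    // allocator; print each buffer's address and whether it lands on a
    // 4096-byte boundary.
    let bufs: Vec<Vec<u8>> = std::iter::repeat_with(|| Vec::with_capacity(16_384))
        .take(8)
        .collect();
    for b in &bufs {
        println!("{:p} page-aligned: {}", b.as_ptr(), b.as_ptr() as usize % 4096 == 0);
    }
}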

@mzabaluev
Contributor

mzabaluev commented Jan 21, 2023

  1. It introduces redundant bookkeeping in the Vec case.

I really don't follow this logic, sorry. I see no extra overhead.

A member of the buffers array has exactly the same information as the corresponding iovec and the init_len member of the BufState::Free struct. Furthermore, it's currently allowed to go out of sync with the init_len value that is updated on buffer check-ins, which technically results in UB when the buffers' Vec<u8> instances are dropped. That is probably OK for this specific case, but not generically.
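Paraphrasing the registry internals under discussion (not the exact code), the per-buffer state is along these lines:

// Approximate shape of the per-buffer state; the real tokio-uring
// definition may differ.
enum BufState {
    // The buffer is available; `init_len` bytes are known to be initialized.
    Free { init_len: usize },
    // The buffer is checked out by an in-flight operation.
    CheckedOut,
}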

@ollie-etl
Contributor Author

ollie-etl commented Jan 21, 2023

A member of the buffers array has exactly the same information as the corresponding iovec and the init_len member of the BufState::Free struct.

Yes. But:

  1. no more memory is allocated; the buffers are present but "forgotten" in the current implementation.
  2. no extra accounting is done on those buffers (until registry drop).

Are you objecting to the presence of the stored Vec (pointer, length, and capacity)?

Furthermore, it's currently allowed to go out of sync with the init_len value that is updated on buffer check-ins, which technically results in UB when the buffers' Vec instances are dropped. That is probably OK for this specific case, but not generically.

It's not UB. However, I did update the drop implementation to update the buffers on drop, in case they are some Rc-backed implementation which takes them elsewhere and you want to know how many bytes currently rest in them. I'd be very upset if the compiler didn't strip that completely in the case where you do just drop, so I don't believe this is an overhead.

@ollie-etl
Contributor Author

ollie-etl commented Jan 21, 2023

Hmm, that last change does appear to have broken a test - will check.

Fixed. I think the original had a bug which has just been exposed.

@ollie-etl
Contributor Author

@FrankReh You'll want to review the changes here also. I don't think they change the PR much.

@FrankReh
Collaborator

I have to wait until much later today to look. Family dentistry emergency.

I like the discussion, when it's not personal. It seems the code is getting better as the two main protagonists here are talking.

I'll remind everyone the whole idea behind supporting uring is to get at performance gains between the userland and the kernel/hardware and we generally do it in good faith, without benchmarks to prove that we are on the right track. And my opinion is we try to present the most generic approach we can that fits into the boundaries of the tokio ecosystem. If things work out of the box, that makes for a nice user experience.

(I hope my opinions aren't used against me.)

Thanks. (And a reminder for everyone to take care of their teeth while they can.)

@ollie-etl
Contributor Author

ollie-etl commented Jan 21, 2023

I like the discussion, when it's not personal. It seems the code is getting better as the two main protagonists here are talking.

Indeed. Whilst it's true I can find getting things reviewed in this particular repository frustrating, I'll have to admit the review has always resulted in better code. As for contributors, I've nothing but thanks for people who give their (or their companies') time to improving open-source code. I'll note that a lot of that time seems to occur here out of hours.

Collaborator

@FrankReh left a comment


It's looking good. A lot of brain cycles have been put into this little corner of the crate.

Collaborator

@FrankReh left a comment


One unresolved comment left I think.

@ollie-etl
Contributor Author

ollie-etl commented Jan 23, 2023

All comments addressed! LGTM?

Collaborator

@FrankReh left a comment


Well done sir!

@FrankReh merged commit 0acb432 into tokio-rs:master Jan 24, 2023