-
-
Notifications
You must be signed in to change notification settings - Fork 42
New article: VecDeque::resize() optimization #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
Show all changes
8 commits
Select commit
Hold shift + click to select a range
90efbfe
New article: VecDeque::resize() optimization
Folyd dad2e7d
Improve the article by ChatGPT.
Folyd f29f4b1
Unpublish the article
Folyd b41b95e
Improve links
Folyd 6c45c7f
Apply suggestions from code review
Folyd 3bd243b
Add an alternative section and mention PR #106564
Folyd 72c6a22
Improve by ChatGPT
Folyd fae3b62
Add release note links
Folyd File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,268 @@ | ||
| > This is the first article on **pr-demystifying** topics. Each article labeled **pr-demystifying** will attempt to demystify the details behind the PR. | ||
|
|
||
| While browsing the rust-lang repository's list of pull requests last month, I came across PR [#104435], titled `VecDeque::resize should re-use the buffer in the passed-in element`. This PR caught my attention because it seemed interesting and I wanted to understand more about it. I began to wonder why it was necessary to optimize `VecDeque::resize()` and how the old version might be lacking. I also wanted to know how the author had optimized the new version. After delving into the code in the PR, I was able to gain a deeper understanding of these issues. | ||
|
|
||
| ## VecDeque::resize() | ||
|
|
||
| Firstly, let's get familiar with the [VecDeque::resize()](https://doc.rust-lang.org/std/collections/struct.VecDeque.html#method.resize). | ||
|
|
||
| > `pub fn resize(&mut self, new_len: usize, value: T)` | ||
| > | ||
| > Modifies the deque in-place so that `len()` is equal to `new_len`, either by removing excess elements from the back or by appending clones of `value` to the back. | ||
|
|
||
| ```rust | ||
| use std::collections::VecDeque; | ||
|
|
||
| let mut buf = VecDeque::new(); | ||
| buf.push_back(5); | ||
| buf.push_back(10); | ||
| buf.push_back(15); | ||
| assert_eq!(buf, [5, 10, 15]); | ||
|
|
||
| buf.resize(2, 0); | ||
| assert_eq!(buf, [5, 10]); | ||
|
|
||
| buf.resize(5, 20); | ||
| assert_eq!(buf, [5, 10, 20, 20, 20]); | ||
| ``` | ||
|
|
||
| The `VecDeque::resize()` API is simple to use. It takes two arguments: the new length to which the `VecDeque` should be resized, and a value to use for any new elements that are added to the `VecDeque` when it expands. | ||
|
|
||
| ## The problem | ||
|
|
||
| However, if we don't look at the implementation details of this function, we might not realize that there is room for optimization. As the PR's author [@scottmcm] pointed out, the old version did not reuse the value that was passed in as the default, resulting in unnecessary cloning of values. | ||
|
|
||
| ```rust | ||
| use std::collections::VecDeque; | ||
|
|
||
| let mut buf = VecDeque::new(); | ||
| buf.resize(5, String::from("rust")); | ||
| ``` | ||
|
|
||
| For example, the string "rust" was cloned five times, even though only four were needed, because the `VecDeque::resize()` function used `VecDeque::resize_with()` underneath, which passed a closure to the `repeat_with().take()`. | ||
|
|
||
| ```rust | ||
| pub fn resize(&mut self, new_len: usize, value: T) { | ||
| self.resize_with(new_len, || value.clone()); | ||
| } | ||
|
|
||
| pub fn resize_with(&mut self, new_len: usize, generator: impl FnMut() -> T) { | ||
| let len = self.len(); | ||
|
|
||
| if new_len > len { | ||
| self.extend(repeat_with(generator).take(new_len - len)) | ||
| } else { | ||
| self.truncate(new_len); | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| This closure was called repeatedly until it reached the `take` limit, causing unnecessary cloning. | ||
|
|
||
| ```rust | ||
| pub fn repeat_with<A, F: FnMut() -> A>(repeater: F) -> RepeatWith<F> { | ||
| RepeatWith { repeater } | ||
| } | ||
|
|
||
| /// An iterator that repeats elements of type `A` endlessly by | ||
| /// applying the provided closure `F: FnMut() -> A`. | ||
| #[derive(Copy, Clone, Debug)] | ||
| pub struct RepeatWith<F> { | ||
| repeater: F, | ||
| } | ||
|
|
||
| impl<A, F: FnMut() -> A> Iterator for RepeatWith<F> { | ||
| type Item = A; | ||
|
|
||
| #[inline] | ||
| fn next(&mut self) -> Option<A> { | ||
| Some((self.repeater)()) | ||
| } | ||
|
|
||
| #[inline] | ||
| fn size_hint(&self) -> (usize, Option<usize>) { | ||
| (usize::MAX, None) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Now that we have identified the problem, let's move on to how the author fixed it. | ||
|
|
||
| ## iter::repeat_n | ||
|
|
||
| The most significant change made in PR [#104435] was the replacement of `repeat_with().take()` with `repeat_n()`. | ||
|
|
||
| ```diff | ||
| pub fn resize(&mut self, new_len: usize, value: T) { | ||
| if new_len > self.len() { | ||
| let extra = new_len - self.len(); | ||
| - self.extend(repeat_with(generator).take(new_len - len)) | ||
| + self.extend(repeat_n(value, extra)) | ||
| } else { | ||
| self.truncate(new_len); | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| We can learn more about `repeat_n` from [ACP: Uplift iter::repeat_n from itertools](https://github.com/rust-lang/libs-team/issues/120). The author proposes to uplift [itertools::repeat_n()](https://docs.rs/itertools/latest/itertools/fn.repeat_n.html) into the standard library, just like how [iter::repeat_with()]() has obviated [itertools::repeat_call()](https://docs.rs/itertools/latest/itertools/fn.repeat_call.html). | ||
|
|
||
| How does `repeat_n()` avoid the unnecessary cloning? Let’s dive into the code: | ||
|
|
||
| ```rust | ||
| pub fn repeat_n<T: Clone>(element: T, count: usize) -> RepeatN<T> { | ||
| let mut element = ManuallyDrop::new(element); | ||
|
|
||
| if count == 0 { | ||
| // SAFETY: we definitely haven't dropped it yet, since we only just got | ||
| // passed it in, and because the count is zero the instance we're about | ||
| // to create won't drop it, so to avoid leaking we need to now. | ||
| unsafe { ManuallyDrop::drop(&mut element) }; | ||
| } | ||
|
|
||
| RepeatN { element, count } | ||
| } | ||
|
|
||
| pub struct RepeatN<A> { | ||
| count: usize, | ||
| // Invariant: has been dropped iff count == 0. | ||
| element: ManuallyDrop<A>, | ||
| } | ||
|
|
||
| impl<A: Clone> Iterator for RepeatN<A> { | ||
| type Item = A; | ||
|
|
||
| #[inline] | ||
| fn next(&mut self) -> Option<A> { | ||
| if self.count == 0 { | ||
| return None; | ||
| } | ||
|
|
||
| self.count -= 1; | ||
| Some(if self.count == 0 { | ||
| // SAFETY: the check above ensured that the count used to be non-zero, | ||
| // so element hasn't been dropped yet, and we just lowered the count to | ||
| // zero so it won't be dropped later, and thus it's okay to take it here. | ||
| unsafe { ManuallyDrop::take(&mut self.element) } | ||
| } else { | ||
| A::clone(&mut self.element) | ||
| }) | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| Not too much code, we can easily grasp it. The `RepeatN` struct returned by `repeat_n()` is the point. To save a cloning, `RepeatN` declares its `element` as the `ManuallyDrop` type. | ||
|
|
||
| ## ManuallyDrop\<T\> | ||
|
|
||
| [ManuallyDrop\<T\>](https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html) is a zero-cost wrapper to inhibit compiler from automatically calling **T**’s destructor unless you call `ManuallyDrop::drop()` method. | ||
|
|
||
| The two most important APIs that can help us avoid unnecessary cloning are: | ||
|
|
||
| - `ManuallyDrop::take()` | ||
|
|
||
| ```rust | ||
| pub unsafe fn take(slot: &mut ManuallyDrop<T>) -> T { | ||
| // SAFETY: we are reading from a reference, which is guaranteed | ||
| // to be valid for reads. | ||
| unsafe { ptr::read(&slot.value) } | ||
| } | ||
| ``` | ||
|
|
||
| - The `DerefMut` implementation | ||
|
|
||
| In the `Iterator` implementation of `RepeatN`, the `next()` method clones the element in each iteration until the `count` reaches 0. In the final iteration, the `ManuallyDrop::take()` function is used to reuse the buffer and avoid an extra clone. | ||
|
|
||
| Wait, but why does `A::clone(&mut self.element)` will get an instance of `A`? The type of `&mut self.element` is `&mut ManuallyDrop`, not `&mut A`. Well, the use of `ManuallyDrop` may seem obscure at first, but it becomes clearer when we consider the `Deref` and `DerefMut` traits that it implements. These traits allow for a type like `&mut ManuallyDrop<A>` to be treated as if it were a type like `&mut A`. This is known as [Deref coercion](https://doc.rust-lang.org/std/ops/trait.DerefMut.html#more-on-deref-coercion). As an example, consider the following code: | ||
|
|
||
| ```rust | ||
| fn main() { | ||
| let mut a = String::from("A"); | ||
| test(&mut a); | ||
| } | ||
|
|
||
| fn test(s: &str) { | ||
| println!("{s}"); | ||
| } | ||
| ``` | ||
|
|
||
| Here, we are able to pass a `&mut String` value to the `test()` function, even though the function's argument type is `&str`. This is because `String` implements both [Deref](https://doc.rust-lang.org/src/alloc/string.rs.html#2455) and [DerefMut](https://doc.rust-lang.org/src/alloc/string.rs.html#2465), allowing it to be treated as if it were a `&str` value. | ||
|
|
||
| Similarly, `ManuallyDrop<A>` also implements `DerefMut`, allowing it to be treated as if it were an `&mut A` value. | ||
|
|
||
| ```rust | ||
| impl<T: ?Sized> const DerefMut for ManuallyDrop<T> { | ||
| #[inline(always)] | ||
| fn deref_mut(&mut self) -> &mut T { | ||
| &mut self.value | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| It's important to note that the `Deref` trait also has a implementation for `&mut T`: | ||
|
|
||
| ```rust | ||
| impl<T: ?Sized> const Deref for &mut T { | ||
| type Target = T; | ||
|
|
||
| fn deref(&self) -> &T { | ||
| *self | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| So `A::clone(&mut self.element)` works because `&mut ManuallyDrop<A>` can convert to `&mut A` due to **Deref coercion**, then `&mut A` can convert to `&A` also due to **Deref coercion**. | ||
|
|
||
| > Thanks to [@scottmcm]'s kindful reminder, we can change `A::clone(&mut self.element)` to `A::clone(&self.element)`. So I submitted a PR([#106564]) to fix it. | ||
|
|
||
| ## Alternative? | ||
|
|
||
| During the review of this article, [@XeCycle] suggested that it might be possible to achieve the same result without using *unsafe* code by using `Option<(NonZeroUsize, T)>` instead. [@XeCycle] provided a [playground example](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fd82d81d0f485acd769c3b0aaeeb5a3b) to demonstrate this idea. | ||
|
|
||
| However, [@scottmcm] also commented on this issue, explaining that `ManuallyDrop` is a more appropriate choice for this case because it is designed for *niche-filling*. As an example, consider the following code: | ||
|
|
||
| ```rust | ||
| use core::mem::MaybeUninit; | ||
| use core::mem::ManuallyDrop; | ||
| use core::num::NonZeroUsize; | ||
| use core::alloc::Layout; | ||
|
|
||
| fn main() { | ||
| dbg!(Layout::new::<Option<Option<(NonZeroUsize, String)>>>()); | ||
| dbg!(Layout::new::<Option<(usize, MaybeUninit<String>)>>()); | ||
| dbg!(Layout::new::<Option<(usize, ManuallyDrop<String>)>>()); | ||
| } | ||
| ``` | ||
|
|
||
| The output of this code is: | ||
|
|
||
| ``` | ||
| [src/main.rs:6] Layout::new::<Option<Option<(NonZeroUsize, String)>>>() = Layout { | ||
| size: 40, | ||
| align: 8 (1 << 3), | ||
| } | ||
| [src/main.rs:7] Layout::new::<Option<(usize, MaybeUninit<String>)>>() = Layout { | ||
| size: 40, | ||
| align: 8 (1 << 3), | ||
| } | ||
| [src/main.rs:8] Layout::new::<Option<(usize, ManuallyDrop<String>)>>() = Layout { | ||
| size: 32, | ||
| align: 8 (1 << 3), | ||
| } | ||
| ``` | ||
|
|
||
| As we can see, using `Option<(usize, ManuallyDrop<String>)>` results in a smaller size than the other options. Using `ManuallyDrop` allows `Option<RepeatN<String>>` to have the same size as `RepeatN<String>`, which is not the case with the other options. | ||
|
|
||
| > For more information on this topic, you can check out the article PR [#104435]. | ||
|
|
||
| ## Conclusion | ||
|
|
||
| As a Rust developer, I am often most concerned with the changes listed in the stable [release notes](https://github.com/rust-lang/rust/blob/master/RELEASES.md). However, this does not mean that I should not be interested in the individual pull requests (PRs) that are being merged into the project. There are hundreds of PRs merged each week, and each one has a story and an author behind it. That's why I propose the creation of a topic called [#pr-demystifying], where we can share articles about interesting or educational PRs in the Rust community. The PR [#104435], for example, may not be a major optimization, but it allowed me to learn a lot. I would like to thank the author [@scottmcm] for their work on this PR. I hope that this article and others like it will be helpful to others in the community. | ||
|
|
||
| > Thanks to [@scottmcm], [@XeCycle], [@zhanghandong] for proof reading this post! | ||
|
|
||
| [#pr-demystifying]: /topic/pr-demystifying | ||
| [#104435]: https://github.com/rust-lang/rust/pull/104435 | ||
| [#106564]: https://github.com/rust-lang/rust/pull/106564 | ||
| [@scottmcm]: https://github.com/scottmcm | ||
| [@XeCycle]: https://github.com/XeCycle | ||
| [@zhanghandong]: https://github.com/zhanghandong | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Uh oh!
There was an error while loading. Please reload this page.