
Commit 2b38396

initial commit
1 parent 8fb8742 commit 2b38396

1 file changed (304 additions, 0 deletions)

text/0000-memory-model-strike-team.md

- Feature Name: N/A
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Form a strike team dedicated to preparing rules and guidelines for
writing unsafe code in Rust (commonly referred to as Rust's "memory
model"), in cooperation with the lang team. The discussion will
generally proceed in phases, starting with establishing high-level
principles and gradually getting down to the nitty-gritty details
(though some back and forth is expected). The strike team will
produce various intermediate documents that will be submitted as
normal RFCs.

# Motivation
[motivation]: #motivation

Rust's safe type system offers very strong aliasing information that
promises to be a rich source of compiler optimizations. For example,
in safe code, the compiler can infer that if a function takes two
`&mut T` parameters, those two parameters must reference disjoint
areas of memory (this allows optimizations similar to C99's `restrict`
keyword, except that it is both automatic and fully enforced). The
compiler also knows that, given a shared reference of type `&T`, the
referent is immutable, except for data contained in an `UnsafeCell`.
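
As a rough illustration of these two guarantees, consider the sketch below. The function names are hypothetical, and whether rustc actually performs a given optimization depends on the compiler version and optimization level. Because the two `&mut` parameters of `write_both` cannot alias, the compiler is entitled to assume that the write through `b` leaves `*a` unchanged; `read_twice`, by contrast, takes a `&Cell<i32>`, and since `Cell` is built on `UnsafeCell`, the value may change while shared and has to be re-read.

```rust
use std::cell::Cell;

// `a` and `b` are both `&mut i32`, so they cannot refer to the same location.
// The compiler may therefore assume the store through `b` leaves `*a` intact
// and return the constant 1 without reloading `*a` (much as C99 would allow
// only if both pointers were declared `restrict`).
fn write_both(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    *a
}

// Shared references usually mean "immutable", but `Cell` is built on
// `UnsafeCell`, so the value may change while shared (here, inside `bump`)
// and must be re-read after the call.
fn read_twice(c: &Cell<i32>, bump: impl Fn()) -> (i32, i32) {
    let before = c.get();
    bump();
    (before, c.get())
}

fn main() {
    let (mut x, mut y) = (0, 0);
    // Safe code cannot call `write_both(&mut x, &mut x)`: the borrow checker
    // rejects the second mutable borrow.
    assert_eq!(write_both(&mut x, &mut y), 1);
    assert_eq!((x, y), (1, 2));

    let c = Cell::new(10);
    assert_eq!(read_twice(&c, || c.set(c.get() + 1)), (10, 11));
}
```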

Unfortunately, there is a fly in the ointment: unsafe code can easily
violate these sorts of rules. For example, using unsafe code, it is
trivial to create two `&mut` references that both refer to the same
memory (and which are simultaneously usable). If the unsafe code were
to (say) return those two references to safe code, that would
undermine Rust's safety guarantees, so such code is clearly
"incorrect".

But things become more subtle when we consider only what happens
*within* the abstraction. For example, is unsafe code allowed to use
two overlapping `&mut` references internally, without releasing them
into the wild? Is it all right for a `&mut` to overlap with a `*mut`?
And so forth.
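
A familiar instance of this "inside the abstraction" question is the standard library's `split_at_mut`, whose safe interface hands out two disjoint `&mut` slices while its implementation goes through a raw pointer. The following is a simplified re-implementation, named `split_mut` here to avoid confusion with the real method. It is a sketch of the kind of code the guidelines would have to classify, not a statement of what they will permit: note that `v` (a `&mut`) and `ptr` (a `*mut`) refer to the same memory at the same time.

```rust
use std::slice;

/// Hypothetical helper: splits `v` into two non-overlapping mutable parts at
/// `mid`. Callers only ever see disjoint `&mut` slices; internally both parts
/// are derived from a single `*mut T` obtained from `v`.
fn split_mut<T>(v: &mut [T], mid: usize) -> (&mut [T], &mut [T]) {
    let len = v.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr: *mut T = v.as_mut_ptr();
    // SAFETY: `[0, mid)` and `[mid, len)` are in bounds and do not overlap,
    // and `v` stays mutably borrowed for as long as the results are alive.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_mut(&mut data[..], 2);
    left[0] = 10;
    right[0] = 30;
    assert_eq!(data, [10, 2, 30, 4, 5]);
}
```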

It is the contention of this RFC that a complete set of guidelines for
unsafe code is far too big a topic to be fruitfully addressed in a
single RFC. Therefore, this RFC proposes the formation of a dedicated
**strike team** (that is, a temporary, single-purpose team) that will
work on hammering out the details over time.

The unsafe guidelines work will proceed in rough stages, described
below. An initial goal is to produce a **high-level summary detailing
the general approach of the guidelines.** Ideally, this summary should
be sufficient to guide unsafe code authors towards the practices that
are most likely to be forwards compatible. Further work will then
expand on the model to produce a more **detailed set of rules**, which
may in turn require revisiting the high-level summary if
contradictions are uncovered.

This new "unsafe code" strike team is intended to work in
collaboration with the existing lang team. Ultimately, whatever rules
are crafted must be adopted with the **general consensus of both the
strike team and the lang team**. It is expected that lang team members
will be more involved in the early discussions that govern the overall
direction and less involved in the fine details.

#### History and recent discussions

The history of optimizing C can be instructive. All code in C is
effectively unsafe, and so in order to perform optimizations,
compilers have come to lean heavily on the notion of "undefined
behavior" as well as various ad-hoc rules about what programs ought
not to do (see e.g. [these][cl1] [three][cl2] [posts][cl3] entitled
"What Every C Programmer Should Know About Undefined Behavior", by
Chris Lattner). This can cause some very surprising behavior (see e.g.
["What Every Compiler Author Should Know About Programmers"][cap] or
[this blog post by John Regehr][jr], which is quite humorous). Note that
Rust has a big advantage over C here, in that only the authors of
unsafe code should need to worry about these rules.

[cl1]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
[cl2]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
[cl3]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
[cap]: http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
[jr]: http://blog.regehr.org/archives/761

In terms of Rust itself, there has been a large amount of discussion
over the years. Here is a (non-comprehensive) set of relevant links,
with a strong bias towards recent discussion:

- [RFC Issue #1447](https://github.com/rust-lang/rfcs/issues/1447) provides
  a general set of links as well as some discussion.
- [RFC #1578](https://github.com/rust-lang/rfcs/pull/1578) is an initial
  proposal for a Rust memory model by ubsan.
- The
  [Tootsie Pop](http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-tootsie-pop-model-for-unsafe-code/)
  blog post by nmatsakis proposed an alternative approach, building on
  [background about unsafe abstractions](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/)
  described in an earlier post. There is also a lot of valuable
  discussion in
  [the corresponding internals thread](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/).

#### Other factors

Another factor that must be considered is the interaction with weak
memory models. Most of the links above focus purely on sequential
code: Rust has more or less adopted the C++ memory model for governing
interactions across threads. But there may well be subtle cases that
arise as we delve deeper. For more on the C++ memory model, see
[Hans Boehm's excellent webpage](http://www.hboehm.info/c++mm/).
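
To make the threading side concrete, here is the standard release/acquire message-passing pattern written with `std::sync::atomic` (the static and function names are made up for this sketch). It is the kind of cross-thread rule that Rust inherits from the C++ model rather than defining itself: once the reader observes `READY == true` through an `Acquire` load, it is guaranteed to see the writer's earlier store to `DATA`.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::thread;

static DATA: AtomicUsize = AtomicUsize::new(0);
static READY: AtomicBool = AtomicBool::new(false);

// Classic message passing: the writer publishes `DATA` by setting `READY`
// with `Release`; a reader that sees `READY == true` via an `Acquire` load
// is guaranteed to also see the earlier store to `DATA`.
fn message_passing() -> usize {
    let writer = thread::spawn(|| {
        DATA.store(42, Ordering::Relaxed);
        READY.store(true, Ordering::Release);
    });
    let reader = thread::spawn(|| {
        // Spin until the flag is observed.
        while !READY.load(Ordering::Acquire) {
            std::hint::spin_loop();
        }
        // The Acquire load synchronized with the Release store, so this
        // cannot observe the initial value 0.
        DATA.load(Ordering::Relaxed)
    });
    writer.join().unwrap();
    reader.join().unwrap()
}

fn main() {
    assert_eq!(message_passing(), 42);
}
```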

# Detailed design
[design]: #detailed-design

## Scope

Here are some of the issues that should be resolved as part of these
unsafe code guidelines. The following list is not intended to be
comprehensive (suggestions for additions are welcome):

- Legal aliasing rules and patterns of memory accesses
  - e.g., which of the patterns listed in [rust-lang/rust#19733](https://github.com/rust-lang/rust/issues/19733)
    are legal?
  - can unsafe code create (but not use) overlapping `&mut`?
  - under what conditions is it legal to dereference a `*mut T`?
  - when can a `&mut T` legally alias a `*mut T`?
- Struct layout guarantees
- Interactions around zero-sized types
  - e.g., what pointer values can legally be considered a `Box<ZST>`?
- Allocator dependencies
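
Current Rust already answers parts of the layout and zero-sized-type questions, which helps show what remains open. The sketch below (type names are hypothetical) contrasts a `#[repr(C)]` struct, whose field order and padding follow C's rules, with a struct using the default representation, whose layout is left to the compiler. It also shows that a zero-sized type occupies no storage, which is why the question of which pointer values form a valid `Box<ZST>` arises at all. The assertions assume a mainstream target where `u32` is 4-byte aligned.

```rust
use std::mem::{align_of, size_of};

// `repr(C)` fixes the layout to C's rules: fields in declaration order,
// `a` at offset 0, 3 bytes of padding, `b` at 4, `c` at 8, then 3 bytes of
// trailing padding so the size is a multiple of the 4-byte alignment.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

// Same fields, default representation: field order and padding are left
// to the compiler, so code should not rely on particular offsets.
#[allow(dead_code)]
struct RustLayout {
    a: u8,
    b: u32,
    c: u8,
}

// A zero-sized type (ZST): no fields, so values occupy no storage.
struct Zst;

fn main() {
    assert_eq!(size_of::<CLayout>(), 12);
    assert_eq!(align_of::<CLayout>(), 4);
    // Unspecified by the language; rustc may reorder fields to reduce
    // padding, so the size is printed rather than asserted.
    println!("RustLayout: {} bytes", size_of::<RustLayout>());

    assert_eq!(size_of::<Zst>(), 0);
    // A `Box<Zst>` is still a pointer-sized value even though there is
    // nothing behind it to allocate.
    assert_eq!(size_of::<Box<Zst>>(), size_of::<usize>());
    let _boxed = Box::new(Zst);
}
```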

One specific area that we can hopefully "outsource" is the detailed
rules regarding the interaction of different threads. Rust exposes
atomics that roughly correspond to C++11 atomics, and the intention is
that we can layer our rules for sequential execution atop those rules
for parallel execution.

## Time frame

Working out a set of rules for unsafe code is a detailed process and
is expected to take months (or longer, depending on the level of
detail we ultimately aim for). However, the intention is to publish
preliminary documents as RFCs as we go, so that, hopefully, we can
provide increasingly specific guidance for unsafe code authors.

Note that even once an initial set of guidelines is adopted, problems
or inconsistencies may be found. If that happens, the guidelines will
be adjusted as needed to correct the problem, naturally with an eye
towards backwards compatibility. In other words, the unsafe code
guidelines, like the rules for the Rust language itself, should be
considered a "living document".

As a note of caution, experience from other languages such as Java or
C++ suggests that work on memory models can take years. Moreover,
even once a memory model is adopted, it can be unclear whether
[common compiler optimizations are actually permitted](http://www.di.ens.fr/~zappa/readings/c11comp.pdf)
under the model. The hope is that by focusing on sequential and
Rust-specific issues we can sidestep some of these quandaries.

## Intermediate documents

Because hammering out the finer points of the memory model is likely
to take some time, it is important to produce intermediate
agreements. This section describes some of the documents that may be
useful. These also serve as a rough guideline to the overall "phases"
of discussion that are expected, though in practice discussion will
likely go back and forth:

- **Key examples and optimizations**: highlighting code examples that
  ought to work, or optimizations we should be able to do, as well as
  some that will not work, or those whose outcome is in doubt.
- **High-level design**: describe the rules at a high level. This
  would likely be the document that unsafe code authors would read to
  know if their code is correct in the majority of scenarios. Think of
  this as the "user's guide".
- **Detailed rules**: More comprehensive rules. Think of this as the
  "reference manual".

Note that both the "high-level design" and "detailed rules", once
considered complete, will be submitted as RFCs and undergo the usual
final comment period.

### Key examples and optimizations

Probably a good first step is to agree on some key examples and
overall principles. Examples would fall into several categories:

- Unsafe code that we feel **must** be considered **legal** by any model
- Unsafe code that we feel **must** be considered **illegal** by any model
- Unsafe code that we feel **may or may not** be considered legal
- Optimizations that we **must** be able to perform
- Optimizations that we **should not** expect to be able to perform
- Optimizations that it would be nice to have, but which may be sacrificed
  if needed

Having such guiding examples naturally helps to steer the effort, but
it also helps to provide guidance for unsafe code authors in the
meantime. These examples illustrate patterns that one can adopt with
reasonable confidence.
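
For instance, a candidate for the first category might look like the following hypothetical litmus test, offered only as an illustration and not as part of any agreed list: a raw pointer is derived from a `&mut`, written through, and then the original `&mut` is used again. Most authors of unsafe code would probably expect this to be legal; pinning down exactly why, and where the line falls for less benign variants, is the kind of question this phase is meant to settle.

```rust
/// Hypothetical litmus test for the "must be legal" category: write through a
/// raw pointer derived from a `&mut`, then go back to using the `&mut`.
fn reborrow_through_raw(x: &mut i32) -> i32 {
    // Reborrow `x` and coerce the reborrow to a raw pointer; `x` itself
    // remains usable afterwards.
    let p: *mut i32 = &mut *x;
    // SAFETY: `p` was just derived from `x`, points to a live `i32`, and
    // `x` is not used while `p` is in use.
    unsafe {
        *p += 1;
    }
    // Resume using the original reference; `p` is never used again.
    *x += 1;
    *x
}

fn main() {
    let mut v = 40;
    assert_eq!(reborrow_through_raw(&mut v), 42);
    assert_eq!(v, 42);
}
```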

Deciding on these examples should also help in enumerating the
guiding principles we would like to adhere to. The design of a memory
model ultimately requires balancing several competing factors, and it
may be useful to state our expectations up front on how these will be
weighed:

- **Optimization.** The stricter the rules, the more we can optimize.
  - On the other hand, rules that are overly strict may prevent people
    from writing unsafe code that they would like to write, ultimately
    leading to slower execution.
- **Comprehensibility.** It is important to strive for rules that end
  users can readily understand. If learning the rules requires diving
  into academic papers or using Coq, it's a non-starter.
- **Effect on existing code.** No matter what model we adopt, existing
  unsafe code may or may not comply. If we then proceed to optimize,
  this could cause running code to stop working. While
  [RFC 1122](https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md)
  explicitly specified that the rules for unsafe code may change, we
  will have to decide where to draw the line in terms of how much to
  weigh backwards compatibility.

It is expected that the lang team will be **highly involved** in this discussion.

It is also expected that we will gather examples in the following ways:

- survey existing unsafe code;
- solicit suggestions of patterns from the Rust-using public:
  - scenarios where they would like an official judgement;
  - interesting questions involving the standard library.

### High-level design

The next step is to settle on a high-level design. Several approaches
have already been floated. This phase should build on the examples
from the previous one, in that proposals can be weighed against their
effect on the examples and optimizations.

There will likely also be some feedback between this phase and the
previous one: as new proposals are considered, they may generate new
examples that were not relevant previously.

Note that even once a high-level design is adopted, it will be
considered "tentative" and "unstable" until the detailed rules have
been worked out to a reasonable level of confidence.

Once a high-level design is adopted, it may also be used by the
compiler team to inform which optimizations are legal or illegal.
However, if changes are later made, the compiler will naturally have
to be adjusted to match.

It is expected that the lang team will be **highly involved** in this discussion.

### Detailed rules

Once we've settled on a high-level path -- and, no doubt, while in the
process of doing so as well -- we can begin to enumerate more detailed
rules. It is also expected that working out the rules may uncover
contradictions or other problems that require revisiting the
high-level design.

### Lints and other checkers

Ideally, the team will also consider whether automated checking for
conformance is possible. It is not a responsibility of this strike
team to produce such automated checking, but automated checking is
naturally a big plus!

## Repository

In general, the memory model discussion will be centered on a specific
repository (perhaps
<https://github.com/nikomatsakis/rust-memory-model>, but perhaps moved
to the rust-lang organization). This allows for multi-faceted
discussion: for example, we can open issues on particular questions,
as well as store the various proposals and litmus tests in their own
directories. We'll work out and document the procedures and
conventions here as we go.

# Drawbacks
[drawbacks]: #drawbacks

The main drawback is that this discussion will require time and energy
which could be spent elsewhere. The justification for spending time on
developing the memory model instead is that it is crucial to enable
the compiler to perform aggressive optimizations. Until now, we've
limited ourselves by and large to conservative optimizations (though
we do supply some LLVM aliasing hints that can be affected by unsafe
code). As the transition to MIR comes to fruition, it is clear that we
will be in a place to perform more aggressive optimization, and hence
the need for rules and guidelines is becoming more acute. We can
continue to adopt a conservative course, but this risks growing an
ever larger body of code dependent on the compiler not performing
aggressive optimization, which may close those doors forever.

# Alternatives
[alternatives]: #alternatives

- Adopt a memory model in one fell swoop:
  - considered too complicated
- Defer adopting a memory model for longer:
  - considered too risky

# Unresolved questions
[unresolved]: #unresolved-questions

None.
