Commit af83792

initial commit
1 parent 8fb8742 commit af83792

1 file changed

+306
-0
lines changed

text/0000-memory-model-strike-team.md

@@ -0,0 +1,306 @@
1+
- Feature Name: N/A
2+
- Start Date: (fill me in with today's date, YYYY-MM-DD)
3+
- RFC PR: (leave this empty)
4+
- Rust Issue: (leave this empty)
5+
6+
# Summary
7+
[summary]: #summary
8+
9+
Incorporate a strike team dedicated to preparing rules and guidelines
10+
for writing unsafe code in Rust (commonly referred to as Rust's
11+
"memory model"), in cooperation with the lang team. The discussion
12+
will generally proceed in phases, starting with establishing
13+
high-level principles and gradually getting down to the nitty gritty
14+
details (though some back and forth is expected). The strike team will
15+
produce various intermediate documents that will be submitted as
16+
normal RFCs.
17+
18+
# Motivation
19+
[motivation]: #motivation
20+
21+
Rust's safe type system offers very strong aliasing information that
22+
promises to be a rich source of compiler optimization. For example,
23+
in safe code, the compiler can infer that if a function takes two
24+
`&mut T` parameters, those two parameters must reference disjoint
25+
areas of memory (this allows optimizations similar to C99's `restrict`
26+
keyword, except that it is both automatic and fully enforced). The
27+
compiler also knows that given a shared reference type `&T`, the
28+
referent is immutable, except for data contained in an `UnsafeCell`.
29+
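As a rough illustration of the two guarantees just described, consider the sketch below. The function names `write_both` and `bump` are invented for this example and do not come from the RFC or the standard library.

```rust
use std::cell::Cell;

// Hypothetical example: `a` and `b` are both `&mut`, so safe callers can
// never pass overlapping memory. The compiler may therefore assume that the
// write through `b` leaves `*a` untouched.
pub fn write_both(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    *a + *b
}

// Hypothetical example: `Cell` is built on `UnsafeCell`, so mutating the
// counter through a shared reference is permitted here.
pub fn bump(counter: &Cell<u32>) -> u32 {
    counter.set(counter.get() + 1);
    counter.get()
}
```

Calling `write_both(&mut x, &mut y)` with two distinct locals compiles, whereas `write_both(&mut x, &mut x)` is rejected by the borrow checker. That rejection is what makes the non-aliasing assumption sound in safe code.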
30+
Unfortunately, there is a fly in the ointment. Unsafe code can easily
31+
be made to violate these sorts of rules. For example, using unsafe
32+
code, it is trivial to create two `&mut` references that both refer to
33+
the same memory (and which are simultaneously usable). In that case,
34+
if the unsafe code were to (say) return those two references to safe code,
35+
that would undermine Rust's safety guarantees -- hence it's clear that
36+
this code would be "incorrect".
37+
38+
But things become more subtle when we just consider what happens
39+
*within* the abstraction. For example, is unsafe code allowed to use
40+
two overlapping `&mut` references internally, without returning them to
41+
the wild? Is it all right to overlap with `*mut`? And so forth.
42+
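To make the contrast concrete, here is a minimal sketch of unsafe code that builds two `&mut` references from one raw pointer but keeps them disjoint, in the style of the standard library's `split_at_mut`. The helper name `split_halves` is hypothetical. The open questions above concern variants of this pattern in which the two ranges would overlap, or in which a `*mut` is kept alive alongside a `&mut`. This sketch takes no position on those.

```rust
use std::slice;

// Hypothetical helper: splits one mutable slice into two mutable halves.
pub fn split_halves(data: &mut [u32], mid: usize) -> (&mut [u32], &mut [u32]) {
    let len = data.len();
    assert!(mid <= len, "mid must not exceed the slice length");
    let ptr = data.as_mut_ptr();
    // SAFETY: the ranges [0, mid) and [mid, len) are both in bounds of the
    // original slice and do not overlap, so the two `&mut` slices returned
    // here never refer to the same memory.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}
```

Safe callers can write through both halves independently. The abstraction is only sound because the disjointness argument in the `SAFETY` comment holds.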
43+
It is the contention of this RFC that a complete set of guidelines for unsafe
44+
code is far too big a topic to be fruitfully addressed in a single
45+
RFC. Therefore, this RFC proposes the formation of a dedicated
46+
**strike team** (that is, a temporary, single-purpose team) that will
47+
work on hammering out the details over time. Precise membership of
48+
this team is not part of this RFC, but will be determined by the lang
49+
team as well as the strike team itself.
50+
51+
The unsafe guidelines work will proceed in rough stages, described
52+
below. An initial goal is to produce a **high-level summary detailing
53+
the general approach of the guidelines.** Ideally, this summary should
54+
be sufficient to help guide unsafe authors in best practices that are
55+
most likely to be forwards compatible. Further work will then expand
56+
on the model to produce a more **detailed set of rules**, which may in
57+
turn require revisiting the high-level summary if contradictions are
58+
uncovered.
59+
60+
This new "unsafe code" strike team is intended to work in
61+
collaboration with the existing lang team. Ultimately, whatever rules
62+
are crafted must be adopted with the **general consensus of both the
63+
strike team and the lang team**. It is expected that lang team members
64+
will be more involved in the early discussions that govern the overall
65+
direction and less involved in the fine details.
66+
67+
#### History and recent discussions
68+
69+
The history of optimizing C can be instructive. All code in C is
70+
effectively unsafe, and so in order to perform optimizations,
71+
compilers have come to lean heavily on the notion of "undefined
72+
behavior" as well as various ad-hoc rules about what programs ought
73+
not to do (see e.g. [these][cl1] [three][cl2] [posts][cl3] entitled
74+
"What Every C Programmer Should Know About Undefined Behavior", by
75+
Chris Lattner). This can cause some very surprising behavior (see e.g.
76+
["What Every Compiler Author Should Know About Programmers"][cap] or
77+
[this blog post by John Regehr][jr], which is quite humorous). Note that
78+
Rust has a big advantage over C here, in that only the authors of
79+
unsafe code should need to worry about these rules.
80+
81+
[cl1]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
82+
[cl2]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
83+
[cl3]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
84+
[cap]: http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
85+
[jr]: http://blog.regehr.org/archives/761
86+
87+
In terms of Rust itself, there has been a large amount of discussion
88+
over the years. Here is a (non-comprehensive) set of relevant links,
89+
with a strong bias towards recent discussion:
90+
91+
- [RFC Issue #1447](https://github.com/rust-lang/rfcs/issues/1447) provides
92+
a general set of links as well as some discussion.
93+
- [RFC #1578](https://github.com/rust-lang/rfcs/pull/1578) is an initial
94+
proposal for a Rust memory model by ubsan.
95+
- The
96+
[Tootsie Pop](http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-tootsie-pop-model-for-unsafe-code/)
97+
blog post by nikomatsakis proposed an alternative approach, building on
98+
[background about unsafe abstractions](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/)
99+
described in an earlier post. There is also a lot of valuable
100+
discussion in
101+
[the corresponding internals thread](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/).
102+
103+
#### Other factors
104+
105+
Another factor that must be considered is the interaction with weak
106+
memory models. Most of the links above focus purely on sequential
107+
code: Rust has more-or-less adopted the C++ memory model for governing
108+
interactions across threads. But there may well be subtle cases that
109+
arise as we delve deeper. For more on the C++ memory model, see
110+
[Hans Boehm's excellent webpage](http://www.hboehm.info/c++mm/).
111+
112+
# Detailed design
113+
[design]: #detailed-design
114+
115+
## Scope
116+
117+
Here are some of the issues that should be resolved as part of these
118+
unsafe code guidelines. The following list is not intended as
119+
comprehensive (suggestions for additions welcome):
120+
121+
- Legal aliasing rules and patterns of memory accesses
122+
- e.g., which of the patterns listed in [rust-lang/rust#19733](https://github.com/rust-lang/rust/issues/19733)
123+
are legal?
124+
- can unsafe code create (but not use) overlapping `&mut`?
125+
- under what conditions is it legal to dereference a `*mut T`?
126+
- when can an `&mut T` legally alias an `*mut T`?
127+
- Struct layout guarantees
128+
- Interactions around zero-sized types
129+
- e.g., what pointer values can legally be considered a `Box<ZST>`?
130+
- Allocator dependencies
131+
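Two of the items above, struct layout and zero-sized types, can be made a little more tangible with a small sketch. The types `Header` and `Marker` and both helper functions are hypothetical and exist only for illustration. The sketch relies solely on properties that already hold today (`#[repr(C)]` places the first field at offset 0, and a unit struct has size 0) and does not anticipate any answer the guidelines might give.

```rust
use std::mem::size_of;

// Hypothetical type: `#[repr(C)]` requests C-compatible layout, so the
// first field is placed at offset 0. Without the attribute, field order
// and padding are left to the compiler.
#[repr(C)]
pub struct Header {
    pub tag: u8,
    pub len: u32,
}

// Hypothetical zero-sized type (ZST): it carries no data at all.
pub struct Marker;

// Byte offset of `tag` within `Header`, computed from addresses.
pub fn tag_offset(h: &Header) -> usize {
    let base = h as *const Header as usize;
    let field = &h.tag as *const u8 as usize;
    field - base
}

// Size of the zero-sized marker type in bytes.
pub fn marker_size() -> usize {
    size_of::<Marker>()
}
```

Here `tag_offset` returns 0 for any `Header`, and `marker_size` returns 0. Which guarantees of this kind extend to structs without a `repr` attribute, and which pointer values are acceptable for a ZST, are among the questions listed above.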
132+
One specific area that we can hopefully "outsource" is detailed rules
133+
regarding the interaction of different threads. Rust exposes atomics
134+
that roughly correspond to C++11 atomics, and the intention is that we
135+
can layer our rules for sequential execution atop those rules for
136+
parallel execution.
137+
138+
## Time frame
139+
140+
Working out a set of rules for unsafe code is a detailed process and
141+
is expected to take months (or longer, depending on the level of
142+
detail we ultimately aim for). However, the intention is to publish
143+
preliminary documents as RFCs as we go, so hopefully we can be
144+
providing ever more specific guidance for unsafe code authors.
145+
146+
Note that even once an initial set of guidelines is adopted, problems
147+
or inconsistencies may be found. If that happens, the guidelines will
148+
be adjusted as needed to correct the problem, naturally with an eye
149+
towards backwards compatibility. In other words, the unsafe
150+
guidelines, like the rules for the Rust language itself, should be
151+
considered a "living document".
152+
153+
As a note of caution, experience from other languages such as Java or
154+
C++ suggests that the work on memory models can take years. Moreover,
155+
even once a memory model is adopted, it can be unclear whether
156+
[common compiler optimizations are actually permitted](http://www.di.ens.fr/~zappa/readings/c11comp.pdf)
157+
under the model. The hope is that by focusing on sequential and
158+
Rust-specific issues we can sidestep some of these quandaries.
159+
160+
## Intermediate documents
161+
162+
Because hammering out the finer points of the memory model is expected
163+
to possibly take some time, it is important to produce intermediate
164+
agreements. This section describes some of the documents that may be
165+
useful. These also serve as a rough guideline to the overall "phases"
166+
of discussion that are expected, though in practice discussion will
167+
likely go back and forth:
168+
169+
- **Key examples and optimizations**: highlighting code examples that
170+
ought to work, or optimizations we should be able to do, as well as
171+
some that will not work, or those whose outcome is in doubt.
172+
- **High-level design**: describe the rules at a high-level. This
173+
would likely be the document that unsafe code authors would read to
174+
know if their code is correct in the majority of scenarios. Think of
175+
this as the "user's guide".
176+
- **Detailed rules**: More comprehensive rules. Think of this as the
177+
"reference manual".
178+
179+
Note that both the "high-level design" and "detailed rules", once
180+
considered complete, will be submitted as RFCs and undergo the usual
181+
final comment period.
182+
183+
### Key examples and optimizations
184+
185+
Probably a good first step is to agree on some key examples and
186+
overall principles. Examples would fall into several categories:
187+
188+
- Unsafe code that we feel **must** be considered **legal** by any model
189+
- Unsafe code that we feel **must** be considered **illegal** by any model
190+
- Unsafe code that we feel **may or may not** be considered legal
191+
- Optimizations that we **must** be able to perform
192+
- Optimizations that we **should not** expect to be able to perform
193+
- Optimizations that it would be nice to have, but which may be sacrificed
194+
if needed
195+
196+
Having such guiding examples naturally helps to steer the effort, but
197+
it also helps to provide guidance for unsafe code authors in the
198+
meantime. These examples illustrate patterns that one can adopt with
199+
reasonable confidence.
200+
201+
Deciding about these examples should also help in enumerating the
202+
guiding principles we would like to adhere to. The design of a memory
203+
model ultimately requires balancing several competing factors and it
204+
may be useful to state our expectations up front on how these will be
205+
weighed:
206+
207+
- **Optimization.** The stricter the rules, the more we can optimize.
208+
- on the other hand, rules that are overly strict may prevent people
209+
from writing unsafe code that they would like to write, ultimately
210+
leading to slower execution.
211+
- **Comprehensibility.** It is important to strive for rules that end
212+
users can readily understand. If learning the rules requires diving
213+
into academic papers or using Coq, it's a non-starter.
214+
- **Effect on existing code.** No matter what model we adopt, existing
215+
unsafe code may or may not comply. If we then proceed to optimize,
216+
this could cause running code to stop working. While
217+
[RFC 1122](https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md)
218+
explicitly specified that the rules for unsafe code may change, we
219+
will have to decide where to draw the line in terms of how much to
220+
weight backwards compatibility.
221+
222+
It is expected that the lang team will be **highly involved** in this discussion.
223+
224+
It is also expected that we will gather examples in the following ways:
225+
226+
- survey existing unsafe code;
227+
- solicit suggestions of patterns from the Rust-using public:
228+
- scenarios where they would like an official judgement;
229+
- interesting questions involving the standard library.
230+
231+
### High-level design
232+
233+
The next step is to settle on a high-level
234+
design. There have already been several approaches floated. This phase
235+
should build on the examples from before, in that proposals can be
236+
weighed against their effect on the examples and optimizations.
237+
238+
There will likely also be some feedback between this phase and the
239+
previous: as new proposals are considered, that may generate new
240+
examples that were not relevant previously.
241+
242+
Note that even once a high-level design is adopted, it will be
243+
considered "tentative" and "unstable" until the detailed rules have
244+
been worked out to a reasonable level of confidence.
245+
246+
Once a high-level design is adopted, it may also be used by the
247+
compiler team to inform which optimizations are legal or illegal.
248+
However, if changes are later made, the compiler will naturally have
249+
to be adjusted to match.
250+
251+
It is expected that the lang team will be **highly involved** in this discussion.
252+
253+
### Detailed rules
254+
255+
Once we've settled on a high-level path -- and, no doubt, while in the
256+
process of doing so as well -- we can begin to enumerate more detailed
257+
rules. It is also expected that working out the rules may uncover
258+
contradictions or other problems that require revisiting the
259+
high-level design.
260+
261+
### Lints and other checkers
262+
263+
Ideally, the team will also consider whether automated checking for
264+
conformance is possible. It is not a responsibility of this strike
265+
team to produce such automated checking, but automated checking is
266+
naturally a big plus!
267+
268+
## Repository
269+
270+
In general, the memory model discussion will be centered on a specific
271+
repository (perhaps
272+
<https://github.com/nikomatsakis/rust-memory-model>, but perhaps moved
273+
to the rust-lang organization). This allows for multi-faceted
274+
discussion: for example, we can open issues on particular questions,
275+
as well as storing the various proposals and litmus tests in their own
276+
directories. We'll work out and document the procedures and
277+
conventions here as we go.
278+
279+
# Drawbacks
280+
[drawbacks]: #drawbacks
281+
282+
The main drawback is that this discussion will require time and energy
283+
which could be spent elsewhere. The justification for spending time on
284+
developing the memory model instead is that it is crucial to enable
285+
the compiler to perform aggressive optimizations. Until now, we've
286+
limited ourselves by and large to conservative optimizations (though
287+
we do supply some LLVM aliasing hints that can be affected by unsafe
288+
code). As the transition to MIR comes to fruition, it is clear that we
289+
will be in a place to perform more aggressive optimization, and hence
290+
the need for rules and guidelines is becoming more acute. We can
291+
continue to adopt a conservative course, but this risks growing an
292+
ever larger body of code dependent on the compiler not performing
293+
aggressive optimization, which may close those doors forever.
294+
295+
# Alternatives
296+
[alternatives]: #alternatives
297+
298+
- Adopt a memory model in one fell swoop:
299+
- considered too complicated
300+
- Defer adopting a memory model for longer:
301+
- considered too risky
302+
303+
# Unresolved questions
304+
[unresolved]: #unresolved-questions
305+
306+
None.
