- Feature Name: N/A
- Start Date: (fill me in with today's date, YYYY-MM-DD)
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)

# Summary
[summary]: #summary

Form a strike team dedicated to preparing rules and guidelines for
writing unsafe code in Rust (commonly referred to as Rust's "memory
model"), in cooperation with the lang team. The discussion will
generally proceed in phases, starting with high-level principles and
gradually working down to the nitty-gritty details (though some back
and forth is expected). The strike team will produce various
intermediate documents, which will be submitted as normal RFCs.

# Motivation
[motivation]: #motivation

Rust's safe type system provides very strong aliasing information that
promises to be a rich source of compiler optimizations. For example,
in safe code, the compiler can infer that if a function takes two
`&mut T` parameters, those two parameters must reference disjoint
areas of memory (this allows optimizations similar to those enabled by
C99's `restrict` keyword, except that the guarantee is both automatic
and fully enforced). The compiler also knows that, given a shared
reference of type `&T`, the referent is immutable, except for data
contained in an `UnsafeCell`.
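
To make the `&mut` guarantee concrete, here is a minimal sketch (the function name `store_both` is hypothetical). Because `a` and `b` are both `&mut i32`, the compiler may assume they do not overlap, so it is entitled to treat the final read of `*a` as the constant `1` without reloading it; in C99 the programmer would have to promise this by hand with `restrict`.

```rust
/// Two `&mut` parameters: the type system guarantees they refer to
/// disjoint memory, so the write through `b` cannot change `*a`.
fn store_both(a: &mut i32, b: &mut i32) -> i32 {
    *a = 1;
    *b = 2;
    // Because `a` and `b` cannot alias, this load may be folded to `1`.
    *a
}

fn main() {
    let mut x = 0;
    let mut y = 0;
    // Safe code cannot pass `&mut x` twice: the borrow checker rejects it.
    let r = store_both(&mut x, &mut y);
    assert_eq!(r, 1);
    assert_eq!((x, y), (1, 2));
}
```

Safe callers can never violate the assumption; only unsafe code can, which is exactly the problem the next paragraph describes.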

Unfortunately, there is a fly in the ointment: unsafe code can easily
violate these sorts of rules. For example, using unsafe code, it is
trivial to create two `&mut` references that refer to the same memory
(and are simultaneously usable). If the unsafe code were to (say)
return those two references to safe code, that would undermine Rust's
safety guarantees, so this code is clearly "incorrect".
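
For contrast, unsafe code *can* hand safe code two `&mut` references, provided they really are disjoint. The following minimal sketch is modeled on the standard library's `slice::split_at_mut` (the name `split_halves` is hypothetical). Raw pointers are needed only because the borrow checker cannot see that the halves do not overlap. Changing the second range so that it overlapped the first would produce exactly the kind of incorrect code described above.

```rust
use std::slice;

/// A hand-rolled variant of `split_at_mut`: returns two `&mut` slices
/// covering `[0, mid)` and `[mid, len)` of `v`.
fn split_halves(v: &mut [i32], mid: usize) -> (&mut [i32], &mut [i32]) {
    let len = v.len();
    assert!(mid <= len, "mid out of bounds");
    let ptr = v.as_mut_ptr();
    // SAFETY: the two ranges do not overlap and both lie within `v`,
    // so the returned `&mut` slices never alias each other.
    unsafe {
        (
            slice::from_raw_parts_mut(ptr, mid),
            slice::from_raw_parts_mut(ptr.add(mid), len - mid),
        )
    }
}

fn main() {
    let mut data = [1, 2, 3, 4, 5];
    let (left, right) = split_halves(&mut data, 2);
    left[0] = 10;
    right[0] = 30;
    assert_eq!(data, [10, 2, 30, 4, 5]);
}
```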

But things become more subtle when we consider only what happens
*within* the abstraction. For example, is unsafe code allowed to use
two overlapping `&mut` references internally, without releasing them
into the wild? Is it all right for a `&mut` to overlap with a `*mut`?
And so forth.
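
A minimal sketch of the shape of the `*mut` question (the function name `bump_twice` is hypothetical): a raw pointer derived from a live `&mut` is written through, and then the reference is used again. The borrow checker accepts this because raw pointers carry no lifetime. Whether the compiler may nonetheless assume that `*r` is unaffected by the write through `p` is the sort of question the guidelines need to answer; the sketch poses the question rather than settling it.

```rust
/// Writes through a raw pointer derived from `r`, then through `r` itself.
fn bump_twice(r: &mut i32) {
    // Derive a raw pointer from the live `&mut` via a reborrow.
    let p: *mut i32 = &mut *r;
    // SAFETY (assumed for illustration): `p` points to `*r`, which is
    // valid, and nothing else accesses `*r` in between.
    unsafe {
        *p += 1;
    }
    // Resume using the original reference after the raw-pointer write.
    *r += 1;
}

fn main() {
    let mut x = 0;
    bump_twice(&mut x);
    assert_eq!(x, 2);
}
```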

This RFC contends that a complete set of guidelines for unsafe code is
far too big a topic to be addressed fruitfully in a single RFC.
Therefore, this RFC proposes the formation of a dedicated **strike
team** (that is, a temporary, single-purpose team) that will work on
hammering out the details over time. The precise membership of this
team is not part of this RFC; it will be determined by the lang team
together with the strike team itself.

The unsafe guidelines work will proceed in rough stages, described
below. An initial goal is to produce a **high-level summary detailing
the general approach of the guidelines.** Ideally, this summary should
be sufficient to guide unsafe code authors toward the practices most
likely to be forwards compatible. Further work will then expand on the
model to produce a more **detailed set of rules**, which may in turn
require revisiting the high-level summary if contradictions are
uncovered.

This new "unsafe code" strike team is intended to work in
collaboration with the existing lang team. Ultimately, whatever rules
are crafted must be adopted with the **general consensus of both the
strike team and the lang team**. Lang team members are expected to be
more involved in the early discussions that govern the overall
direction, and less involved in the fine details.

#### History and recent discussions

The history of optimizing C can be instructive. All code in C is
effectively unsafe, so in order to perform optimizations, compilers
have come to lean heavily on the notion of "undefined behavior", as
well as on various ad-hoc rules about what programs ought not to do
(see e.g. [these][cl1] [three][cl2] [posts][cl3] by Chris Lattner,
entitled "What Every C Programmer Should Know About Undefined
Behavior"). This can lead to some very surprising behavior (see e.g.
["What Every Compiler Author Should Know About Programmers"][cap] or
[this quite humorous blog post by John Regehr][jr]). Note that Rust
has a big advantage over C here, in that only the authors of unsafe
code should need to worry about these rules.

[cl1]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html
[cl2]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html
[cl3]: http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html
[cap]: http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf
[jr]: http://blog.regehr.org/archives/761

In terms of Rust itself, there has been a great deal of discussion
over the years. Here is a (non-comprehensive) set of relevant links,
with a strong bias towards recent discussion:

- [RFC Issue #1447](https://github.com/rust-lang/rfcs/issues/1447) provides
  a general set of links as well as some discussion.
- [RFC #1578](https://github.com/rust-lang/rfcs/pull/1578) is an initial
  proposal for a Rust memory model by ubsan.
- The
  [Tootsie Pop](http://smallcultfollowing.com/babysteps/blog/2016/05/27/the-tootsie-pop-model-for-unsafe-code/)
  blog post by nmatsakis proposed an alternative approach, building on
  [background about unsafe abstractions](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/)
  described in an earlier post. There is also a lot of valuable
  discussion in
  [the corresponding internals thread](http://smallcultfollowing.com/babysteps/blog/2016/05/23/unsafe-abstractions/).

#### Other factors

Another factor that must be considered is the interaction with weak
memory models. Most of the links above focus purely on sequential
code: Rust has more-or-less adopted the C++ memory model for governing
interactions across threads. But there may well be subtle cases that
arise as we delve deeper. For more on the C++ memory model, see
[Hans Boehm's excellent webpage](http://www.hboehm.info/c++mm/).

# Detailed design
[design]: #detailed-design

## Scope

Here are some of the issues that should be resolved as part of these
unsafe code guidelines. The following list is not intended to be
comprehensive (suggestions for additions are welcome):

- Legal aliasing rules and patterns of memory accesses
  - e.g., which of the patterns listed in [rust-lang/rust#19733](https://github.com/rust-lang/rust/issues/19733)
    are legal?
  - can unsafe code create (but not use) overlapping `&mut`?
  - under what conditions is it legal to dereference a `*mut T`?
  - when can an `&mut T` legally alias an `*mut T`?
- Struct layout guarantees
- Interactions around zero-sized types
  - e.g., what pointer values can legally be considered a `Box<ZST>`?
- Allocator dependencies
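
Two of the items above can be made concrete with `std::mem::size_of`. In this minimal sketch (the struct names are hypothetical), `#[repr(C)]` opts into C's declaration-order layout rules, while the default representation leaves field order to the compiler, so only the former is asserted on. The size of `12` assumes a target where `u32` is 4-byte aligned, as on x86-64. The zero-sized `()` shows why `Box<ZST>` raises questions: `Box::new` does not allocate for a zero-sized type, so there is no allocation for the pointer to point into.

```rust
use std::mem::{align_of, size_of};

// Default (`repr(Rust)`) layout is unspecified: the compiler may reorder
// fields, so unsafe code should not rely on this struct's size or offsets.
#[allow(dead_code)]
struct Unspecified {
    a: u8,
    b: u32,
    c: u8,
}

// `repr(C)` lays fields out in declaration order with C padding rules.
#[repr(C)]
#[allow(dead_code)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // Assuming 4-byte-aligned `u32`: a@0, 3 bytes padding, b@4, c@8,
    // then 3 bytes of tail padding to a multiple of 4.
    assert_eq!(size_of::<CLayout>(), 12);
    // Zero-sized types occupy no memory at all.
    assert_eq!(size_of::<()>(), 0);
    assert_eq!(align_of::<()>(), 1);
}
```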

One specific area that we can hopefully "outsource" is the detailed
rules regarding the interaction of different threads. Rust exposes
atomics that roughly correspond to C++11 atomics, and the intention is
that we can layer our rules for sequential execution atop those rules
for parallel execution.
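
The correspondence is visible in the API: the variants of `std::sync::atomic::Ordering` (`Relaxed`, `Release`, `Acquire`, `AcqRel`, `SeqCst`) match C++'s `memory_order` values. A minimal sketch of the standard release/acquire message-passing idiom follows (the function name `message_passing` is hypothetical). Its guarantee comes entirely from the C++-style rules for threads; a sequential model would then only need to govern what each thread does between such synchronization points.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Writer publishes `data`, then sets `ready` with Release. A reader whose
/// Acquire load observes `ready == true` is guaranteed to see the data.
fn message_passing() -> usize {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let writer = {
        let data = Arc::clone(&data);
        let ready = Arc::clone(&ready);
        thread::spawn(move || {
            // A Relaxed store alone would not be ordered for other threads...
            data.store(42, Ordering::Relaxed);
            // ...but this Release store publishes everything before it.
            ready.store(true, Ordering::Release);
        })
    };

    // An Acquire load that reads `true` synchronizes with the Release store.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = data.load(Ordering::Relaxed);
    writer.join().unwrap();
    seen
}

fn main() {
    assert_eq!(message_passing(), 42);
}
```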

## Time frame

Working out a set of rules for unsafe code is a detailed process and
is expected to take months (or longer, depending on the level of
detail we ultimately aim for). However, the intention is to publish
preliminary documents as RFCs along the way, so that we can provide
ever more specific guidance to unsafe code authors.

Note that even once an initial set of guidelines is adopted, problems
or inconsistencies may be found. If that happens, the guidelines will
be adjusted as needed to correct the problem, naturally with an eye
towards backwards compatibility. In other words, the unsafe
guidelines, like the rules for the Rust language itself, should be
considered a "living document".

As a note of caution, experience from other languages such as Java or
C++ suggests that work on memory models can take years. Moreover,
even once a memory model is adopted, it can be unclear whether
[common compiler optimizations are actually permitted](http://www.di.ens.fr/~zappa/readings/c11comp.pdf)
under the model. The hope is that by focusing on sequential and
Rust-specific issues we can sidestep some of these quandaries.

## Intermediate documents

Because hammering out the finer points of the memory model may take
some time, it is important to produce intermediate agreements. This
section describes some of the documents that may be useful. These also
serve as a rough guide to the overall "phases" of discussion that are
expected, though in practice discussion will likely go back and forth:

- **Key examples and optimizations**: highlight code examples that
  ought to work, or optimizations we should be able to do, as well as
  some that will not work, or whose outcome is in doubt.
- **High-level design**: describe the rules at a high level. This
  would likely be the document that unsafe code authors would read to
  know whether their code is correct in the majority of scenarios.
  Think of this as the "user's guide".
- **Detailed rules**: more comprehensive rules. Think of this as the
  "reference manual".

Note that both the "high-level design" and the "detailed rules", once
considered complete, will be submitted as RFCs and undergo the usual
final comment period.

### Key examples and optimizations

Probably a good first step is to agree on some key examples and
overall principles. Examples would fall into several categories:

- Unsafe code that we feel **must** be considered **legal** by any model
- Unsafe code that we feel **must** be considered **illegal** by any model
- Unsafe code that we feel **may or may not** be considered legal
- Optimizations that we **must** be able to perform
- Optimizations that we **should not** expect to be able to perform
- Optimizations that it would be nice to have, but which may be sacrificed
  if needed
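
To give a sense of how an entry in these categories might look, here is a minimal sketch pairing a candidate "must be able to perform" optimization with the case that rules it out (the function names are hypothetical, and the format of real entries is for the strike team to decide). Since `x: &i32` contains no `UnsafeCell`, nothing the opaque closure `f` does may change `*x`, so the second load may reuse the first. With `&Cell<i32>`, built on `UnsafeCell`, `f` may legally mutate the value through another shared reference, so it must be re-read.

```rust
use std::cell::Cell;

/// Candidate "must optimize": `*x` cannot change across the call to `f`,
/// so the second load may be replaced by the first (`a + b` as `a * 2`).
fn read_twice(x: &i32, f: impl Fn()) -> i32 {
    let a = *x;
    f();
    let b = *x;
    a + b
}

/// Counter-example: interior mutability means `f` may change the value,
/// so the compiler must reload it after the call.
fn read_twice_cell(x: &Cell<i32>, f: impl Fn()) -> i32 {
    let a = x.get();
    f();
    let b = x.get();
    a + b
}

fn main() {
    let n = 5;
    assert_eq!(read_twice(&n, || {}), 10);

    let c = Cell::new(5);
    // The closure mutates `c` through its own shared borrow: 5 + 7.
    assert_eq!(read_twice_cell(&c, || c.set(7)), 12);
}
```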

Having such guiding examples naturally helps to steer the effort, but
it also provides guidance for unsafe code authors in the meantime.
These examples illustrate patterns that one can adopt with reasonable
confidence.

Deciding on these examples should also help in enumerating the guiding
principles we would like to adhere to. The design of a memory model
ultimately requires balancing several competing factors, and it may be
useful to state up front how we expect these to be weighed:

- **Optimization.** The stricter the rules, the more we can optimize.
  - On the other hand, rules that are overly strict may prevent people
    from writing unsafe code that they would like to write, ultimately
    leading to slower execution.
- **Comprehensibility.** It is important to strive for rules that end
  users can readily understand. If learning the rules requires diving
  into academic papers or using Coq, it's a non-starter.
- **Effect on existing code.** No matter what model we adopt, some
  existing unsafe code may not comply. If we then proceed to optimize,
  this could cause running code to stop working. While
  [RFC 1122](https://github.com/rust-lang/rfcs/blob/master/text/1122-language-semver.md)
  explicitly specified that the rules for unsafe code may change, we
  will have to decide where to draw the line in terms of how much
  weight to give backwards compatibility.

The lang team is expected to be **highly involved** in this discussion.

We also expect to gather examples in the following ways:

- surveying existing unsafe code;
- soliciting suggestions of patterns from the Rust-using public:
  - scenarios where they would like an official judgement;
  - interesting questions involving the standard library.

### High-level design

The next step is to settle on a high-level design. Several approaches
have already been floated. This phase should build on the examples
from the previous one, in that proposals can be weighed by their
effect on those examples and optimizations.

There will likely also be some feedback between this phase and the
previous one: as new proposals are considered, they may generate new
examples that were not relevant before.

Note that even once a high-level design is adopted, it will be
considered "tentative" and "unstable" until the detailed rules have
been worked out to a reasonable level of confidence.

Once a high-level design is adopted, the compiler team may also use it
to inform which optimizations are legal or illegal. However, if
changes are later made, the compiler will naturally have to be
adjusted to match.

The lang team is expected to be **highly involved** in this discussion.

### Detailed rules

Once we've settled on a high-level path (and, no doubt, while in the
process of doing so as well), we can begin to enumerate more detailed
rules. Working out the rules may also uncover contradictions or other
problems that require revisiting the high-level design.

### Lints and other checkers

Ideally, the team will also consider whether automated checking for
conformance is possible. Producing such automated checking is not a
responsibility of this strike team, but it would naturally be a big
plus!

## Repository

In general, the memory model discussion will be centered on a specific
repository (perhaps
<https://github.com/nikomatsakis/rust-memory-model>, though it may be
moved to the rust-lang organization). This allows for multi-faceted
discussion: for example, we can open issues on particular questions,
as well as store the various proposals and litmus tests in their own
directories. We'll work out and document the procedures and
conventions here as we go.

# Drawbacks
[drawbacks]: #drawbacks

The main drawback is that this discussion will require time and energy
that could be spent elsewhere. The justification for spending time on
developing the memory model instead is that it is crucial for enabling
the compiler to perform aggressive optimizations. Until now, we've
limited ourselves by and large to conservative optimizations (though
we do supply some LLVM aliasing hints that can be affected by unsafe
code). As the transition to MIR comes to fruition, we will clearly be
in a position to perform more aggressive optimization, and hence the
need for rules and guidelines is becoming more acute. We can continue
to adopt a conservative course, but this risks growing an ever larger
body of code that depends on the compiler not performing aggressive
optimization, which may close those doors forever.

# Alternatives
[alternatives]: #alternatives

- Adopt a memory model in one fell swoop:
  - considered too complicated
- Defer adopting a memory model for longer:
  - considered too risky

# Unresolved questions
[unresolved]: #unresolved-questions

None.