-
Notifications
You must be signed in to change notification settings - Fork 120
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Perform borrow checking through a capsule-based API #361
Conversation
74b7c1b
to
6fbbec1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this makes sense, though am slightly cautious. I see the benefit and the global borrow checking is a clear improvement.
Will there need to be a way for the capsule contents to be "upgraded" at runtime if a too old version is present? Not really sure what functionality might be broken if a newer rust-numpy has to use an older capsule version. (Which might be unpredictable due to initialization order being deterministic but hard to predict, I think.)
Also, is it a safety/security concern if arbitrary code could spoof the capsule before this loads?
src/borrow/shared.rs
Outdated
let capsule = PyCapsule::new_with_destructor( | ||
py, | ||
shared, | ||
Some(CString::new("_BORROW_CHECKING_API").unwrap()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be _RUST_NUMPY_BORROW_CHECKING_API
to make it clear where it originates from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered both options. I think including RUST_NUMPY
in the name is the safe choice as this limits the intent of this API to extensions using rust-numpy. The generic name would cater more general aspirations of getting other languages to participate in the borrow checking along the lines @alex envisions in his blog post.
Personally, I am very much open to advice here and am fine with including RUST_NUMPY
in the name. Especially as those more general aspirations are nothing that has been discussed with larger the community of potential users beforehand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a separate commit changing this which can keep or drop after further discussion.
src/borrow/shared.rs
Outdated
fn insert_shared(py: Python) -> PyResult<*const Shared> { | ||
let module = get_array_module(py)?; | ||
|
||
let capsule: &PyCapsule = match module.getattr("_BORROW_CHECKING_API") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly can use PyCapsule::import
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried, but I don't think it is too helpful here: This code path is only taken if static Shared: SharedPtr;
is still null, i.e. this extension tries to access the capsule for the first time, meaning that this is always a slow path.
In this case, it is also highly likely that the capsule is not present at all which means I do need the PyModule
to call setattr
on it.
Additionally, I cannot reasonably determine T
prior to the import as I need to check the version
field first. I could theoretically get a reference to u64
to just access the version and then cast that reference into a pointer and then a pointer the relevant struct and so on, but that feels more complicated than helpful as well.
At the moment, I would not think this reasonably possible as the existing state cannot be safely transformed by an extension which wishes to upgrade the API if just due to ABI concerns as both extensions might have been built using different Rust versions. (Or there might eventually be implementations not written in Rust at all having a completely incompatible internal representation behind the However, since changes to this API should be additive with older parts of it kept working (by just ignoring the additional fields), I suspect that we might be able to degrade gracefully if a newer rust-numpy encounters an older API version, i.e. functionality which worked before continues to work and newer functionality fails with a hopefully clear enough error message (by failing functions needing fields not yet present). I think the main problem is that I am not confident whether even the promise of backwards compatibility can be kept since this is relatively new and we have yet to get any external feedback on the borrow checking we released with rust-numpy 0.17.0. IMHO, this is a good argument for outright waiting with the capsule API for a few rust-numpy releases to gather more insight whether at least the functionality we have now can be expected to stay in place for a reasonable amount of time. |
6fbbec1
to
991ffa7
Compare
I think this is certainly a safety concern, but I am not sure it can be helped. It is always possible to subvert Rust's type system by malicious or misguided action. Similarly, the borrow checking itself is not a sandbox, it just prevents programmers from making genuine mistakes. More specially, loading e.g. a no-op borrow checker API would be similar to preloading libeatmydata to turn |
…by all extensions.
This places the global state required for dynamic borrow checking of NumPy arrays into a capsule exposed as the numpy.core.multiarray._BORROW_CHECKING_API attribute. This has two benefits: First, all extensions built using rust-numpy will share this global state and hence cooperate in borrow checking. Second, in the future, other libraries that want to borrow check their access to NumPy array can cooperate using this C-compatible API. This does not checking the borrow checking implementation at all, but instead of accessing Rust static data, all accesses are funneled through the above-mentioned capsule API with access to it being cached for performance reasons as we already do for the _ARRAY_API and _UFUNC_API capsules. This means that eventually, the implementation of the borrow checking API could be different from the one the current extension would provide with the adaptors to the C API taking of any differences in e.g. the definition of `BorrowKey`. For now, they will of course usually be the same and this change just adds a huge amount of boiler plate to go from Rust to C-compatible back to Rust.
… misunderstands on its scope.
991ffa7
to
b877d96
Compare
This improves the changes of actually keeping it backwards compatible while precluding the reuse of base address and borrow key for cloning and dropping.
I added another commit which makes the shared API much narrower, taking only |
Since the capsule API looks as minimal as it will get, I ran the bechmarks:
All of them being regressions are expected as there is the added cost of crossing the FFI boundary and using dynamic dispatch via the capsule. Finally, at less than 150 ns/iter for all three operations, I think the improved prospects for backwards compatibility and future improvements of the internal data structures used to perform the borrow checking which the minimal API provides are worth it. Most applications will likely spent much more time operating on the resulting array views themselves and if not, there is always the unsafe escape hatch that bypasses the dynamic borrow checking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the long delay in revisiting this. This looks good to me to merge!
The benefit this allows multiple extensions using rust-numpy
to work together is definitely worth the slight additional overhead and compatibility constraints.
I agree, the cross extension borrow checking alone makes this worth it IMHO. @alex Any thoughts on the cross language borrow checking discussed above? @kngwyu Any concerns before I merge this which imples that it will be part of the 0.18 release thereby committing us to backwards compatibility? |
Unfortunately not really, I haven't spent that much time thinking about what it would look like for extensions to provide this functionality themselves, most of my thoughts have been about what'd it mean to have this for buffers in general. That said, I think the hardest question is: how (if at all) do you handle objects that represents sub-slices (views) of memory owned by some other object? |
Our existing dynamic borrow checking tries to handle this by always traversing the chain of NumPy views via the (A complete solution of the Diophantine equation governing aliasing between views is possible, but more involved and slower and we released the current version to solicit feedback whether the approximate checking is a problem for anyone.) |
5ff2cf9
to
f52f4a6
Compare
This places the global state required for dynamic borrow checking of NumPy arrays into a capsule exposed as the numpy.core.multiarray._BORROW_CHECKING_API attribute.
This has two benefits: First, all extensions built using rust-numpy will share this global state and hence cooperate in borrow checking. Second, in the future,
other libraries that want to borrow check their access to NumPy array can cooperate using this C-compatible API.
This does not checking the borrow checking implementation at all, but instead of accessing Rust static data, all accesses are funneled through the above-mentioned capsule API with access to it being cached for performance reasons as we already do for the _ARRAY_API and _UFUNC_API capsules.
This means that eventually, the implementation of the borrow checking API could be different from the one the current extension would provide with the adaptors to the C API taking of any differences in e.g. the definition of
BorrowKey
. For now, they will of course usually be the same and this change just adds a huge amount of boiler plate to go from Rust to C-compatible back to Rust.Closes #359
cc @alex This is not the buffer-oriented protocol change you proposed in "Buffers on the edge", but may be of interested anyway as it is related if limited to NumPy arrays.