Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Breaking PyReadonlyArray #258

Closed
adamreichold opened this issue Jan 13, 2022 · 21 comments · Fixed by #274
Closed

Breaking PyReadonlyArray #258

adamreichold opened this issue Jan 13, 2022 · 21 comments · Fixed by #274

Comments

@adamreichold
Copy link
Member

adamreichold commented Jan 13, 2022

I am sorry for repeatedly opening soundness issues, but I fear that using NumPy's writeable is insufficient to protect safe code from mutable aliasing.

At least it seems incompatible with a safe as_cell_slice as demonstrated by the following test failing

#[test]
fn alias_readonly_cell_slice() {
    Python::with_gil(|py| {
        let array = PyArray::<i32, _>::zeros(py, (2, 3), false);

        let slice = array.as_cell_slice().unwrap();

        let array = array.readonly();
        let value = array.get((0, 0)).unwrap();

        assert_eq!(*value, 0);
        slice[0].set(1);
        assert_eq!(*value, 0);
    });
}

which implies that as_cell_slice cannot be safe. (I am not sure if it adds anything over as_array(_mut) at all viewed this way. I think a safe method that yields an ndarray::RawArraView might be preferable to handle situations involving aliasing.)

Another issue I see is that Python does not need to leave the writeable flag alone, i.e. the test

#[test]
fn alias_readonly_python() {
    use pyo3::py_run;

    Python::with_gil(|py| {
        let array = PyArray::<i32, _>::zeros(py, (2, 3), false);

        let array = array.readonly();
        let value = array.get((0, 0)).unwrap();

        assert_eq!(*value, 0);
        py_run!(py, array, "array.flags.writeable = True\narray[(0,0)] = 1");
        assert_eq!(*value, 0);
    });
}

also fails, but I have to admit that this might be a case of "you're holding it wrong" as I am not sure what guarantees we can expect from the Python code? (After all PyArray<A,D>: !Send which Python code does not need to respect either even just trying to access an array and thereby checking its flags from another thread might race with us modifying the flags without atomics.)

@adamreichold
Copy link
Member Author

I wonder whether we should use PyAny::get_refcnt to allow safe mutable access if the reference count is one, similar to Rc::get_mut. But I am not sure whether common patterns of being called by Python code actually enable this or whether the reference count would be almost always elevated above one?

@adamreichold
Copy link
Member Author

But I am not sure whether common patterns of being called by Python code actually enable this or whether the reference count would be almost always elevated above one?

Trying this out using our examples here, it looks like even simple code like

x = np.array([
    [1, 0],
    [0, 2],
], dtype=np.float64)
assert sys.getrefcount(x) == 1

will fail with an elevated reference count of 2 instead of 1. (Probably for passing the argument to getrefcount by value?)

@aldanor
Copy link
Contributor

aldanor commented Jan 13, 2022

In this example:

py_run!(py, array, "array.flags.writeable = True\narray[(0,0)] = 1");

What is the array.flags.owndata? If it's True, Python can modify it as much as it wants, toggle it on and off, because it's the master view. However (!) if you were to set it to False in Rust and then return a view with owndata=False, writeable=False, Python wouldn't be able to touch it and turn it back on.

@aldanor
Copy link
Contributor

aldanor commented Jan 13, 2022

On a side note, skimming through array.rs, I don't see any mentions of owndata, nor is there a .view() method... perhaps we need one as readonly stuff doesn't make much sense without it.

@adamreichold
Copy link
Member Author

adamreichold commented Jan 14, 2022

On a side note, skimming through array.rs, I don't see any mentions of owndata, nor is there a .view() method... perhaps we need one as readonly stuff doesn't make much sense without it.

I think the reason is that it does not really matter for the aliasing problem as you can have multiple pointers to the same "master view" on the Python heap. All of those see owndata == True yet we are none the wiser whether other code will be modifying the elements through this "master view" while we are holding on to our PyArray<A, D> pointing to the same "master view".

What is the array.flags.owndata?

It is True. One could also have the test case with Python creating the array and passing it to us. Writing a Rust unit test was just easier to do. As written above, I think the aliasing problem begins with the pointer to the array itself, not with its data pointer aliasing with another array.

However (!) if you were to set it to False in Rust and then return a view with owndata=False, writeable=False, Python wouldn't be able to touch it and turn it back on.

This can only be done by setting flags.writeable := False by the master view whose flags.owndata == True - e.g. the following example would throw an exception:

Therefore I think that owndata in general does not help us with the fundamental aliasing problem of reference counted Python objects. More importantly when arrays are passed into Rust code than when Rust code produces them. As written in #258 (comment), even passing an array to a function will elevate its reference count to at least two.

I fear we might need to bite the bullet and make all functions which produce &A or &mut A (or ndarray views containing those) unsafe as we cannot reasonably control for aliasing. Or we take the point of view of e.g. cxx.rs and say that unsafe should not be used to lint against other languages standard practice and make all of them safe (assuming PyArray<A,D> were a normal Rust type) but add strongly worded documentation that Python and Rust have some rather different notions on aliasing.

Or maybe we could kindly ask NumPy to wrap all of their arrays in PyCell for runtime borrow checking... ;-)

@adamreichold
Copy link
Member Author

I fear we might need to bite the bullet and make all functions which produce &A or &mut A (or ndarray views containing those) unsafe as we cannot reasonably control for aliasing. Or we take the point of view of e.g. cxx.rs and say that unsafe should not be used to lint against other languages standard practice and make all of them safe (assuming PyArray<A,D> were a normal Rust type) but add strongly worded documentation that Python and Rust have some rather different notions on aliasing.

@kngwyu @davidhewitt Do PyO3 or rust-numpy have an already formulated position on the "unsafe-for-all-FFI-or-not" debate?

@adamreichold
Copy link
Member Author

One thing I just understood is why this does not affect the other wrappers for native types like PyList: Those have operations like PyList::get_item which derive a pointer into the Python heap from another one, while we here are deriving Rust references into the "interior" of an object on the Python heap.

@davidhewitt
Copy link
Member

Sorry it's taken me a few days to have a chance to sit down and read this! Comments below...

I am sorry for repeatedly opening soundness issues, but I fear that using NumPy's writeable is insufficient to protect safe code from mutable aliasing.

Please don't be sorry for this! I think soundness issues are critical and we should be striving for an API that is both safe and sound!

which implies that as_cell_slice cannot be safe.

Reading through your example, I can understand this viewpoint, however I was to argue a different angle. I claim there are two issues at play:

  • we have a logical bug (not soundness) that as_cell_slice allows writing to a readonly array
  • we treat readonly array as safe to treat as a Rust slice, however at any point the readonly flag can be changed and the underlying data written to. This is a soundness issue because it can lead to incorrect optimizations. (This is sort of what the write from as_cell_slice is emulating - you can imagine it momentarily makes the array writable, does a write, and then makes it readonly again.)

I am reading the definition of the WRITEABLE flag from https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flags.html:

The data area can be written to. Setting this to False locks the data, making it read-only. A view (slice, etc.) inherits WRITEABLE from its base array at creation time, but a view of a writeable array may be subsequently locked while the base array remains writeable. (The opposite is not true, in that a view of a locked array may not be made writeable. However, currently, locking a base object does not lock any views that already reference it, so under that circumstance it is possible to alter the contents of a locked array via a previously created writeable view onto it.) Attempting to change a non-writeable array raises a RuntimeError exception.

It sounds to me like:

  • if we have a readonly view, it may be viewing writable memory. Already in this case, readonly is definitely not enough to safely make a reference.
  • if we have a readonly array, then at any time the array can be made writable again, this breaking the safety invariant.

Perhaps we need to reconsider whether the readonly abstraction is viable?

@davidhewitt
Copy link
Member

davidhewitt commented Jan 16, 2022

Do PyO3 or rust-numpy have an already formulated position on the "unsafe-for-all-FFI-or-not" debate?

I'm a bit unsure exactly what is being asked here. My perspective is that if an API makes it possible for UB to occur (e.g. by an incorrect optimization because invariants were broken), it must be unsafe.

In cases where I'm not certain if an API is sound, I prefer to mark it unsafe.

@adamreichold
Copy link
Member Author

adamreichold commented Jan 16, 2022

we have a logical bug (not soundness) that as_cell_slice allows writing to a readonly array

Does this refer to as_cell_slice not checking the WRITEABLE flag? I think the problem why that would still be insufficient is that I can also first call as_cell_slice and then use readonly but readonly has no way to invalidate the previous cell slice view as both take &self.

I think we should therefore drop the method and expose raw pointer based array views instead which would be more versatile and more widely applicable.

Perhaps we need to reconsider whether the readonly abstraction is viable?

Indeed, I think we cannot depend on the WRITEABLE flag to protect us in the sense of making shared Rust references sound. One could still argue that this might help catch accidental aliasing, but more as a debugging aid than something to build language-level guarantees on top of.

In my personal opinion, I would prefer dropping it to simplify the API and avoid creating a false sense of safety.

I'm a bit unsure exactly what is being asked here.

I am referring to the community-wide discussions started by the cxx crate providing safe bindings which call into C++ code under the assumption that this C++ code will uphold invariants assumed by the bindings (even though the C++ compiler does not check that). I think this approach was further explained by the Chrome team via https://www.chromium.org/Home/chromium-security/memory-safety/rust-and-c-interoperability/

I think under this approach, one could for example say that if calling a Python function under a given aliasing situation was already wrong, than this isn't changed by that function being implemented in Rust. For example, mutable sharing of NumPy arrays between threads is already UB in a pure-Python setting and a Rust function called by Python can therefore assume that this does not happen. From this point of view, as_array(_mut) could be safe as any issue are already present on the Python side of the fence.

On the other hand, our requirements for creating shared or exclusive references basically do not exist for Python in which case both as_array and as_array_mut must be unsafe and it becomes explicitly part of the calling convention of our Python function that no invalid aliasing of the array data is allowed.

Personally, I can agree with both perspectives. Especially as I would like rust-numpy to enable scientific users to write fast and safe code. I think having as_array(_mut) unsafe is detrimental to this as explaining the detailed pre-conditions can difficult or at least involved. On the other hand, having scientific results invalidated due to current or future miscompilations is a huge issue and from a language lawyering perspective the unsoundness is definite.

(One more technical issue that enters here is that even a safe as_array_mut in the sense that calling Python code must do its part would need to take &mut self to make the Rust implementation itself sound. But I don't think there is way to have a Python function take &mut PyArray<A,D> at the moment as those do not come pre-wrapped in PyCell?)

@adamreichold
Copy link
Member Author

(Another design point might be to never create ArrayView<T, D> but only something like ArrayView<MathCell<T>, D>, but I think this would make to much numerical code inaccessible or prohibitively slow.)

@adamreichold
Copy link
Member Author

as those do not come pre-wrapped in PyCell

Having referred to the lack of PyCell now twice, one a little bit of a grandiose idea would be to have PyCell, but stored out of line similar to how parking_lot stores mutexes in a global table using the address of the NumPy array as the key. This way we could at least have PyCell style borrow checking in the signatures of our Python functions and any remaining aliasing issues so that any remaining aliasing issues would be rooted already on Python side.

If am going a bit crazy with this, we could even dynamically borrow check views by following their base objects and testing whether they overlap. Of course, the cost might be prohibitive and we would have to over-approximate when we do not understand a base object type. But I think this might allow be a safe and sound API to access NumPy arrays as reference-based array views.

@adamreichold
Copy link
Member Author

Scratch at least that second paragraph: This would not improve things as only Rust code would participate in the checking but Python could still produce overlapping views that the Rust never sees. But then again, could this could which would necessarily be invisible to the Rust compiler still lead to miscompilation? Cross-language LTO of an extension containing Rust and e.g. C++ code?

@kngwyu
Copy link
Member

kngwyu commented Jan 17, 2022

Thank you for your very interesting discussion (and sorry for my slow response... I sprained a finger and will be not active for a while 😅

I still don't know which way to go, but let me note some ideas. You nicely summarized two approaches to safety:

  1. Assume some safety rules hold in Python land (so that as_array_mut can be safe)
  2. Assume that arrays have shared aliases (in this case, as_array_mut is unsafe)

And considering that shared mutable references are pretty much everywhere in Python codes, I'm inclined to the latter approach. However, as a lazy programmer, I don't want to overwhelm users with many unsafe. In the case of Python extension written in Rust, I think most operations are done temporally and not actually very unsafe. For example, a typical extension takes arrays and returns the result of heavy computation. I wanted to express such a 'this operation is safe since it's temporal' using a temporal object (PyReadonlyArray), but this attempt failed as you pointed out 😅

Detailed comments:

In my personal opinion, I would prefer dropping it to simplify the API and avoid creating a false sense of safety.

What do you think of as the alternative of .readonly(),as_array()? I want to stick temporal object or scoped array approach (e.g., array.readonly(|arrayview| ...)) if they are possible, but of course, it's OK to drop it and just provide unsafe API if we don't have a nice way.

I think we should therefore drop the method and expose raw pointer based array views instead which would be more versatile and more widely applicable.

Agreed.

at the moment as those do not come pre-wrapped in PyCell?

This is a very long-term problem and should be discussed on PyO3...

@adamreichold
Copy link
Member Author

I sprained a finger and will be not active for a while sweat_smile

Oh, I hope your recuperation goes well!

What do you think of as the alternative of .readonly(),as_array()? I want to stick temporal object or scoped array approach (e.g., array.readonly(|arrayview| ...)) if they are possible, but of course, it's OK to drop it and just provide unsafe API if we don't have a nice way.

Actually, I was thinking of going for

This is a very long-term problem and should be discussed on PyO3...

as I think we can implement a prototype here which could maybe later be folded into PyO3 if it turns out to be generally useful. My current thinking would be to have safe methods

fn as_array<'a>(&'a self) -> PyArrayRef<'a, A, D>;
fn as_array_mut<'a>(&'a self) -> PyArrayRefMut<'a, A, D>;

where

PyArrayRef<'a, A, D>: Deref<Target=ArrayView<'a, A, D>>
PyArrayRef<'a, A, D>: DerefMut<Target=ArrayViewMut<'a, A, D>>

The methods would be safe as we would enforce borrowing check using borrow flags stored like

thread_local! {
    BORROW_FLAGS: RefCell<HashMap<usize, Box<Cell<isize>>> = Default::default();
}

with the key being the address of the NumPy array on the Python heap.

This looks rather inefficient at a first glance, but I think modifying the NumPy flags isn't free either. And this would also detect errors like calling

#[pyfunction]
fn binop(x: &PyArray1<f64>, x: &PyArray1<f64>, z: &PyArray1<f64>) {
  let x = x.readonly().as_array();
  let y = y.readonly().as_array();
  let z = unsafe { z.as_array_mut() };
  ...
}

as

binop(x, y, x)

Of course, this would not protect the Rust code from all errors in the Python code or errors in other Python extensions. That would be taking a stance similar to cxx.rs: You need to verify all your unsafe code - i.e. unsafe Rust and e.g. Python, C++, Fortran, etc. - but if that is safe, so is your safe Rust code.

(As written above, this could eventually be extended to not just check the identity of the NumPy arrays but traverse the base object chain and also check for overlapping views thereby catching more errors on the Python side of the fence.)

@adamreichold
Copy link
Member Author

Of course, this would not protect the Rust code from all errors in the Python code or errors in other Python extensions. That would be taking a stance similar to cxx.rs: You need to verify all your unsafe code - i.e. unsafe Rust and e.g. Python, C++, Fortran, etc. - but if that is safe, so is your safe Rust code.

Note that AFAIU, we are already taking this stance w.r.t. thread safety: While PyArray: !Send and hence non-atomically modifying the NumPy flags is reasonable from within the Rust code, Python code does not care and could have sent references to other threads racing with out flag modifications.

@adamreichold
Copy link
Member Author

Another thing we should probably add is considering the WRITEABLE flag ourselves, e.g. before handing out ArrayViewMut.

@aldanor
Copy link
Contributor

aldanor commented Jan 19, 2022

@adamreichold By the way - you're probably aware, but just in case:

Writeable flag of C-API wrapped arrays

When an array is created from the C-API to wrap a pointer to data, the only indication we have of the read-write nature of the data is the writeable flag set during creation. It is dangerous to force the flag to writeable. In the future it will not be possible to switch the writeable flag to True from python. This deprecation should not affect many users since arrays created in such a manner are very rare in practice and only available through the NumPy C-API.

This was changed in 1.17.0.

@aldanor
Copy link
Contributor

aldanor commented Jan 19, 2022

This ^ again, I think (and if I understand correctly), draws a line between

  • We received some stuff from Python, they say it's readonly but we don't trust them
  • We created readonly stuff ourselves, marked it as such and we can trust that it's readonly

@adamreichold
Copy link
Member Author

By the way - you're probably aware, but just in case:

We recently change things so that we make data that is owned via PySliceContainer as writeable to handle this.

We created readonly stuff ourselves, marked it as such and we can trust that it's readonly

As in the previous discussion, this breaks as soon as we hand that PyArray to any Python function, e.g. a callback. So we would need a type for "a PyArray that was never seen by Python code" but that role is already served by ndarray::Array so I don't think it helps to try to encode this in the PyArray-API.

@adamreichold
Copy link
Member Author

We received some stuff from Python, they say it's readonly but we don't trust them

Personally, I also advocate for basically the opposite position: Trust Python code (like we do unsafe Rust code) but make sure that safe Rust code upholds borrowing discipline by dynamic borrow checking methods like as_array_mut using borrow flags stored out of line.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants