
Commit 1d2b4a5

ARROW-10109: [Rust] Add support to the C data interface for primitive types and utf8
This PR is a proposal to add support for the [C data interface](https://arrow.apache.org/docs/format/CDataInterface.html) by implementing the functionality needed to both consume and produce structs that follow its ABI and lifetime rules. For now this is limited to primitive types and strings (utf8), but it generalizes easily to all types whose data is encapsulated in `ArrayData` (things with buffers and child data).

Some design choices:

* import and export do not care about the type of the data in memory (previously `BufferData`, now `Bytes`); they only care about how it is converted between `ArrayData` and the C data interface.
* import wraps incoming pointers in a struct behind an `Arc`, so that they are refcounted thread-safely and can be shared between buffers, arrays, etc.
* `export` places `Buffer`s in `private_data` for bookkeeping and releases them when the consumer releases the struct via `release`.

I do not expect this PR to be easy to review, as it touches sensitive (aka `unsafe`) code. However, based on the tests I have done so far, I am happy enough with it to open it.

This PR has three main parts:

1. Addition of an `ffi` module that contains the import and export functionality
2. Helpers to import and export an Array from and to the C Data Interface
3. A crate to test this against Python/C++'s API

It also does a small refactor of `BufferData`, renaming it to `Bytes` (motivated by the popular `bytes` crate) and moving it to a separate file.

What is tested:

* round-trip `Python -> Rust -> Python` (new separate crate, `arrow-c-integration`)
* round-trip `Rust -> Python -> Rust` (new separate crate, `arrow-c-integration`)
* round-trip `Rust -> Rust -> Rust`
* memory allocation counts

Finally, this PR includes a large contribution from @pitrou, who took _a lot_ of his time to explain to me how the C++ implementation does it and the main things I had to worry about here.

Closes #8401 from jorgecarleitao/arrow-c-inte

Authored-by: Jorge C. Leitao <[email protected]>
Signed-off-by: Jorge C. Leitao <[email protected]>
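The export design described above (owned buffers parked in `private_data`, freed when the consumer calls `release`) can be sketched with std-only Rust. This is a minimal illustration under stated assumptions, not the `ffi` module from this PR: `ArrowArray` mirrors the C Data Interface's `struct ArrowArray` field for field, while `PrivateData`, `release_array` and `export_i32` are hypothetical names, and a `Vec<i32>` stands in for arrow's `Buffer`.

```rust
use std::ffi::c_void;
use std::ptr;

/// Mirrors the C Data Interface `struct ArrowArray`; `#[repr(C)]` keeps the
/// field order and layout that C consumers expect.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(array: *mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Hypothetical bookkeeping owned by the producer until `release` runs.
struct PrivateData {
    // Stand-in for arrow `Buffer`s: the memory the consumer reads.
    _values: Vec<i32>,
    // The array of buffer pointers that `ArrowArray::buffers` points into.
    buffer_ptrs: Box<[*const c_void]>,
}

/// Release callback handed to the consumer: frees everything the export owns.
unsafe extern "C" fn release_array(array: *mut ArrowArray) {
    if array.is_null() {
        return;
    }
    let array = &mut *array;
    if !array.private_data.is_null() {
        // Reclaiming the Box drops the values and the pointer array.
        drop(Box::from_raw(array.private_data as *mut PrivateData));
        array.private_data = ptr::null_mut();
    }
    // The spec marks a released struct by setting `release` to NULL.
    array.release = None;
}

/// Exports a non-nullable i32 array (hypothetical helper).
pub fn export_i32(values: Vec<i32>) -> ArrowArray {
    let length = values.len() as i64;
    // Buffer 0: validity bitmap (NULL is allowed when null_count == 0).
    // Buffer 1: the values. Moving the Vec below does not move its heap data.
    let buffer_ptrs: Box<[*const c_void]> =
        vec![ptr::null(), values.as_ptr() as *const c_void].into_boxed_slice();
    let private = Box::into_raw(Box::new(PrivateData { _values: values, buffer_ptrs }));
    // Derive the buffers pointer from the now heap-pinned bookkeeping.
    let buffers = unsafe { (*private).buffer_ptrs.as_mut_ptr() };
    ArrowArray {
        length,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(release_array),
        private_data: private as *mut c_void,
    }
}
```

The consumer reads through `buffers` and calls `release` exactly once when done; after that the producer's memory is gone and `release` reads as `None`. Keeping the buffer-pointer array inside the same allocation as the data means a single `release` frees the whole export.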
1 parent 63144ad commit 1d2b4a5

24 files changed, +1531 -130 lines changed

.github/workflows/rust_cron.yml

+23
@@ -54,3 +54,26 @@ jobs:
         continue-on-error: true
         shell: bash
         run: bash <(curl -s https://codecov.io/bash)
+
+  pyarrow-integration:
+    name: AMD64 Debian 10 Rust ${{ matrix.rust }} Pyarrow integration
+    runs-on: ubuntu-latest
+    if: ${{ !contains(github.event.pull_request.title, 'WIP') && github.repository == 'apache/arrow' }}
+    strategy:
+      fail-fast: false
+      matrix:
+        rust: [nightly-2020-11-19]
+    env:
+      RUST: ${{ matrix.rust }}
+    steps:
+      - name: Checkout Arrow
+        uses: actions/checkout@v2
+        with:
+          fetch-depth: 0
+      - name: Fetch Submodules and Tags
+        run: ci/scripts/util_checkout.sh
+      - name: Run test
+        shell: bash
+        run: |
+          echo ${RUST} > rust/rust-toolchain &&
+          ci/scripts/rust_pyarrow_integration.sh `pwd` `pwd`/build $RUST

ci/docker/debian-10-rust.dockerfile

+3 -1

@@ -53,7 +53,8 @@ ENV CARGO_HOME="/rust/cargo" \
 # compiled dependencies. Create the directories and place an empty lib.rs
 # files.
 COPY rust /arrow/rust
-RUN mkdir \
+RUN mkdir -p \
+        /arrow/rust/arrow-pyarrow-integration-testing/src \
         /arrow/rust/arrow-flight/src \
         /arrow/rust/arrow/src \
         /arrow/rust/benchmarks/src \
@@ -63,6 +64,7 @@ RUN mkdir \
         /arrow/rust/parquet_derive/src \
         /arrow/rust/parquet_derive_test/src && \
     touch \
+        /arrow/rust/arrow-pyarrow-integration-testing/src/lib.rs \
         /arrow/rust/arrow-flight/src/lib.rs \
         /arrow/rust/arrow/src/lib.rs \
         /arrow/rust/benchmarks/src/lib.rs \
+42
@@ -0,0 +1,42 @@
+#!/usr/bin/env bash
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -ex
+
+arrow_dir=${1}
+source_dir=${1}/rust
+build_dir=${2}/rust
+rust=${3}
+
+export ARROW_TEST_DATA=${arrow_dir}/testing/data
+export PARQUET_TEST_DATA=${arrow_dir}/cpp/submodules/parquet-testing/data
+export CARGO_TARGET_DIR=${build_dir}
+
+pushd ${source_dir}/arrow-pyarrow-integration-testing
+
+#rustup default ${rust}
+#rustup component add rustfmt --toolchain ${rust}-x86_64-unknown-linux-gnu
+python3 -m venv venv
+venv/bin/pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0
+
+source venv/bin/activate
+maturin develop
+python -m unittest discover tests
+
+popd

dev/release/00-prepare-test.rb

+18
@@ -271,6 +271,15 @@ def test_version_pre_tag
           "+arrow = { path = \"../arrow\", version = \"#{@release_version}\" }"],
         ],
       },
+      {
+        path: "rust/arrow-pyarrow-integration-testing/Cargo.toml",
+        hunks: [
+          ["-version = \"#{@snapshot_version}\"",
+           "+version = \"#{@release_version}\""],
+          ["-arrow = { path = \"../arrow\", version = \"#{@snapshot_version}\" }",
+           "+arrow = { path = \"../arrow\", version = \"#{@release_version}\" }"],
+        ],
+      },
       {
         path: "rust/arrow/Cargo.toml",
         hunks: [
@@ -509,6 +518,15 @@ def test_version_post_tag
           "+arrow = { path = \"../arrow\", version = \"#{@next_snapshot_version}\" }"],
         ],
       },
+      {
+        path: "rust/arrow-pyarrow-integration-testing/Cargo.toml",
+        hunks: [
+          ["-version = \"#{@release_version}\"",
+           "+version = \"#{@next_snapshot_version}\""],
+          ["-arrow = { path = \"../arrow\", version = \"#{@release_version}\" }",
+           "+arrow = { path = \"../arrow\", version = \"#{@next_snapshot_version}\" }"],
+        ],
+      },
       {
         path: "rust/arrow/Cargo.toml",
         hunks: [

rust/Cargo.toml

+5
@@ -26,3 +26,8 @@ members = [
         "integration-testing",
         "benchmarks",
 ]
+
+# this package is excluded because it requires different compilation flags, thereby significantly changing
+# how it is compiled within the workspace, causing the whole workspace to be compiled from scratch
+# this way, this is a stand-alone package that compiles independently of the others.
+exclude = ["arrow-pyarrow-integration-testing"]
@@ -0,0 +1,22 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+[target.x86_64-apple-darwin]
+rustflags = [
+  "-C", "link-arg=-undefined",
+  "-C", "link-arg=dynamic_lookup",
+]
@@ -0,0 +1,2 @@
+__pycache__
+venv
@@ -0,0 +1,38 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+[package]
+name = "arrow-pyarrow-integration-testing"
+description = ""
+version = "3.0.0-SNAPSHOT"
+homepage = "https://github.com/apache/arrow"
+repository = "https://github.com/apache/arrow"
+authors = ["Apache Arrow <[email protected]>"]
+license = "Apache-2.0"
+keywords = [ "arrow" ]
+edition = "2018"
+
+[lib]
+name = "arrow_pyarrow_integration_testing"
+crate-type = ["cdylib"]
+
+[dependencies]
+arrow = { path = "../arrow", version = "3.0.0-SNAPSHOT" }
+pyo3 = { version = "0.12.1", features = ["extension-module"] }
+
+[package.metadata.maturin]
+requires-dist = ["pyarrow>=1"]
@@ -0,0 +1,57 @@
+<!---
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Arrow c integration
+
+This is a Rust crate that tests compatibility between Rust's Arrow implementation and PyArrow.
+
+Note that this crate uses two languages and an external ABI:
+* `Rust`
+* `Python`
+* C ABI privately exposed by `Pyarrow`.
+
+## Basic idea
+
+Pyarrow exposes a C ABI to convert arrow arrays from and to its C implementation, see [here](https://arrow.apache.org/docs/format/CDataInterface.html).
+
+This package uses the equivalent struct in Rust (`arrow::array::ArrowArray`), and verifies that
+we can use pyarrow's interface to move pointers from and to Rust.
+
+## Relevant literature
+
+* [Arrow's CDataInterface](https://arrow.apache.org/docs/format/CDataInterface.html)
+* [Rust's FFI](https://doc.rust-lang.org/nomicon/ffi.html)
+* [Pyarrow private binds](https://github.com/apache/arrow/blob/ae1d24efcc3f1ac2a876d8d9f544a34eb04ae874/python/pyarrow/array.pxi#L1226)
+* [PyO3](https://docs.rs/pyo3/0.12.1/pyo3/index.html)
+
+## How to develop
+
+```bash
+# prepare development environment (used to build wheel / install in development)
+python -m venv venv
+venv/bin/pip install maturin==0.8.2 toml==0.10.1 pyarrow==1.0.0
+```
+
+Whenever rust code changes (your changes or via git pull):
+
+```bash
+source venv/bin/activate
+maturin develop
+python -m unittest discover tests
+```
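The README above describes moving pointers from pyarrow into Rust, and the commit message says import wraps incoming pointers in a struct behind an `Arc` so they can be shared between buffers and arrays. A minimal std-only sketch of that ownership pattern follows; it is not the PR's `ffi` code. `ImportedArray`, `ForeignBuffer`, `foreign_export` (a stand-in for a foreign producer such as pyarrow) and `RELEASE_CALLS` are hypothetical names. The consumer moves the foreign struct into Rust-owned memory, every buffer view holds an `Arc` to it, and the producer's `release` runs once, when the last view is dropped.

```rust
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Mirrors the C Data Interface `struct ArrowArray`.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(array: *mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Counts calls to the stand-in producer's release callback.
pub static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);
static VALUES: [i32; 3] = [10, 20, 30];

/// Hypothetical foreign producer: exports VALUES as a non-nullable i32 array.
pub fn foreign_export() -> ArrowArray {
    // The buffer-pointer array is heap-allocated and freed by the release callback.
    let ptrs = Box::into_raw(Box::new([ptr::null(), VALUES.as_ptr() as *const c_void]));
    ArrowArray {
        length: 3,
        null_count: 0,
        offset: 0,
        n_buffers: 2,
        n_children: 0,
        buffers: ptrs as *mut *const c_void,
        children: ptr::null_mut(),
        dictionary: ptr::null_mut(),
        release: Some(foreign_release),
        private_data: ptrs as *mut c_void,
    }
}

unsafe extern "C" fn foreign_release(array: *mut ArrowArray) {
    let array = &mut *array;
    drop(Box::from_raw(array.private_data as *mut [*const c_void; 2]));
    array.private_data = ptr::null_mut();
    array.release = None;
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

/// Owns an imported array and calls its `release` exactly once, on drop.
pub struct ImportedArray {
    array: ArrowArray,
}

impl ImportedArray {
    /// Moves the foreign struct into Rust-owned memory and marks the source as
    /// released, which the C Data Interface allows consumers to do.
    /// Safety: `src` must point to a valid, not-yet-released `ArrowArray`.
    pub unsafe fn take(src: *mut ArrowArray) -> Arc<ImportedArray> {
        let array = ptr::read(src);
        (*src).release = None;
        Arc::new(ImportedArray { array })
    }
}

impl Drop for ImportedArray {
    fn drop(&mut self) {
        if let Some(release) = self.array.release {
            unsafe { release(&mut self.array) };
        }
    }
}

/// A view of the values buffer that keeps the whole foreign array alive.
/// (Sending it across threads would additionally need `unsafe impl Send`.)
pub struct ForeignBuffer {
    _owner: Arc<ImportedArray>,
    ptr: *const i32,
    len: usize,
}

impl ForeignBuffer {
    pub fn values(owner: &Arc<ImportedArray>) -> ForeignBuffer {
        let a = &owner.array;
        // Buffer 1 of a primitive array holds the values; `offset` is 0 in this sketch.
        let ptr = unsafe { *a.buffers.add(1) } as *const i32;
        ForeignBuffer { _owner: Arc::clone(owner), ptr, len: a.length as usize }
    }

    pub fn as_slice(&self) -> &[i32] {
        unsafe { std::slice::from_raw_parts(self.ptr, self.len) }
    }
}
```

Refcounting the whole imported struct rather than each buffer follows from the spec: `release` frees a struct as a unit, so the buffers of one array cannot be released individually, and any view into them has to keep the entire struct alive.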
@@ -0,0 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+[build-system]
+requires = ["maturin"]
+build-backend = "maturin"
