Support BF16 in Rust Numpy #380

Closed
guoqingbao opened this issue Jun 21, 2023 · 9 comments · Fixed by #381

Comments

@guoqingbao

Can we support NumPy bfloat16 in Rust? half::bf16 is available on the Rust side, but a NumPy array with dtype "bfloat16" cannot be passed to Rust using this crate. I think supporting bf16 would be similar to the f16 support you added before. Thanks!

@adamreichold
Member

What NumPy dtype does half::bf16 correspond to? Does NumPy actually support BFloat16 and if so, which version is required?

@guoqingbao
Author

Thanks for the reply. I used numpy 1.19.5 and I can create a numpy array with:

arr = np.ones((16,16), "bfloat16")

You can inspect the dtype of the array created above in VS Code.

@adamreichold
Member

I suspect that you have some additional package installed/imported which provides the dtype. On a plain NumPy installation, I get:

> python3
Python 3.11.3 (main, Apr 27 2023, 22:08:21) [GCC] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.version.full_version
'1.24.2'
>>> arr = np.ones((16,16), "bfloat16")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.11/site-packages/numpy/core/numeric.py", line 205, in ones
    a = empty(shape, dtype, order)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: data type 'bfloat16' not understood

@guoqingbao
Author

I think there is a package that extends NumPy with bfloat16 (pip install bfloat16), but I didn't install it. Perhaps TensorFlow registered the bfloat16 type with NumPy. You can reproduce it with:

    import numpy as np
    import tensorflow as tf

    # TensorFlow exposes its bfloat16 as a NumPy-compatible scalar type
    bfloat16 = tf.bfloat16.as_numpy_dtype
    arr = np.ones((16,16), bfloat16)

Since we have all the dtype info and half::bf16 for type conversion, can we still implement bfloat16 in Rust-numpy?

@adamreichold
Member

In principle we can, but it does complicate things: testing becomes more complex because we would need additional dependencies, and getting at the actual type descriptor is more complex because we need to use the Python API (instead of NumPy's native API).

It is possible, but I am not sure yet whether I want to commit to including it under these conditions. For example, I would not be happy about our CI depending on TensorFlow. Maybe on a best-effort basis, i.e. just the code but without tests. But that would mean we get more bug reports. 🤔

@guoqingbao
Author

I think the implementation for bfloat16 need not depend on TensorFlow or other external Python packages, because Rust's half::bf16 already provides the type conversions, and all we need is to make it compatible with Python's bfloat16, e.g. a TensorFlow -> NumPy bf16 array. There is a pure C++ implementation of NumPy bfloat16 support: https://github.com/GreenWaves-Technologies/bfloat16/blob/main/bfloat16.cc
Is that still feasible?
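
For context, the conversion being referred to is small: a bfloat16 value is the upper 16 bits of an IEEE 754 binary32 float. The following is a minimal sketch of that idea, not the half crate's code. The function names are made up for illustration, and it assumes plain truncation, whereas a real implementation may round to nearest instead.

```python
import struct


def f32_to_bf16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumption: the low 16 bits of the float32 pattern are simply dropped
    (truncation), so no rounding is performed.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_f32(bits: int) -> float:
    """Widen a 16-bit bfloat16 pattern back to a float (hypothetical helper)."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))
    return value


if __name__ == "__main__":
    pattern = f32_to_bf16_bits(1.0)
    print(hex(pattern), bf16_bits_to_f32(pattern))
```

Because the bit layout is this simple, the element type itself is not the hard part; the open question in this thread is how the crate obtains the matching NumPy dtype.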

@adamreichold
Member

My idea was rather to not rely on any specific package providing the dtype implementation at all, by creating the array descriptor through the Python API and caching it.

Could you give #381 a try and see whether it works for you?

@guoqingbao
Author

That's great! I will try and report the results.

@guoqingbao
Author

I can confirm that it works! Thanks for the superfast support.
