Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for arrays of ASCII and Unicode strings #378

Merged
merged 2 commits into from
Jun 2, 2023

Conversation

adamreichold
Copy link
Member

@adamreichold adamreichold commented May 14, 2023

This supports Unicode arrays whose element length is know at compile time.

Closes #141

@adamreichold adamreichold force-pushed the unicode-const-generics branch 7 times, most recently from ecdfac5 to f8c4ec1 Compare May 14, 2023 20:23
@adamreichold adamreichold changed the title Add PyUCS4Array implementor of Element Add support for arrays of ASCII and Unicode strings May 15, 2023
@rachtsingh
Copy link

Worked great! This was a very painless way to parse fixed-size ASCII and Unicode strings into numpy arrays. Thank you for doing all the work that I had aimed to do, it is really appreciated.

One small nit: it took me a while to understand that the PyASCIIArray type was a newtype around [Py_UCS1; N] which is the same as [u8; N]. For some reason I thought it was a specialization of the PyArray type, but it seems fairly different to me (feel free to disagree).

I couldn't think of any better names, but maybe one of:

  1. NPYString and NPYUnicode (with a comment marking those as Py_UCS1 / S or Py_UCS4 / U dtypes)
  2. NPYFixedString and NPYFixedUnicode to make it more explicit that these are fixed-length dtypes?

I thought about PyASCIIArray1, but think that's a bad idea because Array in this crate should refer to numpy arrays, but here we're talking about dtypes. For example, here's a numpy array:

In [5]: x = np.array(['a', 'b', 'c', 'd'], dtype='S1').reshape((2, 2))

In [6]: x
Out[6]:
array([[b'a', b'b'],
       [b'c', b'd']], dtype='|S1')

and I think the precise way to refer to this is a (2, 2) numpy array of dtype S1, so the way to construct that should be explicit in rust-numpy.

Anyway, thanks for all the help!

@adamreichold adamreichold force-pushed the unicode-const-generics branch from f8c4ec1 to 8784135 Compare May 15, 2023 16:01
@adamreichold
Copy link
Member Author

One small nit: it took me a while to understand that the PyASCIIArray type was a newtype around [Py_UCS1; N] which is the same as [u8; N]

That is literally the first sentence of its docs. 😉 But I agree, the names are somewhat misleading as these not related to PyArray at all. I changed them to PyFixedString and PyFixedUnicode to stay within our existing naming scheme but emphasize that these are strings with a fixed capacity.

@adamreichold adamreichold force-pushed the unicode-const-generics branch 2 times, most recently from 09385ca to 4dce32f Compare May 15, 2023 16:40
Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy to see this finally comes with const generics. Cool. I left some minor comments.

@adamreichold adamreichold force-pushed the unicode-const-generics branch from 4dce32f to 5305469 Compare May 16, 2023 18:04
@adamreichold adamreichold force-pushed the unicode-const-generics branch from 5305469 to 80d962b Compare May 31, 2023 16:21
@adamreichold adamreichold marked this pull request as ready for review May 31, 2023 16:22
@adamreichold
Copy link
Member Author

@kngwyu I think this is ready to merge as I have just branched off the release-0.19 maintenance branch. Could you have another look? Thank you!

Copy link
Member

@kngwyu kngwyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have two minor comments on the document, but otherwise, it looks good. Thank you for achieving this!

@adamreichold adamreichold force-pushed the unicode-const-generics branch from 80d962b to e32ffe1 Compare June 1, 2023 12:20
@adamreichold adamreichold force-pushed the unicode-const-generics branch from e32ffe1 to ea5033f Compare June 2, 2023 16:10
@adamreichold adamreichold merged commit 9f10b61 into main Jun 2, 2023
@adamreichold adamreichold deleted the unicode-const-generics branch June 2, 2023 17:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support array of &str / unicode_ / string
3 participants