Move away from integral casts in xorbuf #1020
Comments
noloader added a commit to noloader/cryptopp that referenced this issue on Mar 17, 2021
noloader added a commit that referenced this issue on Mar 17, 2021
We think this is another instance of the problem that surfaced under GH #683 when inString==outString. It violates aliasing rules and the compiler begins removing code. The ultimate workaround was to add a member variable m_tempOutString as scratch space when inString==outString. We did not lose much in the way of performance for some reason. It looks like AES/CTR lost about 0.03-0.05 cpb. When combined with the updated xorbuf from GH #1020, the net result was a speedup of 0.1-0.6 cpb. In fact, some ciphers, like RC6, gained almost 5 cpb.
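The scratch-space workaround described in that commit message can be illustrated roughly as follows. This is only a sketch of the pattern, not the actual Crypto++ change: the class `KeystreamSketch`, its `Transform` helper, and the toy xor transform are hypothetical, and only the names `m_tempOutString`, `inString` and `outString` come from the message above.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

typedef unsigned char byte;

// Hypothetical cipher-like class used only to show the pattern.
class KeystreamSketch
{
public:
    explicit KeystreamSketch(byte seed) : m_seed(seed) {}

    void ProcessData(byte* outString, const byte* inString, std::size_t length)
    {
        if (inString == outString)
        {
            // In-place request: write into scratch space first so the
            // transform never sees input and output as the same buffer.
            m_tempOutString.resize(length);
            Transform(m_tempOutString.data(), inString, length);
            if (length != 0)
                std::memcpy(outString, m_tempOutString.data(), length);
        }
        else
        {
            Transform(outString, inString, length);
        }
    }

private:
    // Toy transform (assumption): xor each byte with a counter-based mask.
    void Transform(byte* out, const byte* in, std::size_t length)
    {
        for (std::size_t i = 0; i < length; ++i)
            out[i] = static_cast<byte>(in[i] ^ static_cast<byte>(m_seed + i));
    }

    byte m_seed;
    std::vector<byte> m_tempOutString;  // scratch space, name taken from the commit message
};
```

The extra copy is the cost of the workaround, which is presumably where the small AES/CTR loss mentioned above comes from.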
noloader added a commit that referenced this issue on Mar 18, 2021
EAddario pushed a commit to EAddario/cryptopp that referenced this issue on Apr 10, 2021
`xorbuf` in `misc.cpp` has code to perform xors on arbitrary buffers, and it relies on integral casts to do so. The casting is kind of shady nowadays. It could draw the ire of the compiler and earn us a demerit.

A simpler form built on `memcpy` is closer to what we should be doing. It avoids the cast and honors alignment. Compilers nowadays will know when they can elide the `memcpy` and simply perform the xor.

In fact, we can add some architectural speed-ups to make things run even faster without the casting. With them the code runs 0.1 cpb to 0.4 cpb faster on x86_64. An x86_64 machine will enable the `__SSE2__` code path without arch options.

This bug report will track the cut-over.
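As a rough illustration of the approach described above, a cast-free xor routine might look like the sketch below. It is not the code from `misc.cpp`: the function name `xorbuf_sketch`, the 4-byte word size, and the loop layout are assumptions made for the example. The `__m128i*` pointer casts are only what the unaligned load and store intrinsics take as arguments; no integral pointer is dereferenced.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
# include <emmintrin.h>
#endif

typedef unsigned char byte;

// Hypothetical stand-in for xorbuf: buf[i] ^= mask[i] for count bytes.
inline void xorbuf_sketch(byte* buf, const byte* mask, std::size_t count)
{
    std::size_t i = 0;

#if defined(__SSE2__)
    // 16 bytes per step using unaligned loads and stores,
    // so neither pointer needs any particular alignment.
    for (; i + 16 <= count; i += 16)
    {
        const __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf + i));
        const __m128i m = _mm_loadu_si128(reinterpret_cast<const __m128i*>(mask + i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(buf + i), _mm_xor_si128(b, m));
    }
#endif

    // 4 bytes per step. memcpy moves the bytes into local words, which
    // avoids dereferencing a casted pointer; optimizing compilers usually
    // turn these fixed-size copies into plain loads and stores.
    for (; i + 4 <= count; i += 4)
    {
        std::uint32_t b, m;
        std::memcpy(&b, buf + i, 4);
        std::memcpy(&m, mask + i, 4);
        b ^= m;
        std::memcpy(buf + i, &b, 4);
    }

    // Remaining tail bytes, one at a time.
    for (; i < count; ++i)
        buf[i] ^= mask[i];
}
```

A call such as `xorbuf_sketch(data, keystream, len)` then works for any length and any alignment of `data` and `keystream`.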