Commit
bugfix: fix the prefill/append attention kernel accuracy issue on sm75 (#448)

As reported by @esmeetu, the prefill/append attention kernel produces incorrect results on sm75. This PR fixes the issue.

We need another round of kernel configuration checks before releasing the official sm75 wheel (e.g., 1024 threads per block is too large for sm75; we should use smaller values such as 512 or 256). @zhyncs, would you mind helping with this?
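To illustrate why the threads-per-block choice matters on sm75, here is a rough host-side sketch of the arithmetic. The constants are NVIDIA's published limits for compute capability 7.5, not values taken from this commit, and the helper names are hypothetical. It ignores register allocation granularity and shared memory, and it is only one plausible reason a 1024-thread block could be a poor fit, not a statement of the actual cause.

```cpp
#include <algorithm>

// Assumed limits for compute capability 7.5 (from NVIDIA's published tables,
// not from this commit).
constexpr int kMaxThreadsPerSm = 1024;          // resident threads per SM
constexpr int kRegistersPerBlockLimit = 65536;  // 32-bit registers one block may use
constexpr int kMaxRegistersPerThread = 255;     // per-thread hardware cap

// Hypothetical helper: how many blocks of this size can be resident on one SM,
// counting only the thread limit.
int resident_blocks_by_threads(int threads_per_block) {
  return kMaxThreadsPerSm / threads_per_block;
}

// Hypothetical helper: the most registers each thread may use before a block
// of this size can no longer launch at all.
int max_registers_per_thread(int threads_per_block) {
  return std::min(kMaxRegistersPerThread,
                  kRegistersPerBlockLimit / threads_per_block);
}
```

Under these assumptions, a 1024-thread block leaves at most 64 registers per thread and only one resident block per SM, while 512 and 256 threads allow up to 128 and 255 registers per thread with 2 and 4 resident blocks respectively.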