How to build pyllamacpp without AVX2 or FMA.

1) Check what features your CPU supports
I have an old Mac, but these commands likely also work on any Linux machine.

The default pyllamacpp and llama.cpp require AVX2 support, but there is a way to build both even if you have an old CPU with only AVX1 support. First, check what instruction sets your CPU supports. On a Mac you can do it with:
sysctl -a
I see these options, which means my CPU supports AVX1 but not AVX2 or FMA:
hw.optional.avx1_0: 1
hw.optional.avx2_0: 0
hw.optional.fma: 0
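If you want to narrow the output down, or you are on a Linux machine where the flags live in /proc/cpuinfo instead, something like this should work (a sketch; I have only checked the Mac side):

# macOS: filter sysctl for the relevant feature flags
sysctl -a | grep -E 'avx|fma'

# Linux: the flags are listed in /proc/cpuinfo instead
grep -o -E 'avx2|avx|fma' /proc/cpuinfo | sort -u

On Linux, AVX2 and FMA are supported if avx2 and fma show up in the output.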
2) Clone the repository and edit the CMakeLists

So, clone the repository. Edit the CMakeLists.txt and change the options that enable AVX2 and FMA (a sketch of the edit is shown at the end of this step). Then run the install:

pip install -e .

It should install the custom pyllamacpp to your python packages.
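The exact lines depend on your llama.cpp version, but assuming the CMakeLists.txt uses llama.cpp's usual option names, the edit is likely just flipping these two defaults from ON to OFF:

option(LLAMA_AVX2 "llama: enable AVX2" OFF)
option(LLAMA_FMA  "llama: enable FMA"  OFF)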
3) Use the built pyllamacpp in code.
Now you can just use:

import pyllamacpp
from pyllamacpp.model import Model
Now, for some reason this version only works with a single thread. So if you write

generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=1)

everything is fine, but

generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=2)

seems to generate gibberish.
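To put step 3 together, here is a minimal end-to-end sketch. The model path and the constructor keyword are assumptions on my part, so check the Model signature in your pyllamacpp version:

import pyllamacpp
from pyllamacpp.model import Model

# ggml_model keyword and path are assumptions; adjust to your setup
model = Model(ggml_model="./models/ggml-model-q4_0.bin")

# stick to n_threads=1 with this build
generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=1)
print(generated_text)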
4) Compare with llama.cpp
For testing purposes I also built the regular llama.cpp. Here, as they say in their GitHub issues, you have to use regular make instead of cmake to make it work without AVX2. After building, though, the cpp version does work with multiple threads. So just run make like this and you should get the main binary:
make
Now, with regular llama.cpp you can write:
./main --threads 4 [+rest of the options]
and that will work.
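For reference, a fuller run might look like this (the model path and prompt are placeholders; -m, -p, and -n are the standard llama.cpp main options for the model file, the prompt, and the number of tokens to predict):

./main -m ./models/ggml-model-q4_0.bin -p "Once upon a time, " -n 55 --threads 4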
Update: fixed the gibberish issue by downloading the newest llama.cpp submodule and replacing the old checkout @ 3525899 with the newest one. I also changed the CMakeLists.txt inside the llama.cpp folder to disable AVX2 and FMA, but that might not be necessary. All good now.
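In case it helps, updating the submodule probably looks something like this (a sketch; the submodule path and branch name are assumptions):

# pull the latest llama.cpp in place of the pinned checkout
git submodule update --init llama.cpp
cd llama.cpp
git checkout master
git pull origin master
cd ..

# rebuild the package against the new submodule
pip install -e .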