
How to build pyllamacpp without AVX2 or FMA. #71

Closed
kuvaus opened this issue Apr 20, 2023 · 2 comments

kuvaus commented Apr 20, 2023

How to build pyllamacpp without AVX2 or FMA.

1) Check what features your CPU supports

I have an old Mac; on Linux the same information is available through /proc/cpuinfo (see the sketch below).

The default pyllamacpp and llama.cpp builds require AVX2 support, but there is a way to build both even if you have an old CPU with only AVX1 support. First, check which instruction-set extensions your CPU supports. On a Mac you can do it with:

sysctl -a

In the output I see these options, which means my CPU supports AVX1 but not AVX2 or FMA:
hw.optional.avx1_0: 1
hw.optional.avx2_0: 0
hw.optional.fma: 0
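
On Linux there is no hw.optional sysctl tree, but the same information lives in /proc/cpuinfo. Here is a minimal Python sketch for x86 Linux (avx, avx2, and fma are the standard flag names there):

# Minimal sketch, x86 Linux only: CPU features are listed as
# whitespace-separated tokens on the "flags" line of /proc/cpuinfo.
with open("/proc/cpuinfo") as f:
    flags = next(line for line in f if line.startswith("flags")).split()

for feature in ("avx", "avx2", "fma"):
    print(feature, "supported" if feature in flags else "not supported")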

2) Clone the repository and edit the CMakeLists.txt

So, clone the repository:

git clone --recursive https://github.com/nomic-ai/pyllamacpp && cd pyllamacpp

Edit the CMakeLists.txt and change these two options from ON to OFF:

option(LLAMA_AVX2  "llama: enable AVX2" OFF)
option(LLAMA_FMA   "llama: enable FMA"  OFF)

Run the install:

pip install -e .

It should install the custom pyllamacpp into your Python packages.
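
To confirm that Python actually picks up the editable install, a quick check (the printed path should point inside the cloned pyllamacpp directory):

import pyllamacpp
print(pyllamacpp.__file__)  # should point into the cloned repo for an editable install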

3) Use the built pyllamacpp in code.

Now you can just use

import pyllamacpp
from pyllamacpp.model import Model
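
For reference, here is a minimal end-to-end sketch. The model path is a placeholder, and the exact Model constructor arguments can differ between pyllamacpp versions, so check the README of the version you built:

from pyllamacpp.model import Model

# Placeholder path: point this at your own ggml model file.
model = Model("./models/ggml-model-q4_0.bin")
generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=1)
print(generated_text)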

Now, for some reason this version only generates correctly with a single thread. So if you write

generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=1)

everything is fine, but

generated_text = model.generate("Once upon a time, ", n_predict=55, n_threads=2)

seems to generate gibberish.

4) Compare with llama.cpp

For testing purposes I also built the regular llama.cpp.
Here, as they say in their GitHub issues, you have to use plain make instead of cmake to get a build without AVX2. After building the C++ version, it does work with multiple threads. So just run make like this and you should get the main binary:

make

Now, with regular llama.cpp you can write:

./main --threads 4  [+rest of the options]

and that will work.
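
If you want to drive the llama.cpp binary from Python, for example to compare its multi-threaded output against pyllamacpp, here is a small subprocess sketch. The model path and prompt are placeholders; -m, -p, and -n are llama.cpp's flags for the model file, the prompt, and the number of tokens to predict:

import subprocess

# Placeholders: adjust the model path and prompt to your setup.
# ./main is the binary produced by the make step above.
result = subprocess.run(
    ["./main", "--threads", "4",
     "-m", "./models/ggml-model-q4_0.bin",
     "-p", "Once upon a time, ",
     "-n", "55"],
    capture_output=True, text=True,
)
print(result.stdout)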

kuvaus commented Apr 21, 2023

Update: fixed the gibberish issue by updating the llama.cpp submodule to the newest version, replacing the old checkout pinned at commit 3525899. I also changed the CMakeLists.txt inside the llama.cpp folder to disable AVX2 and FMA, but that might not be necessary. All good now.
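
For anyone following along, updating the pinned submodule to the latest upstream commit looks roughly like this (run from the pyllamacpp root, then reinstall):

git submodule update --remote --merge llama.cpp
pip install -e .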

kuvaus closed this as completed on Apr 21, 2023
kuvaus changed the title from "How to build pyllamacpp without AVX2 or FMA. Works only with n_threads=1, n_threads=2 creates gibberish." to "How to build pyllamacpp without AVX2 or FMA." on Apr 21, 2023

absadiki commented May 2, 2023

Thanks so much @kuvaus for the guide.
