p2p generative AI platform
There are tons of personal computers and homelabs out there with plenty of compute sitting idle. This project aims to create a marketplace connecting people with spare compute to people who need it. Note that this is not a financial marketplace -- it's intended to be a public good. Whether it takes off is anyone's guess, but in the meantime I'll donate whatever compute I can once this is up and running.
create a virtual environment:
python -m venv env
Depending on your environment, you may need to replace python in the above command with python3.
On Debian-based distributions, you may additionally need to install venv first, if it's not already installed:
apt install python3.12-venv
activate the virtual environment:
on Linux/macOS:
source env/bin/activate
on Windows:
.\env\Scripts\activate
If this command doesn't work (e.g. "Activate.ps1 cannot be loaded because running scripts is disabled on this system"), you may have to run the following command in an Administrator PowerShell session:
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
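To double-check that the virtual environment is actually active, you can run this optional one-off check (not part of the repo); it prints True inside a venv:

# Prints True when running inside a virtual environment, False otherwise.
import sys
print(sys.prefix != sys.base_prefix)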
You may be missing some CMake dependencies.
on Debian, you can run the following:
sudo apt-get install build-essential cmake
TODO: instructions for other common OSes
then, run:
pip install -r requirements.txt
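To sanity-check the install, you can try importing the bindings from the same virtual environment (the __version__ attribute is an assumption about recent llama-cpp-python releases):

# Verifies that llama-cpp-python installed and is importable.
import llama_cpp
print(llama_cpp.__version__)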
If you want to also utilize your GPU (instead of just your CPU), follow these platform-specific instructions.
This is the resource I used to finally get things working: https://medium.com/@piyushbatra1999/installing-llama-cpp-python-with-nvidia-gpu-acceleration-on-windows-a-short-guide-0dfac475002d
Summarizing:
Prerequisites
The next steps need to be executed in the same virtual environment you set up above. You should see something like (env) at the start of your terminal prompt (this may not be the case on all platforms or in all terminals).
This will replace the llama-cpp-python you installed via pip install -r requirements.txt and will instruct it to use cuBLAS.
if you're using Command Prompt
set CMAKE_ARGS=-DLLAMA_CUBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
if you're using PowerShell
$env:CMAKE_ARGS = "-DLLAMA_CUBLAS=on"
$env:FORCE_CMAKE=1
pip install llama-cpp-python --force-reinstall --upgrade --no-cache-dir --verbose
When you next run server.py, you'll see BLAS = 1 in a collection of lines that looks like this:
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
This indicates that server.py can correctly access your GPU resources.
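If you prefer to check from Python directly, the same feature flags can be printed through the low-level bindings; this is a sketch and assumes llama_print_system_info is exported by the llama-cpp-python version you installed:

# Prints the llama.cpp feature flags (BLAS, AVX, ...) without loading a model.
from llama_cpp import llama_print_system_info
print(llama_print_system_info().decode())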
llama_cpp_python is initialized like this:
from llama_cpp import Llama

llm = Llama(
    model_path=model_path,  # path to the downloaded GGUF model file
    n_gpu_layers=-1,        # offload as many layers as possible to the GPU
    n_ctx=8192,             # context window of 8192 tokens
    chat_format="llama-3"   # chat template used to format prompts
)
n_gpu_layers=-1 instructs llama.cpp to offload as many layers as possible, i.e. to use as much of your GPU resources as it can.
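As a quick usage sketch (not code taken from server.py; the prompt below is just an example), a chat completion can then be generated with the high-level API like this:

# Example chat completion; the prompt and max_tokens value are illustrative.
response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(response["choices"][0]["message"]["content"])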
TODO
You'll need a way to switch between terminal tabs (e.g. tmux, VS Code terminal tabs).
Launch the relay, which runs on http://localhost:5000:
python relay.py
Then, in a separate terminal tab, launch the server, which runs on http://localhost:3000:
python server.py
NOTE: When first launched, or if the model file isn't present (currently only Llama 7B Chat GGUF by TheBloke), the script will download the model (approximately 4GB) and save it under the same filename in the models/ directory of your project. This will be gated by user interaction in the future to prevent large downloads without the user's consent; eventually you'll be able to browse models and choose one from a list.
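For a rough idea of what that first-run step does, here's a hedged sketch of a download-if-missing check; the URL, filename, and use of requests are placeholders/assumptions, not the actual code in server.py:

# Sketch of a "download the model only if it's missing" check.
# MODEL_URL and MODEL_FILENAME are placeholders, not the repo's real values.
import os
import requests

MODEL_URL = "https://example.com/path/to/llama-7b-chat.Q4_K_M.gguf"  # placeholder
MODEL_FILENAME = os.path.join("models", "llama-7b-chat.Q4_K_M.gguf")  # placeholder

if not os.path.exists(MODEL_FILENAME):
    os.makedirs("models", exist_ok=True)
    with requests.get(MODEL_URL, stream=True) as resp:
        resp.raise_for_status()
        with open(MODEL_FILENAME, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                f.write(chunk)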
relay.py acts as a proxy between the client (including, but not limited to, this repo's client.py) and server.py, hiding each one's public IP from the other, which addresses one of the big limitations of P2P networks (e.g. torrents). In a future version, relay.py will not see the contents of the conversation between server and client, thanks to end-to-end encryption. Anyone can fork this project and run their own relay, with compute provided by the various server.py instances running on consumer hardware.
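To make the data flow concrete, here is a minimal, hypothetical relay sketch; the endpoint names and the way servers register are assumptions, not the actual relay.py implementation:

# Hypothetical relay sketch: the client only ever talks to the relay,
# and the relay forwards prompts to whichever server has registered.
from flask import Flask, request, jsonify
import requests

app = Flask(__name__)
registered_servers = []  # e.g. ["http://localhost:3000"] after a server registers

@app.route("/register", methods=["POST"])
def register():
    registered_servers.append(request.json["url"])  # a server announces itself
    return jsonify({"ok": True})

@app.route("/chat", methods=["POST"])
def chat():
    # Forward the client's prompt to a registered server; neither side
    # learns the other's address.
    upstream = requests.post(registered_servers[0] + "/chat", json=request.json)
    return jsonify(upstream.json())

if __name__ == "__main__":
    app.run(port=5000)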
You can test things out using the simple command-line client, client.py:
python client.py
Type your message when prompted and press Enter. All of this is now happening on your local hardware, thanks to llama_cpp_python, a binding for llama.cpp.
To exit, press Ctrl+C.
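If you'd rather script your own client, a minimal sketch along the same lines might look like this (the /chat endpoint and payload shape are assumptions, not the actual protocol used by client.py):

# Hypothetical command-line client: reads a prompt, sends it to the relay,
# prints the reply. The endpoint and JSON keys are assumed, not the real API.
import requests

RELAY_URL = "http://localhost:5000/chat"  # placeholder endpoint

while True:
    try:
        prompt = input("> ")
    except KeyboardInterrupt:
        break
    resp = requests.post(RELAY_URL, json={"message": prompt})
    print(resp.json()["reply"])  # "reply" key is assumed; adjust to the real response shape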
Alternatively, you can visit http://localhost:5000 in your browser.
TODO