A chatbot for Discord using Meta's LLaMA model, 4-bit quantized. The 13-billion-parameter model fits in under 9 GiB of VRAM.
Before you do any of this, you will need a bot token. If you don't have a bot token, follow this guide to make a bot and then add the bot to your server.
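If you want to sanity-check the token before setting anything else up, a minimal sketch using `discord.py` (an assumption — the client library this bot uses is not shown here) can log in and immediately exit:

```python
# token_check.py - hypothetical sanity check that your bot token is valid.
# Assumes discord.py is installed (pip install discord.py) and that the
# token is exported in the YOUR_BOT_TOKEN environment variable.
import os

import discord

intents = discord.Intents.default()
client = discord.Client(intents=intents)

@client.event
async def on_ready():
    # If this prints, the token authenticated successfully.
    print(f"Logged in as {client.user}")
    await client.close()

client.run(os.environ["YOUR_BOT_TOKEN"])
```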
Presently this is Linux only, but you might be able to make it work with other OSs.
Make sure you have Python 3, virtualenv (`pip install virtualenv`), and CUDA installed. Then clone the repository, set up a virtual environment, and install the dependencies along with the pinned `transformers` commit:

```bash
# Clone the bot and enter the repository.
git clone https://github.com/AmericanPresidentJimmyCarter/yal-discord-bot/
cd yal-discord-bot

# Create and activate a virtual environment, then install dependencies.
python3 -m virtualenv env
source env/bin/activate
pip install -r requirements.txt

# Install transformers from source at a known-good commit.
git clone https://github.com/huggingface/transformers/
cd transformers
git checkout 20e54e49fa11172a893d046f6e7364a434cbc04f
pip install -e .
cd ..
```
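Before building the CUDA kernel, it can save time to confirm that PyTorch (assumed here to be pulled in by `requirements.txt`) actually sees your GPU and has enough free memory:

```python
# cuda_check.py - confirm PyTorch can see the GPU and report available VRAM.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
    # mem_get_info returns (free, total) in bytes for the current device.
    free, total = torch.cuda.mem_get_info()
    print(f"Free VRAM: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
```

Per the note above, the 4-bit 13B model needs a bit under 9 GiB free.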
Next, build the 4-bit CUDA kernel and fetch the quantized weights:

```bash
# Build and install the custom 4-bit CUDA extension.
cd bot/llama_model
python setup_cuda.py install
cd ../..

# Download the 4-bit quantized LLaMA-13B weights from Huggingface.
wget https://huggingface.co/Neko-Institute-of-Science/LLaMA-13B-4bit-128g/resolve/main/llama-13b-4bit-128g.safetensors
```
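Once the download finishes, you can verify the checkpoint is a readable safetensors file. This is a quick sketch assuming the `safetensors` package is installed (likely among the requirements, since the bot loads a `.safetensors` checkpoint):

```python
# inspect_checkpoint.py - verify the downloaded safetensors file is readable.
from safetensors import safe_open

with safe_open("llama-13b-4bit-128g.safetensors", framework="pt", device="cpu") as f:
    keys = list(f.keys())

print(f"{len(keys)} tensors found; e.g. {keys[:3]}")
```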
Then start the bot:

```bash
cd bot
python -m bot $YOUR_BOT_TOKEN --allow-queue -g $YOUR_GUILD \
  --llama-model="Neko-Institute-of-Science/LLaMA-13B-4bit-128g" \
  --groupsize=128 \
  --load-checkpoint="path/to/llama/weights/llama-13b-4bit-128g.safetensors"
```
Ensure that `$YOUR_BOT_TOKEN` and `$YOUR_GUILD` are set to what they should be, `--load-checkpoint=...` is pointing at the correct location of the weights, and `--llama-model=...` is pointing at the correct location in Huggingface to find the configuration for the weights.
You can use any ALPACA model by setting the `--alpaca` flag, which lets you supply an input string alongside your instruction and automatically formats your prompt into the form expected by ALPACA.
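For reference, the standard ALPACA template (from the original Stanford Alpaca release) looks like the sketch below; the bot's internal formatting is assumed to follow this widely used layout and may differ in detail:

```python
# A sketch of the standard ALPACA prompt template the --alpaca flag refers to.
def format_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    if input_text:
        # Instruction plus an input that provides further context.
        return (
            "Below is an instruction that describes a task, paired with an "
            "input that provides further context. Write a response that "
            "appropriately completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            "### Response:\n"
        )
    # Instruction-only form, with no input section.
    return (
        "Below is an instruction that describes a task. Write a response "
        "that appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        "### Response:\n"
    )
```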
Recommended 4-bit ALPACA weights include `elinas/alpaca-30b-lora-int4` (used in the example below); GPT4-finetuned variants also exist (better coding responses, more restrictive in content):
```bash
cd bot
python -m bot $YOUR_BOT_TOKEN --allow-queue -g $YOUR_GUILD --alpaca \
  --groupsize=128 \
  --llama-model="elinas/alpaca-30b-lora-int4" \
  --load-checkpoint="path/to/alpaca/weights/alpaca-30b-4bit-128g.safetensors"
```
(c) 2023 AmericanPresidentJimmyCarter