LLM for Unity enables seamless integration of Large Language Models (LLMs) within the Unity engine.
It allows you to create intelligent characters that your players can interact with for an immersive experience.
The package also features a Retrieval-Augmented Generation (RAG) system that allows you to perform semantic search across your data, which can be used to enhance the character's knowledge.
LLM for Unity is built on top of the awesome llama.cpp library.
Tested on Unity: 2021 LTS, 2022 LTS, 2023
Upcoming Releases
Contact us to add your project!
Method 1: Install using the asset store
- On the LLM for Unity asset store page, click Add to My Assets
- Open the Package Manager in Unity: Window > Package Manager
- Select the Packages: My Assets option from the drop-down
- Select the LLM for Unity package, click Download and then Import
Method 2: Install using the GitHub repo:
- Open the Package Manager in Unity: Window > Package Manager
- Click the + button and select Add package from git URL
- Use the repo URL https://github.com/undreamai/LLMUnity.git and click Add
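Alternatively, you can declare the dependency directly in your project's Packages/manifest.json. A minimal sketch; the package name ai.undream.llm is an assumption here, so verify it against the package.json in the repo:

{
  "dependencies": {
    "ai.undream.llm": "https://github.com/undreamai/LLMUnity.git"
  }
}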
First you will set up the LLM for your game:
- Create an empty GameObject. In the GameObject Inspector click Add Component and select the LLM script.
- Download one of the default models with the Download Model button (~GBs), or load your own .gguf model with the Load model button (see LLM model management).

Then you can set up each of your characters as follows:
- Create an empty GameObject for the character. In the GameObject Inspector click Add Component and select the LLMCharacter script.
- Define the role of your AI in the Prompt. You can define the name of the AI (AI Name) and the player (Player Name).
- (Optional) Select the LLM constructed above in the LLM field if you have more than one LLM GameObject.

You can also adjust the LLM and character settings according to your preference (see Options).
In your script you can then use it as follows:

using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour {
    public LLMCharacter llmCharacter;

    void HandleReply(string reply){
        // do something with the reply from the model
        Debug.Log(reply);
    }

    void Game(){
        // your game function
        ...
        string message = "Hello bot!";
        _ = llmCharacter.Chat(message, HandleReply);
        ...
    }
}
You can also specify a function to call when the model reply has been completed.
This is useful if the Stream
option is enabled for continuous output from the model (default behaviour):
void ReplyCompleted(){
    // do something when the reply from the model is complete
    Debug.Log("The AI replied");
}

void Game(){
    // your game function
    ...
    string message = "Hello bot!";
    _ = llmCharacter.Chat(message, HandleReply, ReplyCompleted);
    ...
}
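Because the reply callback fires repeatedly while streaming, a common pattern is to bind it to a UI element. A minimal sketch, assuming the callback receives the accumulated reply so far; the ChatUI class and the replyText field are illustrative, not part of the package:

using UnityEngine;
using UnityEngine.UI;
using LLMUnity;

public class ChatUI : MonoBehaviour {
    public LLMCharacter llmCharacter;
    public Text replyText; // hypothetical UI Text element assigned in the Inspector

    public void OnPlayerMessage(string message){
        // refresh the UI each time a streamed chunk arrives
        _ = llmCharacter.Chat(message,
                              reply => replyText.text = reply,
                              () => Debug.Log("reply completed"));
    }
}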
To stop the chat without waiting for its completion you can use:
llmCharacter.CancelRequests();
That's all ✨!
To build an Android app you need to specify IL2CPP as the scripting backend and ARM64 as the target architecture in the player settings.
These settings can be accessed from the Edit > Project Settings
menu within the Player > Other Settings
section.
It is also a good idea to enable the Download on Build
option in the LLM GameObject to download the model on launch in order to keep the app size small.
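If you prefer to apply these settings from an editor script instead of the Project Settings window, a minimal sketch using Unity's standard PlayerSettings API (this helper class and menu entry are illustrative, not part of LLM for Unity):

#if UNITY_EDITOR
using UnityEditor;

public static class AndroidBuildSetup {
    [MenuItem("Tools/Configure Android for LLM")]
    static void Configure(){
        // use the IL2CPP scripting backend for Android
        PlayerSettings.SetScriptingBackend(BuildTargetGroup.Android, ScriptingImplementation.IL2CPP);
        // target only ARM64 devices
        PlayerSettings.Android.targetArchitectures = AndroidArchitecture.ARM64;
    }
}
#endif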
To automatically save / load your chat history, you can set the Save
parameter of the LLMCharacter to the filename (or relative path) of your choice.
The file is saved in the persistentDataPath folder of Unity.
This also saves the state of the LLM which means that the previously cached prompt does not need to be recomputed.
To manually save your chat history, you can use:
llmCharacter.Save("filename");
and to load the history:
llmCharacter.Load("filename");
where filename is the filename or relative path of your choice.
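For example, you could load the history when the character starts and save it when the application quits. A minimal sketch using the Save/Load calls above; the filename player_chat and the ChatPersistence class are illustrative:

using UnityEngine;
using LLMUnity;

public class ChatPersistence : MonoBehaviour {
    public LLMCharacter llmCharacter;

    void Start(){
        // restore a previously saved chat session, if any
        llmCharacter.Load("player_chat");
    }

    void OnApplicationQuit(){
        // persist the chat history (and the LLM state, if Save Cache is enabled)
        llmCharacter.Save("player_chat");
    }
}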
You can warm up the model before the first user interaction with the Warmup function, so that the first reply arrives faster:

void WarmupCompleted(){
    // do something when the warmup is complete
    Debug.Log("The AI is nice and ready");
}

void Game(){
    // your game function
    ...
    _ = llmCharacter.Warmup(WarmupCompleted);
    ...
}
The last argument of the Chat
function is a boolean that specifies whether to add the message to the history (default: true):
void Game(){
    // your game function
    ...
    string message = "Hello bot!";
    _ = llmCharacter.Chat(message, HandleReply, ReplyCompleted, false);
    ...
}
You can also use the model for pure text completion, without the chat functionality, through the Complete function:

void Game(){
    // your game function
    ...
    string message = "The cat is away";
    _ = llmCharacter.Complete(message, HandleReply, ReplyCompleted);
    ...
}
To wait for the reply within your code without blocking, you can use the async/await functionality:

async void Game(){
    // your game function
    ...
    string message = "Hello bot!";
    string reply = await llmCharacter.Chat(message, HandleReply, ReplyCompleted);
    Debug.Log(reply);
    ...
}
Alternatively, the LLM and the character can be set up entirely from code:

using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour
{
    LLM llm;
    LLMCharacter llmCharacter;

    async void Start()
    {
        // disable the gameObject so that Awake is not called immediately
        gameObject.SetActive(false);

        // Add an LLM object
        llm = gameObject.AddComponent<LLM>();
        // set the model using the filename of the model.
        // The model needs to be added to the LLM model manager (see LLM model management) by loading or downloading it.
        // Otherwise the model file can be copied directly inside the StreamingAssets folder.
        llm.SetModel("Phi-3-mini-4k-instruct-q4.gguf");
        // optional: you can also set loras in a similar fashion and set their weights (if needed)
        llm.AddLora("my-lora.gguf");
        llm.SetLoraWeight(0.5f);
        // optional: you can set the chat template of the model if it is not correctly identified
        // You can find a list of chat templates in ChatTemplate.templates.Keys
        llm.SetTemplate("phi-3");
        // optional: set the number of threads
        llm.numThreads = -1;
        // optional: enable GPU by setting the number of model layers to offload to it
        llm.numGPULayers = 10;

        // Add an LLMCharacter object
        llmCharacter = gameObject.AddComponent<LLMCharacter>();
        // set the LLM object that handles the model
        llmCharacter.llm = llm;
        // set the character prompt
        llmCharacter.SetPrompt("A chat between a curious human and an artificial intelligence assistant.");
        // set the AI and player name
        llmCharacter.AIName = "AI";
        llmCharacter.playerName = "Human";
        // optional: set streaming to false to get the complete result in one go
        // llmCharacter.stream = true;
        // optional: set a save path
        // llmCharacter.save = "AICharacter1";
        // optional: enable the save cache to avoid recomputation when loading a save file (requires ~100 MB)
        // llmCharacter.saveCache = true;
        // optional: set a grammar
        // await llmCharacter.SetGrammar("json.gbnf");

        // re-enable the gameObject
        gameObject.SetActive(true);
    }
}
You can use a remote server to carry out the processing and implement characters that interact with it.
Create the server
To create the server:
- Create a project with a GameObject using the LLM script as described above
- Enable the Remote option of the LLM and optionally configure the server parameters: port, API key, SSL certificate, SSL key

Alternatively you can use a server binary for easier deployment, choosing the build that matches your platform and hardware, e.g. windows-cuda-cu12.2.0 for Windows with CUDA 12.2.

Create the characters
Create a second project with the game characters using the LLMCharacter
script as described above.
Enable the Remote
option and configure the host with the IP address (starting with "http://") and port of the server.
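As a sketch of the client side in code, assuming the Remote, Host and Port inspector options map to same-named camelCase fields; the server address and port are illustrative:

using UnityEngine;
using LLMUnity;

public class RemoteCharacterSetup : MonoBehaviour {
    void Start(){
        LLMCharacter llmCharacter = gameObject.AddComponent<LLMCharacter>();
        // connect to the remote LLM server instead of a local LLM GameObject
        llmCharacter.remote = true;                 // assumed field behind the Remote option
        llmCharacter.host = "http://192.168.1.100"; // illustrative server IP
        llmCharacter.port = 13333;                  // illustrative port; match the server's Port setting
    }
}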
The Embeddings
function can be used to obtain the embeddings of a phrase:
List<float> embeddings = await llmCharacter.Embeddings("hi, how are you?");
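The returned vectors can then be compared with standard measures. For example, a cosine similarity helper (illustrative, not part of the package):

using System.Collections.Generic;
using UnityEngine;

public static class EmbeddingUtils {
    // cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|)
    public static float CosineSimilarity(List<float> a, List<float> b){
        float dot = 0f, normA = 0f, normB = 0f;
        for (int i = 0; i < a.Count; i++){
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Mathf.Sqrt(normA) * Mathf.Sqrt(normB));
    }
}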
Detailed function-level documentation can be found here:
LLM for Unity implements a super-fast similarity search functionality with a Retrieval-Augmented Generation (RAG) system.
It is based on the LLM functionality, and the Approximate Nearest Neighbors (ANN) search from the usearch library.
Semantic search works as follows.
Building the data: you provide text inputs (a phrase, paragraph, document) to add to the data.
Each input is split into chunks (optional) and encoded into embeddings with an LLM.
Searching: you can then search for a query text input.
The input is again encoded, and the most similar text inputs or chunks in the data are retrieved.
To use semantic search:
- Create an empty GameObject for the RAG. In the GameObject Inspector click Add Component and select the RAG script.
- Select your preferred search method: SimpleSearch is a simple brute-force search, while DBSearch is a fast ANN method that should be preferred in most cases.

Alternatively, you can create the RAG from code (where llm is your LLM):
RAG rag = gameObject.AddComponent<RAG>();
rag.Init(SearchMethods.DBSearch, ChunkingMethods.SentenceSplitter, llm);
In your script you can then use it as follows:

using UnityEngine;
using LLMUnity;

public class MyScript : MonoBehaviour
{
    RAG rag;

    async void Game(){
        ...
        string[] inputs = new string[]{
            "Hi! I'm a search system.",
            "the weather is nice. I like it.",
            "I'm a RAG system"
        };
        // add the inputs to the RAG
        foreach (string input in inputs) await rag.Add(input);
        // get the 2 most similar inputs and their distance (dissimilarity) to the search query
        (string[] results, float[] distances) = await rag.Search("hello!", 2);
        // to get the most similar text parts (chunks) you can enable the returnChunks option
        rag.ReturnChunks(true);
        (results, distances) = await rag.Search("hello!", 2);
        ...
    }
}
You can save the RAG state (stored in the Assets/StreamingAssets
folder):
rag.Save("rag.zip");
and load it from disk:
await rag.Load("rag.zip");
You can use the RAG to feed relevant data to the LLM based on a user message:
string message = "How is the weather?";
(string[] similarPhrases, float[] distances) = await rag.Search(message, 3);
string prompt = "Answer the user query based on the provided data.nn";
prompt += $"User query: {message}nn";
prompt += $"Data:n";
foreach (string similarPhrase in similarPhrases) prompt += $"n- {similarPhrase}";
_ = llmCharacter.Chat(prompt, HandleReply, ReplyCompleted);
The RAG
sample includes an example RAG implementation as well as an example RAG-LLM integration.
That's all ✨!
LLM for Unity uses a model manager that allows you to load or download LLMs and ship them directly in your game.
The model manager can be found as part of the LLM GameObject:
You can download models with the Download model
button.
LLM for Unity includes different state-of-the-art models built-in for different model sizes, quantised with the Q4_K_M method.
Alternative models can be downloaded from HuggingFace in the .gguf format.
You can download a model locally and load it with the Load model
button, or copy the URL in the Download model > Custom URL
field to directly download it.
If a HuggingFace model does not provide a gguf file, it can be converted to gguf with this online converter.
The chat template used for constructing the prompts is determined automatically from the model (if a relevant entry exists) or the model name.
If incorrectly identified, you can select another template from the chat template dropdown.
Models added in the model manager are copied to the game during the building process.
You can omit a model from being built in by deselecting the "Build" checkbox.
To remove the model (but not delete it from disk) you can click the bin button.
The path and URL (if downloaded) of each added model are displayed in the expanded view of the model manager, accessed with the >> button:
You can create lighter builds by selecting the Download on Build
option.
Using this option the models will be downloaded the first time the game starts instead of being copied into the build.
If you have loaded a model locally, you need to set its URL through the expanded view; otherwise it will be copied into the build.
❕ Before using any model make sure you check their license ❕
The Samples~ folder contains several examples of interaction:
To install a sample:
- Open the Package Manager: Window > Package Manager
- Select the LLM for Unity package. From the Samples tab, click Import next to the sample you want to install.

The samples can be run with the Scene.unity scene they contain inside their folder.
In the scene, select the LLM
GameObject and click the Download Model
button to download a default model or Load model
to load your own model (see LLM model management).
Save the scene, run and enjoy!
LLM Settings
- Show/Hide Advanced Options: toggle to show/hide the advanced options below
- Log Level: select how verbose the log messages are
- Use extras: select to install and allow the use of extra features (flash attention and IQ quants)
- Remote: select to provide remote access to the LLM
- Port: port to run the LLM server (if Remote is set)
- Num Threads: number of threads to use (default: -1 = all)
- Num GPU Layers: number of model layers to offload to the GPU. If set to 0 the GPU is not used. Use a large number, e.g. >30, to utilise the GPU as much as possible. Note that higher values of context size will use more VRAM. If the user's GPU is not supported, the LLM will fall back to the CPU.
- Debug: select to log the output of the model in the Unity Editor
- Parallel Prompts: number of prompts / slots that can run in parallel (default: -1 = number of LLMCharacter objects). Note that the context size is divided among the slots. E.g. setting Parallel Prompts to 1 and slot 0 for all LLMCharacter objects will use the full context, but the entire prompt will need to be computed (no caching) whenever an LLMCharacter object is used for chat.
- Dont Destroy On Load: select to not destroy the LLM GameObject when loading a new Scene
- API key: API key required to allow access to requests from LLMCharacter objects (if Remote is set)
- Load SSL certificate: allows you to load an SSL certificate for end-to-end encryption of requests (if Remote is set); requires the SSL key as well
- Load SSL key: allows you to load an SSL key for end-to-end encryption of requests (if Remote is set); requires the SSL certificate as well
- SSL certificate path: the SSL certificate used for end-to-end encryption of requests (if Remote is set)
- SSL key path: the SSL key used for end-to-end encryption of requests (if Remote is set)
- Download model: click to download one of the default models
- Load model: click to load your own model in .gguf format
- Download on Start: enable to download the LLM models the first time the game starts; alternatively the LLM models will be copied directly into the build
- Context Size: size of the prompt context (0 = context size of the model)
- Download lora: click to download a LoRA model in .gguf format
- Load lora: click to load a LoRA model in .gguf format
- Batch Size: batch size for prompt processing (default: 512)
- Model: the path of the model being used (relative to the Assets/StreamingAssets folder)
- Chat Template: the chat template being used for the LLM
- Lora: the path of the LoRAs being used (relative to the Assets/StreamingAssets folder)
- Lora Weights: the weights of the LoRAs being used
- Flash Attention: click to use flash attention in the model (if Use extras is enabled)
- Base Prompt: a common base prompt to use across all LLMCharacter objects using the LLM
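Most of these options are also plain fields on the LLM component, so they can be set from code as in the programmatic example above. A minimal sketch; numThreads and numGPULayers appear in that example, while parallelPrompts and contextSize are assumed camelCase counterparts of the inspector labels:

using UnityEngine;
using LLMUnity;

public class LLMSetup : MonoBehaviour {
    void Awake(){
        LLM llm = gameObject.AddComponent<LLM>();
        llm.numThreads = -1;     // all threads (field shown in the example above)
        llm.numGPULayers = 30;   // offload layers to the GPU (field shown in the example above)
        llm.parallelPrompts = 2; // assumed field behind the Parallel Prompts option
        llm.contextSize = 4096;  // assumed field behind the Context Size option
    }
}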
LLMCharacter Settings
- Show/Hide Advanced Options: toggle to show/hide the advanced options below
- Log Level: select how verbose the log messages are
- Use extras: select to install and allow the use of extra features (flash attention and IQ quants)
- Remote: whether the LLM used is remote or local
- LLM: the LLM GameObject (if Remote is not set)
- Host: IP address of the LLM server (if Remote is set)
- Port: port of the LLM server (if Remote is set)
- Num Retries: number of HTTP request retries from the LLM server (if Remote is set)
- API key: API key of the LLM server (if Remote is set)
- Save: save filename or relative path
- Save Cache: select to save the LLM state along with the chat history. The LLM state is typically around 100 MB+.
- Debug Prompt: select to log the constructed prompts in the Unity Editor
- Player Name: the name of the player
- AI Name: the name of the AI
- Prompt: description of the AI role
- Stream: select to receive the reply from the model as it is produced (recommended!). If it is not selected, the full reply from the model is received in one go.
- Num Predict: maximum number of tokens to predict (default: 256, -1 = infinity, -2 = until context filled)
- Load grammar: click to load a grammar in .gbnf format
- Grammar: the path of the grammar being used (relative to the Assets/StreamingAssets folder)
- Cache Prompt: save the ongoing prompt from the chat (default: true)
- Slot: slot of the server to use for computation. Value can be set from 0 to Parallel Prompts-1 (default: -1 = new slot for each character)
- Seed: seed for reproducibility. For random results every time use -1.
- Temperature: LLM temperature, lower values give more deterministic answers (default: 0.2)
- Top K: top-k sampling (default: 40, 0 = disabled)
- Top P: top-p sampling (default: 0.9, 1.0 = disabled)
- Min P: minimum probability for a token to be used (default: 0.05)
- Repeat Penalty: control the repetition of token sequences in the generated text (default: 1.1)
- Presence Penalty: repeated token presence penalty (default: 0.0, 0.0 = disabled)
- Frequency Penalty: repeated token frequency penalty (default: 0.0, 0.0 = disabled)
- Tfs_z: enable tail free sampling with parameter z (default: 1.0, 1.0 = disabled)
- Typical P: enable locally typical sampling with parameter p (default: 1.0, 1.0 = disabled)
- Repeat Last N: last N tokens to consider for penalizing repetition (default: 64, 0 = disabled, -1 = ctx-size)
- Penalize Nl: penalize newline tokens when applying the repeat penalty (default: true)
- Penalty Prompt: prompt for the purpose of the penalty evaluation. Can be either null, a string or an array of numbers representing tokens (default: null = use original prompt)
- Mirostat: enable Mirostat sampling, controlling perplexity during text generation (default: 0, 0 = disabled, 1 = Mirostat, 2 = Mirostat 2.0)
- Mirostat Tau: set the Mirostat target entropy, parameter tau (default: 5.0)
- Mirostat Eta: set the Mirostat learning rate, parameter eta (default: 0.1)
- N Probs: if greater than 0, the response also contains the probabilities of top N tokens for each generated token (default: 0)
- Ignore Eos: enable to ignore end-of-stream tokens and continue generating (default: false)
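Like the LLM options, these are exposed as fields on the LLMCharacter component. A minimal sketch assuming the inspector labels map to same-named camelCase fields; stream is confirmed by the example above, while the sampling fields are assumptions:

using UnityEngine;
using LLMUnity;

public class CharacterTuning : MonoBehaviour {
    public LLMCharacter llmCharacter;

    void Awake(){
        llmCharacter.stream = true;      // stream the reply as it is produced
        llmCharacter.numPredict = 256;   // assumed field behind Num Predict
        llmCharacter.temperature = 0.2f; // assumed field behind Temperature
        llmCharacter.topK = 40;          // assumed field behind Top K
        llmCharacter.topP = 0.9f;        // assumed field behind Top P
        llmCharacter.seed = -1;          // assumed field behind Seed (random each time)
    }
}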
The license of LLM for Unity is MIT (LICENSE.md) and uses third-party software with MIT and Apache licenses. Some models included in the asset define their own license terms; please review them before using each model. Third-party licenses can be found in the Third Party Notices.md.