The smallest possible LLM API. Build a question and answer interface to your own content in a few minutes. Uses OpenAI embeddings, gpt-3.5 and Faiss, via Langchain.
Create a `source.json` file containing your content. It should look like this:

```json
[
    {
        "source": "Reference to the source of your content. Typically a title.",
        "url": "URL for your source. This key is optional.",
        "content": "Your content as a single string. If there's a title or summary, put these first, separated by new lines."
    },
    ...
]
```
See `example.source.json` for an example.
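If your content already lives as files on disk, a short script can assemble `source.json` for you. This is only a sketch, not part of microllama: the `content/` directory, the `.txt` extension, and the URLs are assumptions to adapt to your own setup.

```python
# build_source.py -- illustrative helper for assembling source.json.
# Assumes a content/ directory of plain-text files, one document per file.
import json
from pathlib import Path

entries = []
for path in sorted(Path("content").glob("*.txt")):
    entries.append({
        "source": path.stem,                        # filename doubles as the title
        "url": f"https://example.com/{path.stem}",  # optional key; remove if unused
        "content": path.read_text(encoding="utf-8"),
    })

Path("source.json").write_text(json.dumps(entries, indent=2), encoding="utf-8")
```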
Then install microllama:

```
pip install microllama
```
Get an OpenAI API key and add it to the environment, e.g. `export OPENAI_API_KEY=sk-etc`. Note that indexing and querying require OpenAI credits, which aren't free.
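If you start the server from a script or notebook rather than a shell, you can also set the key in Python before launching. A minimal sketch, reusing the placeholder key from above:

```python
# Set the key for this process; the launched server inherits the environment.
import os
import subprocess

os.environ["OPENAI_API_KEY"] = "sk-etc"  # placeholder -- use your real key
subprocess.run(["microllama"], check=True)
```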
Run your server with `microllama`. If a vector search index doesn't exist, it'll be created from your `source.json` and stored. Query your documents at `/api/ask?your question`.
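For example, with the server running on the default host and port, you can query it from Python. This is a sketch: the question is arbitrary and the response body is printed raw rather than parsed.

```python
# Ask the running microllama server a question via /api/ask?<your question>.
import urllib.parse
import urllib.request

question = "How do I install microllama?"
url = "http://localhost:8080/api/ask?" + urllib.parse.quote(question)

with urllib.request.urlopen(url) as response:
    print(response.read().decode("utf-8"))
```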
Microllama includes an optional web front-end, generated with `microllama make-front-end`. This command creates a single `index.html` file which you can edit. It's served at `/`.
Microllama is configured through environment variables, with the following defaults:

- `OPENAI_API_KEY`: required
- `FAISS_INDEX_PATH`: "faiss_index"
- `SOURCE_JSON`: "source.json"
- `MAX_RELATED_DOCUMENTS`: "5"
- `EXTRA_CONTEXT`: "Answer in no more than three sentences. If the answer is not included in the context, say 'Sorry, this is no answer for this in my sources.'."
- `UVICORN_HOST`: "0.0.0.0"
- `UVICORN_PORT`: "8080"

To deploy your API, first create a Dockerfile with `microllama make-dockerfile`.
To deploy on Fly.io, sign up for an account and install `flyctl`. Then:
```
fly launch # answer no to Postgres, Redis and deploying now
fly secrets set OPENAI_API_KEY=sk-etc
fly deploy
```
Or, to deploy on Google Cloud Run:

```
gcloud run deploy --source . --set-env-vars="OPENAI_API_KEY=sk-etc"
```
For Cloud Run and other serverless platforms you should generate the FAISS index at container build time, to reduce startup time. See the two commented lines in `Dockerfile`.
You can also generate these commands with `microllama deploy`.
Content is split into chunks for indexing with:

```python
SpacyTextSplitter(chunk_size=700, chunk_overlap=200, separator=" ")
```
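For reference, here is a minimal sketch of what that splitter does to a long document, assuming the `langchain` package and spaCy's `en_core_web_sm` model are installed; the file name is illustrative.

```python
# Illustrative only: chunk a long text the same way the splitter above would.
# Requires: pip install langchain spacy && python -m spacy download en_core_web_sm
from langchain.text_splitter import SpacyTextSplitter

splitter = SpacyTextSplitter(chunk_size=700, chunk_overlap=200, separator=" ")

text = open("example.txt").read()   # any long document
chunks = splitter.split_text(text)  # sentence-aware chunks of roughly 700 characters

print(f"{len(chunks)} chunks; first chunk is {len(chunks[0])} characters")
```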