Large Language Models (LLMs) are a cutting-edge technology that I have been experimenting with. While managed services like OpenAI offer cost-effective access to LLMs, there are scenarios where running an LLM yourself becomes necessary, such as handling sensitive data or needing high-quality output in languages other than English. Open-source LLMs can approach the quality of offerings from major players like OpenAI, but they often demand significant compute resources. Deploying smaller models on platforms like AWS Lambda can be a cost-effective alternative.
My goal with this project is to deploy a smaller open-source LLM, specifically Microsoft's Phi-2, a 2.7-billion-parameter model whose output rivals that of larger open-source models. Along the way I'll explore LLMs and Docker-based Lambda functions, evaluate performance, and assess costs for real-world applications.
Make sure you have an AWS account and the necessary tools installed: the AWS CLI, Docker, and Python.
1. Create the `lambda_function.py` file (a minimal handler sketch appears after this list).
2. Create `requirements.txt`, starting with the AWS library (`boto3`).
3. Create a `Dockerfile` specifying the Docker image composition (see the sketch below).
4. Create `docker-compose.yml` for building and running the container (see the compose example below).
5. Start the container locally with `docker-compose up`.
6. Add `llama-cpp-python` to `requirements.txt`.
7. Rebuild the container and test with a real prompt using `curl` (see the local invocation example below).
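To make the steps concrete, here is a minimal sketch of what the handler in `lambda_function.py` might look like once `llama-cpp-python` is in place. The model path, request shape, and generation parameters are assumptions for illustration, not the project's exact code:

```python
import json
from llama_cpp import Llama

# Load the quantized Phi-2 model once per container, so warm invocations reuse it.
# The model path is an assumption; point it at wherever the Dockerfile puts the weights.
llm = Llama(model_path="/opt/phi-2.Q4_K_M.gguf", n_ctx=2048)

def handler(event, context):
    # Function URLs deliver the request body as a JSON string (assumed shape: {"prompt": "..."}).
    body = json.loads(event.get("body") or "{}")
    prompt = body.get("prompt", "Hello!")

    # Generate a single completion; keep max_tokens modest to stay within the Lambda timeout.
    output = llm(prompt, max_tokens=256)

    return {
        "statusCode": 200,
        "body": json.dumps({"completion": output["choices"][0]["text"]}),
    }
```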
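The `Dockerfile` might look roughly like this, building on the AWS Lambda Python base image. The Python version, build dependencies, and model download URL are assumptions and may need adjusting; compiling `llama-cpp-python` from source in particular can require tweaking the toolchain:

```dockerfile
FROM public.ecr.aws/lambda/python:3.11

# Toolchain for compiling llama-cpp-python; pip's cmake wheel is newer than the
# distro package (package names are assumptions for this base image).
RUN yum install -y gcc gcc-c++ make && pip install cmake

COPY requirements.txt .
RUN pip install -r requirements.txt

# Bake the quantized weights into the image (example GGUF file; URL is an assumption).
RUN curl -L -o /opt/phi-2.Q4_K_M.gguf \
    https://huggingface.co/TheBloke/phi-2-GGUF/resolve/main/phi-2.Q4_K_M.gguf

COPY lambda_function.py ${LAMBDA_TASK_ROOT}

CMD ["lambda_function.handler"]
```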
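For local testing, a minimal `docker-compose.yml` only needs to build the image and expose the Lambda Runtime Interface Emulator that ships with the base image; the service name and host port here are arbitrary choices:

```yaml
services:
  llm-lambda:
    build: .
    ports:
      - "9000:8080"  # the runtime emulator listens on 8080 inside the container
```

After `docker-compose up`, the container can be invoked through the emulator's standard endpoint; the nested JSON mirrors how a function URL wraps the request body for the handler sketched above:

```bash
curl -s "http://localhost:9000/2015-03-31/functions/function/invocations" \
  -d '{"body": "{\"prompt\": \"Write a haiku about serverless computing.\"}"}'
```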
Execute the deployment using the provided script (`deploy.sh`). The script creates the ECR repository and IAM role if they don't already exist, authenticates Docker with ECR, builds the Docker image and pushes it to ECR, retrieves the IAM role's ARN, checks whether the Lambda function exists, and then configures and deploys it.
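As a rough sketch of what `deploy.sh` automates, the equivalent AWS CLI calls might look like the following. The repository name, function name, role name, region, and resource settings are all assumptions for illustration:

```bash
#!/usr/bin/env bash
set -euo pipefail

REGION=us-east-1
REPO=phi-2-lambda
ACCOUNT=$(aws sts get-caller-identity --query Account --output text)
IMAGE="$ACCOUNT.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"

# Create the ECR repository if it doesn't already exist.
aws ecr describe-repositories --repository-names "$REPO" --region "$REGION" >/dev/null 2>&1 \
  || aws ecr create-repository --repository-name "$REPO" --region "$REGION"

# Authenticate Docker with ECR, then build and push the image.
aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$ACCOUNT.dkr.ecr.$REGION.amazonaws.com"
docker build -t "$IMAGE" .
docker push "$IMAGE"

# Look up the execution role's ARN (the role name is an assumption).
ROLE_ARN=$(aws iam get-role --role-name phi-2-lambda-role --query Role.Arn --output text)

# Create the function if it's missing, otherwise just point it at the new image.
aws lambda create-function --function-name phi-2 --package-type Image \
  --code ImageUri="$IMAGE" --role "$ROLE_ARN" \
  --memory-size 10240 --timeout 300 --region "$REGION" \
  || aws lambda update-function-code --function-name phi-2 \
       --image-uri "$IMAGE" --region "$REGION"

# Expose the function via a public function URL (the auth setting is an assumption).
aws lambda create-function-url-config --function-name phi-2 \
  --auth-type NONE --region "$REGION" || true
```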
Use the Lambda function URL obtained during deployment to test with a prompt.
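For example, with the placeholder swapped for the real function URL (the region and request shape are assumptions matching the handler sketch above):

```bash
curl -s "https://<function-url-id>.lambda-url.us-east-1.on.aws/" \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Explain AWS Lambda in one sentence."}'
```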
A working knowledge of programming, Docker, AWS, and Python is assumed.
Feel free to explore, modify, and run the provided scripts to deploy and test an open-source LLM on AWS Lambda.