🤗 Models on Hugging Face | Blog | Website | Get Started
We are unleashing the power of large language models. Our latest version of Llama is now available to individuals, creators, researchers and businesses of all sizes so they can responsibly experiment, innovate and scale their ideas.
This release includes model weights and starting code for pre-trained and instruction-tuned Llama 3 language models, in 8B and 70B parameter sizes.
This repository is intended as a minimal example of loading a Llama 3 model and running inference. See llama-recipes for more detailed examples.
In order to download the model weights and tokenizer, please visit the Meta Llama website and accept our license agreement.
After submitting your request, you will receive a signed URL via email. Then run the download.sh script, passing the provided URL when prompted to start the download.
Prerequisite: make sure you have `wget` and `md5sum` installed. Then run the script: `./download.sh`.
Keep in mind that the link expires after 24 hours and a certain number of downloads. If you start seeing errors such as `403: Forbidden`, you can always re-request a link.
We also offer downloads on Hugging Face, in both transformers and native `llama3` formats. To download the weights from Hugging Face, follow these steps:
To download the original native weights to use with this repo, download the contents of the `original` folder. You can also download them from the command line if you `pip install huggingface-hub`:

```
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
```
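If you prefer to script the download from Python, the same filter can be expressed with the `huggingface_hub` library's `snapshot_download` helper; this is a minimal sketch of an alternative, not a command from this repository:

```python
# Sketch: programmatic equivalent of the huggingface-cli command above,
# using the huggingface_hub library (pip install huggingface-hub).
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns=["original/*"],  # only the native llama3 weights
    local_dir="meta-llama/Meta-Llama-3-8B-Instruct",
)
```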
To use with transformers, the following pipeline snippet will download and cache the weights:

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)
```
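Once the pipeline is constructed, calling it on a prompt runs generation. A minimal usage sketch follows; the prompt string and generation arguments are purely illustrative:

```python
# Sketch: run the pipeline built above on a simple prompt.
outputs = pipeline(
    "Hey, how are you doing today?",
    max_new_tokens=64,  # illustrative cap on generated tokens
    do_sample=True,
)
print(outputs[0]["generated_text"])
```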
You can get started with Llama 3 quickly by following the steps below, which let you run fast inference locally. For more examples, see the llama-recipes repository.
1. Clone and download this repository in a conda environment with PyTorch/CUDA installed.
2. In the top-level directory, run `pip install -e .`
3. Visit the Meta Llama website and register to download the models.
4. After registering, you will receive an email with the URL to download the model. You will need this URL when running the download.sh script.
5. Once you receive the email, navigate to the llama repository you downloaded and run the download.sh script.
6. After downloading the required model, you can run it locally using the command below:
```
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```
Note
- Replace `Meta-Llama-3-8B-Instruct/` with the path to your checkpoint directory and `Meta-Llama-3-8B-Instruct/tokenizer.model` with the path to your tokenizer model.
- `--nproc_per_node` should be set to the MP value of the model you are using.
- Adjust the `max_seq_len` and `max_batch_size` parameters as needed.

Different models require different model parallelism (MP) values:
| Model | MP |
|-------|----|
| 8B    | 1  |
| 70B   | 8  |
All models support sequence lengths up to 8192 tokens, but we pre-allocate the cache based on the values of `max_seq_len` and `max_batch_size`. Therefore, set these values according to your hardware.
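To get a feel for why these two values matter, here is a rough back-of-the-envelope sketch of the pre-allocated KV-cache size. The 8B architecture numbers (32 layers, 8 KV heads, head dimension 128) and the bf16 assumption are mine, not stated in this README:

```python
# Rough sketch of KV-cache memory pre-allocated for the 8B model (assumed
# config: 32 layers, 8 KV heads, head_dim 128, bf16 = 2 bytes per element).
n_layers, n_kv_heads, head_dim, bytes_per_elem = 32, 8, 128, 2
max_batch_size, max_seq_len = 6, 512

# Factor of 2 for the key cache plus the value cache.
cache_bytes = (2 * n_layers * max_batch_size * max_seq_len
               * n_kv_heads * head_dim * bytes_per_elem)
print(f"{cache_bytes / 2**20:.0f} MiB")  # ~384 MiB with these settings
```

Raising `max_seq_len` to the full 8192 or increasing `max_batch_size` scales this allocation linearly, which is why the README tells you to size both to your hardware.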
These models are not fine-tuned for chat or Q&A. Prompts should be set up so that the expected answer is a natural continuation of the prompt.
See `example_text_completion.py` for some examples. For illustration, see the command below to run it using the llama-3-8b model (`nproc_per_node` needs to be set to the MP value):
```
torchrun --nproc_per_node 1 example_text_completion.py \
    --ckpt_dir Meta-Llama-3-8B/ \
    --tokenizer_path Meta-Llama-3-8B/tokenizer.model \
    --max_seq_len 128 --max_batch_size 4
```
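As a purely illustrative sketch of what "a natural continuation of the prompt" means for the pretrained (non-instruct) models, the prompts below are written so that the answer simply continues the text; the exact strings are hypothetical, not taken from `example_text_completion.py`:

```python
# Illustrative continuation-style prompts for the pretrained (non-chat) model.
prompts = [
    # The model should complete the sentence.
    "Simply put, the theory of relativity states that ",
    # Few-shot pattern: the model is expected to continue the translation list.
    """Translate English to French:

sea otter => loutre de mer
peppermint => menthe poivrée
cheese =>""",
]
```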
Fine-tuned models are trained for conversational applications. To obtain their expected characteristics and performance, prompts need to follow the specific format defined in `ChatFormat`: a prompt starts with the special token `<|begin_of_text|>`, followed by one or more messages. Each message starts with the tag `<|start_header_id|>`, contains the role (`system`, `user`, or `assistant`), and ends with the tag `<|end_header_id|>`. After a double newline (`\n\n`), the content of the message follows. The end of each message is marked by the `<|eot_id|>` token.
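Putting those pieces together, here is a minimal sketch of how such a prompt could be assembled by hand; the reference implementation is the `ChatFormat` class in this repository, and the helper below is only an illustration:

```python
# Minimal, illustrative assembly of a chat prompt from the special tokens
# described above (not the repository's ChatFormat implementation).
def format_chat(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:  # each msg is {"role": ..., "content": ...}
        prompt += f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
        prompt += msg["content"] + "<|eot_id|>"
    # Leave an open assistant header so the model generates the reply next.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_chat([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]))
```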
You can also deploy additional classifiers to filter out inputs and outputs deemed unsafe. See an example in the llama-recipes repository on how to add safety checkers to the input and output of your inference code.
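The general pattern is to check the user input before generation and the model output after it. Below is a generic, hedged sketch of that wrapping; `generate` and `is_safe` are placeholders standing in for your inference call and a real classifier (such as one from llama-recipes), not APIs from that repository:

```python
# Generic sketch of wrapping inference with input/output safety checks.
# `generate` and `is_safe` are hypothetical callables, not llama-recipes APIs.
def guarded_chat(generate, is_safe, user_message):
    if not is_safe(user_message):
        return "Sorry, I can't help with that."
    reply = generate(user_message)
    return reply if is_safe(reply) else "[response withheld by safety filter]"
```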
Example using llama-3-8b-chat:
```
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```
Llama 3 is a new technology that carries potential risks with use. Testing conducted to date has not, and could not, cover all scenarios. To help developers address these risks, we have created the Responsible Use Guide.
Please report any software "bug", or other problems with the models, through one of the following means:
- Reporting issues with the model: https://github.com/meta-llama/llama3/issues
- Reporting risky content generated by the model: developers.facebook.com/llama_output_feedback
- Reporting bugs and security concerns: facebook.com/whitehat/info
See MODEL_CARD.md.
Our models and weights are licensed to researchers and commercial entities, adhering to open principles. Our mission is to empower individuals and industries through this opportunity while promoting an environment of discovery and ethical AI advancement.
Please review the LICENSE document, as well as our accompanying Acceptable Use Policy.
For common questions, see the FAQ at https://llama.meta.com/faq; it will be updated as new questions arise.