llama_ros

Version 4.1.2

This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. With the llama_ros packages, you can bring the optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs. You can also use llama.cpp features such as GBNF grammars and modify LoRAs in real time.

Table of Contents

Related Projects
Installation
Docker
Usage
- llama_cli
- Launch Files
- LoRA Adapters
- ROS 2 Clients
- LangChain
Demos

Related Projects

chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.
explainable_ros → A ROS 2 tool to explain the behavior of a robot. Using the LangChain integration, logs are stored in a vector database. RAG is then applied to retrieve the logs relevant to user questions, which are answered with llama_ros.

Installation

To run llama_ros with CUDA, first install the CUDA Toolkit. Then compile llama_ros with --cmake-args -DGGML_CUDA=ON to enable CUDA support.

cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/llama_ros.git
pip3 install -r llama_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # add the --cmake-args option only for CUDA

Docker

Build the llama_ros Docker image or download an image from DockerHub. You can choose to build llama_ros with CUDA ( USE_CUDA ) and select the CUDA version ( CUDA_VERSION ). Remember that you have to use DOCKER_BUILDKIT=0 to compile llama_ros with CUDA when building the image.

DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .

Run the Docker container. If you want to use CUDA, you have to install the NVIDIA Container Toolkit and add --gpus all .

docker run -it --rm --gpus all llama_ros

Usage

llama_cli

llama_ros includes commands to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. The following commands are integrated into the ROS 2 command line:

launch

Use this command to launch an LLM from a YAML file. The YAML configuration is used to launch the LLM in the same way as a regular launch file. Here is an example:

ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml

prompt

Use this command to send a prompt to a launched LLM. The command takes a string, which is the prompt, and accepts the following arguments:

( -r , --reset ): Whether to reset the LLM before prompting
( -t , --temp ): The temperature value
( --image-url ): URL of the image to send to a VLM

Here is an example:

ros2 llama prompt "Do you know ROS 2?" -t 0.0
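
When the prompt command has to be issued from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch that uses only the flags listed above; the helper name build_prompt_command is hypothetical and is not part of llama_ros.

```python
import shlex
from typing import List, Optional


def build_prompt_command(
    text: str,
    reset: bool = False,
    temp: Optional[float] = None,
    image_url: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for `ros2 llama prompt` (hypothetical helper)."""
    cmd = ["ros2", "llama", "prompt", text]
    if reset:
        # reset the LLM before prompting
        cmd.append("--reset")
    if temp is not None:
        # sampling temperature
        cmd += ["--temp", str(temp)]
    if image_url is not None:
        # image to send to a VLM
        cmd += ["--image-url", image_url]
    return cmd


cmd = build_prompt_command("Do you know ROS 2?", temp=0.0)
# shlex.join quotes the prompt so the line can be pasted into a shell
print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting problems with prompts that contain spaces or quotes.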

Launch Files

First of all, you need to create a launch file to use llama_ros or llava_ros. This launch file contains the main parameters to download the model from HuggingFace and configure it. Take a look at the following examples and the predefined launch files.

llama_ros (Python Launch)

from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():

    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,  # context of the LLM in tokens
            n_batch=8,  # batch size in tokens
            n_gpu_layers=0,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=2048,  # max tokens, -1 == inf

            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",  # Hugging Face repo
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",  # model file in repo

            system_prompt_type="Alpaca"  # system prompt type
        )
    ])

ros2 launch llama_bringup marcoroni.launch.py

llama_ros (YAML Config)

n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "cstr/Spaetzle-v60-7b-GGUF" # Hugging Face repo
model_filename: "Spaetzle-v60-7b-q4-k-m.gguf" # model file in repo

system_prompt_type: "Alpaca" # system prompt type

import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "Spaetzle.yaml"))
    ])

ros2 launch llama_bringup spaetzle.launch.py
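
Before launching, it can be useful to sanity-check a configuration that has been loaded into a dictionary. The sketch below uses the parameter names from the YAML above. The function name check_llama_config and the specific checks are illustrative assumptions, not rules enforced by llama_ros.

```python
from typing import Any, Dict, List


def check_llama_config(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in a config dict."""
    problems: List[str] = []

    # these parameters are expected to be positive integers
    for key in ("n_ctx", "n_batch", "n_threads"):
        value = cfg.get(key)
        if not isinstance(value, int) or value <= 0:
            problems.append(f"{key} should be a positive integer, got {value!r}")

    # n_predict: -1 means no limit, otherwise a positive token count
    n_predict = cfg.get("n_predict")
    if not isinstance(n_predict, int) or (n_predict != -1 and n_predict <= 0):
        problems.append(f"n_predict should be -1 or positive, got {n_predict!r}")

    # the model file is expected to be a GGUF file
    filename = cfg.get("model_filename", "")
    if not isinstance(filename, str) or not filename.endswith(".gguf"):
        problems.append(f"model_filename should end with .gguf, got {filename!r}")

    # stray spaces inside quoted strings change repo and file names
    for key, value in cfg.items():
        if isinstance(value, str) and value != value.strip():
            problems.append(f"{key} has leading or trailing whitespace")

    return problems


config = {
    "n_ctx": 2048,
    "n_batch": 8,
    "n_gpu_layers": 0,
    "n_threads": 1,
    "n_predict": 2048,
    "model_repo": "cstr/Spaetzle-v60-7b-GGUF",
    "model_filename": "Spaetzle-v60-7b-q4-k-m.gguf",
    "system_prompt_type": "Alpaca",
}
print(check_llama_config(config))
```

An empty list means none of the illustrative checks failed; it does not guarantee that the model can be downloaded or loaded.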

llama_ros (YAML Config + model shards)

n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF" # Hugging Face repo
model_filename: "qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf" # model shard file in repo

system_prompt_type: "ChatML" # system prompt type

ros2 llama launch Qwen2.yaml
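
The configuration above names only the first shard of a split model. As a rough illustration of how the remaining shard names relate to it, the sketch below expands a first-shard file name into the full list, assuming the "-NNNNN-of-NNNNN.gguf" suffix seen in the file name above. The helper list_shards is hypothetical; how llama_ros itself resolves the other shards is not described here.

```python
import re
from typing import List

# matches names such as "model-00001-of-00002.gguf"
SHARD_PATTERN = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def list_shards(first_shard: str) -> List[str]:
    """Expand a shard file name into the names of all shards of the model."""
    match = SHARD_PATTERN.match(first_shard)
    if match is None:
        # not a sharded name: the model is a single file
        return [first_shard]
    stem = match.group("stem")
    total = int(match.group("total"))
    return [f"{stem}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


for name in list_shards("qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf"):
    print(name)
```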

llava_ros (Python Launch)

from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():

    return LaunchDescription([
        create_llama_launch(
            use_llava=True,  # enable llava

            n_ctx=8192,  # context of the LLM in tokens, use a huge context size to load images
            n_batch=512,  # batch size in tokens
            n_gpu_layers=33,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=8192,  # max tokens, -1 == inf

            model_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            model_filename="llava-v1.6-mistral-7b.Q4_K_M.gguf",  # model file in repo

            mmproj_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            mmproj_filename="mmproj-model-f16.gguf",  # mmproj file in repo

            system_prompt_type="Mistral"  # system prompt type
        )
    ])

ros2 launch llama_bringup llava.launch.py

llava_ros (YAML Config)

use_llava: True # enable llava

n_ctx: 8192 # context of the LLM in tokens, use a huge context size to load images
n_batch: 512 # batch size in tokens
n_gpu_layers: 33 # layers to load in GPU
n_threads: 1 # threads
n_predict: 8192 # max tokens, -1 == inf

model_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
model_filename: "llava-v1.6-mistral-7b.Q4_K_M.gguf" # model file in repo

mmproj_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
mmproj_filename: "mmproj-model-f16.gguf" # mmproj file in repo

system_prompt_type: "mistral" # system prompt type

import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"),
            "models", "llava-1.6-mistral-7b-gguf.yaml"))
    ])

ros2 launch llama_bringup llava.launch.py

LoRA Adapters

You can use LoRA adapters when launching LLMs. Using llama.cpp features, you can load multiple adapters and choose the scale to apply to each one. Here is an example of using LoRA adapters with Phi-3. You can list the LoRAs with the /llama/list_loras service and modify their scale values with the /llama/update_loras service. A scale value of 0.0 means that the LoRA is not used.

n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048

model_repo: "bartowski/Phi-3.5-mini-instruct-GGUF"
model_filename: "Phi-3.5-mini-instruct-Q4_K_M.gguf"

lora_adapters:
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_code_writing"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf"
    scale: 0.5
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_summarization"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf"
    scale: 0.5

system_prompt_type: "Phi-3"
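
To see why a scale of 0.0 disables an adapter, the following conceptual sketch assumes the usual LoRA formulation, in which each adapter contributes an additive weight update multiplied by its scale. Flat lists stand in for weight matrices, and apply_loras is a hypothetical helper; this is not how llama.cpp implements adapters internally.

```python
from typing import List, Sequence, Tuple


def apply_loras(
    base: Sequence[float],
    adapters: Sequence[Tuple[Sequence[float], float]],
) -> List[float]:
    """Add each adapter's update to the base weights, weighted by its scale."""
    weights = list(base)
    for delta, scale in adapters:
        # a scale of 0.0 leaves the weights untouched
        weights = [w + scale * d for w, d in zip(weights, delta)]
    return weights


base_weights = [1.0, 2.0]
code_writer_delta = [2.0, 4.0]       # toy update, stands in for one adapter
summarization_delta = [10.0, 10.0]   # toy update, stands in for another

# first adapter at scale 0.5, second one disabled with scale 0.0
print(apply_loras(base_weights, [(code_writer_delta, 0.5), (summarization_delta, 0.0)]))
```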

ROS 2 Clients

Both llama_ros and llava_ros provide ROS 2 interfaces to access the main functionalities of the models. Here are some examples of how to use them inside ROS 2 nodes. Also take a look at the llama_demo_node.py and llava_demo_node.py demos.

Tokenize

from rclpy.node import Node
from llama_msgs.srv import Tokenize


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Tokenize, "/llama/tokenize")

        # create the request
        req = Tokenize.Request()
        req.text = "Example text"

        # call the tokenize service
        # (call() blocks until the response arrives, so the node has to be
        # spun elsewhere, e.g. by an executor running in another thread)
        self.srv_client.wait_for_service()
        tokens = self.srv_client.call(req).tokens

Detokenize

from rclpy.node import Node
from llama_msgs.srv import Detokenize


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Detokenize, "/llama/detokenize")

        # create the request
        req = Detokenize.Request()
        req.tokens = [123, 123]

        # call the detokenize service
        self.srv_client.wait_for_service()
        text = self.srv_client.call(req).text

Embeddings

Remember to launch llama_ros with the embedding parameter set to true to be able to generate embeddings with your LLM.

from rclpy.node import Node
from llama_msgs.srv import Embeddings


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Embeddings, "/llama/generate_embeddings")

        # create the request
        req = Embeddings.Request()
        req.prompt = "Example text"
        req.normalize = True

        # call the embedding service
        self.srv_client.wait_for_service()
        embeddings = self.srv_client.call(req).embeddings
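
The request above sets normalize to True. As an illustration of what normalization buys you, the sketch below assumes it means scaling the vector to unit Euclidean (L2) length, which is an assumption about the service rather than something stated here. Once vectors have unit length, a plain dot product equals their cosine similarity.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product; equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# toy vectors standing in for two embeddings
first = l2_normalize([3.0, 4.0])
second = l2_normalize([4.0, 3.0])
print(first, dot(first, second))
```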

Generate Response

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = "Do you know ROS 2?"  # placeholder prompt, replace with your own
        goal.sampling_config.temp = 0.2

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(
            goal)

        # wait until the goal is accepted
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result
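
The goal above sets a sampling temperature of 0.2. To illustrate what that value does, the sketch below applies the standard temperature-scaled softmax to a few made-up logits: lower temperatures concentrate probability on the most likely token. Treating a temperature of 0.0 as greedy selection is an assumption of this sketch, and token_probabilities is a hypothetical helper, not the llama.cpp sampler.

```python
import math
from typing import List, Sequence


def token_probabilities(logits: Sequence[float], temp: float) -> List[float]:
    """Turn logits into probabilities using temperature-scaled softmax."""
    if temp <= 0.0:
        # assumed behaviour: all probability mass on the best token
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temp for value in logits]
    # subtract the maximum for numerical stability
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [1.0, 2.0, 0.5]  # made-up scores for three candidate tokens
for temp in (1.0, 0.2, 0.0):
    print(temp, [round(p, 3) for p in token_probabilities(logits, temp)])
```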

Generate Response (llava)

import cv2
from cv_bridge import CvBridge

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create a cv bridge for the image
        self.cv_bridge = CvBridge()

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = "Describe the image"  # placeholder prompt, replace with your own
        goal.sampling_config.temp = 0.2

        # add your image to the goal
        image = cv2.imread("/path/to/your/image", cv2.IMREAD_COLOR)
        goal.image = self.cv_bridge.cv2_to_imgmsg(image)

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(
            goal)

        # wait until the goal is accepted
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result

LangChain

There is a llama_ros integration for LangChain, so prompt engineering techniques can be applied. Here is an example of how to use it.

llama_ros (Chain)

import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser


rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
text = chain.invoke({"topic": "bears"})
print(text)

rclpy.shutdown()
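
To make the data flow of the chain concrete, here is a plain-Python sketch of the same three stages (render the prompt, call the model, return the text) with no LangChain or ROS dependency. The functions render_prompt, fake_llm and run_chain are hypothetical stand-ins, and the sketch assumes the template uses simple {name} placeholders that str.format can fill.

```python
from typing import Callable


def render_prompt(template: str, **variables: str) -> str:
    """Fill the {placeholders} of a template with the given variables."""
    return template.format(**variables)


def fake_llm(prompt: str) -> str:
    """Stand-in for the model: echoes the prompt it received."""
    return f"[model output for: {prompt}]"


def run_chain(template: str, llm: Callable[[str], str], **variables: str) -> str:
    """Render the prompt, pass it to the model and return the text."""
    prompt = render_prompt(template, **variables)
    return str(llm(prompt))


print(run_chain("tell me a joke about {topic}", fake_llm, topic="bears"))
```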

llama_ros (Stream)

import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser


rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain, printing each chunk as soon as it arrives
for c in chain.stream({"topic": "bears"}):
    print(c, flush=True, end="")

rclpy.shutdown()

llava_ros

import rclpy
from llama_ros.langchain import LlamaROS

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# bind the image URL (it must be defined before it is bound)
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
llm = llm.bind(image_url=image_url).stream("Describe the image")

# run the llm
for c in llm:
    print(c, flush=True, end="")

rclpy.shutdown()

llama_ros_embeddings (RAG)

import rclpy
from langchain_chroma import Chroma
from llama_ros.langchain import LlamaROSEmbeddings


rclpy.init()

# create the llama_ros embeddings for langchain
embeddings = LlamaROSEmbeddings()

# create a vector database and assign it
db = Chroma(embedding_function=embeddings)

# create the retriever
retriever = db.as_retriever(search_kwargs={"k": 5})

# add your texts
db.add_texts(texts=["your_texts"])

# retrieve documents
documents = retriever.invoke("your_query")
print(documents)

rclpy.shutdown()
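
The retriever above returns the k stored texts closest to the query in embedding space. The sketch below shows that idea with toy two-dimensional vectors and cosine similarity. The vectors, the top_k helper and the choice of cosine similarity are assumptions for illustration; a real vector store may use a different distance metric and index structure.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    texts: Sequence[str],
    k: int,
) -> List[str]:
    """Return the k texts whose vectors are most similar to the query."""
    ranked = sorted(
        zip(texts, doc_vecs),
        key=lambda pair: cosine(query_vec, pair[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


texts = ["a", "b", "c"]
doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy embeddings
print(top_k([1.0, 0.1], doc_vecs, texts, k=2))
```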

llama_ros (Reranker)

import rclpy
from llama_ros.langchain import LlamaROSReranker
from llama_ros.langchain import LlamaROSEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever


rclpy.init()

# load the documents
documents = TextLoader("../state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# create the llama_ros embeddings
embeddings = LlamaROSEmbeddings()

# create the vector database and the retriever
retriever = FAISS.from_documents(
    texts, embeddings).as_retriever(search_kwargs={"k": 20})

# create the compressor using the llama_ros reranker
compressor = LlamaROSReranker()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# retrieve the documents
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)

for doc in compressed_docs:
    print("-" * 50)
    print(doc.page_content)
    print("\n")

rclpy.shutdown()
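
The example above splits the document with chunk_size=500 and chunk_overlap=100. To show how these two parameters interact, here is a simplified fixed-window splitter. It is only a sketch: RecursiveCharacterTextSplitter prefers to break on separators such as paragraphs and sentences, whereas the hypothetical split_with_overlap below cuts at fixed character offsets.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Cut text into chunks of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    # each new chunk starts this many characters after the previous one
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            # the last chunk already reaches the end of the text
            break
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2))
```

A larger overlap keeps more shared context between neighbouring chunks at the cost of producing more chunks to embed and rerank.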