This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. Using the llama_ros packages, you can run GGUF-based LLMs and VLMs and easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects. You can also use llama.cpp features such as GBNF grammars and modify LoRAs in real time.
To run llama_ros with CUDA, first install the CUDA Toolkit. Then you can compile llama_ros with --cmake-args -DGGML_CUDA=ON to enable CUDA support.
cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/llama_ros.git
pip3 install -r llama_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA
Build the llama_ros docker image or download an image from DockerHub. You can build llama_ros with CUDA (USE_CUDA) and choose the CUDA version (CUDA_VERSION). Remember that you have to use DOCKER_BUILDKIT=0 to compile llama_ros with CUDA when building the image.
DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .
Run the docker container. To use CUDA, you have to install the NVIDIA Container Toolkit and add --gpus all.
docker run -it --rm --gpus all llama_ros
llama_ros includes commands to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. This way, the following commands are integrated into the ROS 2 CLI.
Using this command, an LLM is launched from a YAML file. The configuration in the YAML is used to launch the LLM in the same way as a regular launch file. Here is an example of how to use it:
ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml
Using this command, a prompt is sent to the launched LLM. The command takes a string, which is the prompt, and accepts the following arguments:
- (-r, --reset): whether to reset the LLM before prompting
- (-t, --temp): the temperature value
- (--image-url): an image URL to send to a VLM
Here is an example of how to use it:
ros2 llama prompt "Do you know ROS 2?" -t 0.0
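When a VLM has been launched (see the llava examples further below), the image-url argument listed above can be added to the same command. The invocation below is only a sketch; the URL is a placeholder, not a tested example:
ros2 llama prompt "Describe the image" --image-url https://example.com/image.jpg -t 0.0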
First of all, you need to create a launch file to use llama_ros or llava_ros. This launch file contains the main parameters to download the model from HuggingFace and configure it. Take a look at the following examples and the predefined launch files.
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,  # context of the LLM in tokens
            n_batch=8,  # batch size in tokens
            n_gpu_layers=0,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=2048,  # max tokens, -1 == inf

            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",  # Hugging Face repo
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",  # model file in repo

            system_prompt_type="Alpaca"  # system prompt type
        )
    ])
ros2 launch llama_bringup marcoroni.launch.py
n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "cstr/Spaetzle-v60-7b-GGUF" # Hugging Face repo
model_filename: "Spaetzle-v60-7b-q4-k-m.gguf" # model file in repo
system_prompt_type: "Alpaca" # system prompt type
import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "Spaetzle.yaml"))
    ])
ros2 launch llama_bringup spaetzle.launch.py
n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF" # Hugging Face repo
model_filename: "qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf" # model shard file in repo
system_prompt_type: "ChatML" # system prompt type
ros2 llama launch Qwen2.yaml
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch(
            use_llava=True,  # enable llava
            n_ctx=8192,  # context of the LLM in tokens, use a huge context size to load images
            n_batch=512,  # batch size in tokens
            n_gpu_layers=33,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=8192,  # max tokens, -1 == inf

            model_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            model_filename="llava-v1.6-mistral-7b.Q4_K_M.gguf",  # model file in repo

            mmproj_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            mmproj_filename="mmproj-model-f16.gguf",  # mmproj file in repo

            system_prompt_type="Mistral"  # system prompt type
        )
    ])
ros2 launch llama_bringup llava.launch.py
use_llava: True # enable llava

n_ctx: 8192 # context of the LLM in tokens, use a huge context size to load images
n_batch: 512 # batch size in tokens
n_gpu_layers: 33 # layers to load in GPU
n_threads: 1 # threads
n_predict: 8192 # max tokens, -1 == inf

model_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
model_filename: "llava-v1.6-mistral-7b.Q4_K_M.gguf" # model file in repo

mmproj_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
mmproj_filename: "mmproj-model-f16.gguf" # mmproj file in repo

system_prompt_type: "Mistral" # system prompt type
import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"),
            "models", "llava-1.6-mistral-7b-gguf.yaml"))
    ])
ros2 launch llama_bringup llava.launch.py
You can use LoRA adapters when launching LLMs. Using the llama.cpp features, you can load multiple adapters and choose the scale to apply to each of them. Here is an example of using LoRA adapters with Phi-3. You can list the LoRAs using the /llama/list_loras service and modify their scale values using the /llama/update_loras service. A scale value of 0.0 means that the LoRA is not used. A sketch of calling these services from a node follows the YAML below.
n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048

model_repo: "bartowski/Phi-3.5-mini-instruct-GGUF"
model_filename: "Phi-3.5-mini-instruct-Q4_K_M.gguf"

lora_adapters:
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_code_writing"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf"
    scale: 0.5
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_summarization"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf"
    scale: 0.5

system_prompt_type: "Phi-3"
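As a rough sketch of the LoRA services mentioned above, the snippet below lists the loaded adapters and sets the scale of the first one to 0.0. The service type names (ListLoRAs, UpdateLoRAs) and the request/response field names used here are assumptions based on the service names; check the definitions in llama_msgs before relying on them.
import rclpy
from rclpy.node import Node
from llama_msgs.srv import ListLoRAs, UpdateLoRAs  # assumed service types


class LoRAClientNode(Node):

    def __init__(self) -> None:
        super().__init__("lora_client_node")

        # list the currently loaded LoRA adapters
        self.list_client = self.create_client(ListLoRAs, "/llama/list_loras")
        self.list_client.wait_for_service()
        loras = self.list_client.call(ListLoRAs.Request()).loras  # assumed field name

        # disable the first adapter by setting its scale to 0.0
        if loras:
            loras[0].scale = 0.0
            update_req = UpdateLoRAs.Request()
            update_req.loras = [loras[0]]  # assumed field name
            self.update_client = self.create_client(UpdateLoRAs, "/llama/update_loras")
            self.update_client.wait_for_service()
            self.update_client.call(update_req)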
Both llama_ros and llava_ros provide ROS 2 interfaces to access the main functionalities of the models. Here are some examples of how to use them inside ROS 2 nodes. Moreover, take a look at the llama_demo_node.py and llava_demo_node.py demos.
from rclpy.node import Node
from llama_msgs.srv import Tokenize


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Tokenize, "/llama/tokenize")

        # create the request
        req = Tokenize.Request()
        req.text = "Example text"

        # call the tokenize service
        self.srv_client.wait_for_service()
        tokens = self.srv_client.call(req).tokens
from rclpy.node import Node
from llama_msgs.srv import Detokenize


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Detokenize, "/llama/detokenize")

        # create the request
        req = Detokenize.Request()
        req.tokens = [123, 123]

        # call the detokenize service
        self.srv_client.wait_for_service()
        text = self.srv_client.call(req).text
To generate embeddings with your LLM, you have to launch llama_ros with embedding set to true.
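As a minimal sketch, assuming the launch parameter is named embedding (check the predefined launch files for the exact name), the YAML configuration would add a line such as:
n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
embedding: true # enable embedding generation (assumed parameter name)
model_repo: "cstr/Spaetzle-v60-7b-GGUF"
model_filename: "Spaetzle-v60-7b-q4-k-m.gguf"
system_prompt_type: "Alpaca"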
from rclpy.node import Node
from llama_msgs.srv import Embeddings


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Embeddings, "/llama/generate_embeddings")

        # create the request
        req = Embeddings.Request()
        req.prompt = "Example text"
        req.normalize = True

        # call the embedding service
        self.srv_client.wait_for_service()
        embeddings = self.srv_client.call(req).embeddings
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = self.prompt
        goal.sampling_config.temp = 0.2

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait for the server
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result
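The generate_response action also publishes feedback while the goal is running, which can be used to stream partial text. The following is a sketch of registering a feedback callback with the standard rclpy API; the feedback field layout (partial_response.text) is an assumption, so check llama_msgs/action/GenerateResponse for the actual definition.
import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class StreamingExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("streaming_example_node")

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = "Do you know ROS 2?"
        goal.sampling_config.temp = 0.2

        # send the goal with a feedback callback to receive partial tokens
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(
            goal, feedback_callback=self.feedback_cb)
        rclpy.spin_until_future_complete(self, send_goal_future)

        # wait for the final result
        get_result_future = send_goal_future.result().get_result_async()
        rclpy.spin_until_future_complete(self, get_result_future)

    def feedback_cb(self, feedback_msg) -> None:
        # assumed field layout: feedback.partial_response.text
        print(feedback_msg.feedback.partial_response.text, flush=True, end="")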
import cv2
from cv_bridge import CvBridge

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):

    def __init__(self) -> None:
        super().__init__("example_node")

        # create a cv bridge for the image
        self.cv_bridge = CvBridge()

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = self.prompt
        goal.sampling_config.temp = 0.2

        # add your image to the goal
        image = cv2.imread("/path/to/your/image", cv2.IMREAD_COLOR)
        goal.image = self.cv_bridge.cv2_to_imgmsg(image)

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait for the server
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result
There is a llama_ros integration for LangChain, so prompt engineering techniques can be applied. Here are some examples of how to use it.
import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
text = chain.invoke({"topic": "bears"})
print(text)

rclpy.shutdown()
import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
for c in chain.stream({"topic": "bears"}):
    print(c, flush=True, end="")

rclpy.shutdown()
import rclpy
from llama_ros.langchain import LlamaROS

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# bind the image_url
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
llm = llm.bind(image_url=image_url).stream("Describe the image")

# run the llm
for c in llm:
    print(c, flush=True, end="")

rclpy.shutdown()
import rclpy
from langchain_chroma import Chroma
from llama_ros.langchain import LlamaROSEmbeddings

rclpy.init()

# create the llama_ros embeddings for langchain
embeddings = LlamaROSEmbeddings()

# create a vector database and assign it
db = Chroma(embedding_function=embeddings)

# create the retriever
retriever = db.as_retriever(search_kwargs={"k": 5})

# add your texts
db.add_texts(texts=["your_texts"])

# retrieve documents
documents = retriever.invoke("your_query")
print(documents)

rclpy.shutdown()
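Because the LLM and the embeddings are both exposed as LangChain components, they can be combined into a simple retrieval-augmented chain. The following is only a sketch built from the classes shown above; the prompt wording and the sample texts are illustrative, not part of llama_ros.
import rclpy
from langchain_chroma import Chroma
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from llama_ros.langchain import LlamaROS, LlamaROSEmbeddings

rclpy.init()

# create the llama_ros llm and embeddings for langchain
llm = LlamaROS()
embeddings = LlamaROSEmbeddings()

# build a small vector database and a retriever
db = Chroma(embedding_function=embeddings)
db.add_texts(texts=["ROS 2 is a set of software libraries for building robot applications."])
retriever = db.as_retriever(search_kwargs={"k": 3})

# prompt template that injects the retrieved context
prompt = PromptTemplate(
    input_variables=["context", "question"],
    template="Answer using the context.\nContext: {context}\nQuestion: {question}"
)


def format_docs(docs):
    # join the retrieved documents into a single context string
    return "\n".join(doc.page_content for doc in docs)


# retrieval-augmented chain: retrieve, format, prompt, generate, parse
chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)

print(chain.invoke("What is ROS 2?"))

rclpy.shutdown()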
import rclpy

from llama_ros.langchain import LlamaROSReranker
from llama_ros.langchain import LlamaROSEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever

rclpy.init()

# load the documents
documents = TextLoader("../state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# create the llama_ros embeddings
embeddings = LlamaROSEmbeddings()

# create the VD and the retriever
retriever = FAISS.from_documents(
    texts, embeddings).as_retriever(search_kwargs={"k": 20})

# create the compressor using the llama_ros reranker
compressor = LlamaROSReranker()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# retrieve the documents
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)

for doc in compressed_docs:
    print("-" * 50)
    print(doc.page_content)
    print("\n")

rclpy.shutdown()