llama_ros

Version 4.1.2

This repository provides a set of ROS 2 packages to integrate llama.cpp into ROS 2. Using the llama_ros packages, you can easily incorporate the powerful optimization capabilities of llama.cpp into your ROS 2 projects by running GGUF-based LLMs and VLMs. You can also use llama.cpp features such as GBNF grammars and modify LoRAs in real time.

Table of Contents

Related Projects
Installation
Docker
Usage
- llama_cli
- Launch Files
- LoRA Adapters
- ROS 2 Clients
- LangChain
Demos

Related Projects

chatbot_ros → This chatbot, integrated into ROS 2, uses whisper_ros to listen to people's speech and llama_ros to generate responses. The chatbot is controlled by a state machine created with YASMIN.
explainable_ros → A ROS 2 tool to explain the behavior of a robot. Using the LangChain integration, logs are stored in a vector database. Then, RAG is applied to retrieve the logs relevant to user questions, which are answered with llama_ros.

Installation

To run llama_ros with CUDA, you first need to install the CUDA Toolkit. Then, you can compile llama_ros with --cmake-args -DGGML_CUDA=ON to enable CUDA support.

cd ~/ros2_ws/src
git clone https://github.com/mgonzs13/llama_ros.git
pip3 install -r llama_ros/requirements.txt
cd ~/ros2_ws
rosdep install --from-paths src --ignore-src -r -y
colcon build --cmake-args -DGGML_CUDA=ON # add this for CUDA

Docker

Build the llama_ros Docker image or download an image from DockerHub. You can choose to build llama_ros with CUDA (USE_CUDA) and choose the CUDA version (CUDA_VERSION). Remember that you have to use DOCKER_BUILDKIT=0 to compile llama_ros with CUDA when building the image.

DOCKER_BUILDKIT=0 docker build -t llama_ros --build-arg USE_CUDA=1 --build-arg CUDA_VERSION=12-6 .

Run the Docker container. If you want to use CUDA, you have to install the NVIDIA Container Toolkit and add --gpus all.

docker run -it --rm --gpus all llama_ros

Usage

llama_cli

Commands are included in llama_ros to speed up testing GGUF-based LLMs within the ROS 2 ecosystem. To that end, the following commands are integrated into the ROS 2 command line:

launch

Use this command to launch an LLM from a YAML file. The YAML configuration is used to launch the LLM in the same way as a regular launch file. Here is an example of how to use it:

ros2 llama launch ~/ros2_ws/src/llama_ros/llama_bringup/models/StableLM-Zephyr.yaml

prompt

Use this command to send a prompt to a launched LLM. The command takes a string, which is the prompt, and accepts the following arguments:

(-r, --reset): Whether to reset the LLM before prompting
(-t, --temp): Temperature value
(--image-url): URL of the image to send to a VLM

Here is an example of how to use it:

ros2 llama prompt "Do you know ROS 2?" -t 0.0
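
To make the argument list above concrete, here is a minimal sketch of how a command with the same flags could be parsed using Python's standard argparse module. It is not the llama_cli implementation: the function name build_prompt_parser and the default temperature of 0.8 are assumptions chosen for illustration only.

```python
import argparse


def build_prompt_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented `ros2 llama prompt` flags."""
    parser = argparse.ArgumentParser(prog="ros2 llama prompt")
    # positional argument: the prompt string itself
    parser.add_argument("prompt", help="prompt text sent to the LLM")
    # -r / --reset is a boolean switch
    parser.add_argument("-r", "--reset", action="store_true",
                        help="reset the LLM before prompting")
    # -t / --temp takes a float; the default value here is an assumption
    parser.add_argument("-t", "--temp", type=float, default=0.8,
                        help="sampling temperature")
    # --image-url is only meaningful when a VLM is running
    parser.add_argument("--image-url", default="",
                        help="URL of an image to send to a VLM")
    return parser


# parse the same arguments as the example command above
args = build_prompt_parser().parse_args(["Do you know ROS 2?", "-t", "0.0"])
print(args.prompt, args.reset, args.temp, repr(args.image_url))
```

Passing -t 0.0, as in the example command, typically makes sampling close to deterministic, which is convenient when testing a model repeatedly with the same prompt.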

Launch Files

First, you need to create a launch file to use llama_ros or llava_ros. This launch file will contain the main parameters to download the model from HuggingFace and configure it. Take a look at the following examples and the predefined launch files.
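
As a rough illustration of how a repository name and a file name together identify one downloadable model file, the sketch below builds the public Hugging Face "resolve" URL from the two values used in the examples (model_repo and model_filename). The helper name hf_resolve_url and the default revision "main" are assumptions; llama_ros may well download models through a different mechanism.

```python
from urllib.parse import quote


def hf_resolve_url(model_repo: str, model_filename: str, revision: str = "main") -> str:
    """Build a direct-download URL for a file in a Hugging Face repo (illustrative helper)."""
    # quote() keeps "/" intact by default, so "owner/repo" survives unchanged
    return (
        "https://huggingface.co/"
        f"{quote(model_repo)}/resolve/{quote(revision)}/{quote(model_filename)}"
    )


url = hf_resolve_url("TheBloke/Marcoroni-7B-v3-GGUF", "marcoroni-7b-v3.Q4_K_M.gguf")
print(url)
```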

llama_ros (Python launch)

from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():

    return LaunchDescription([
        create_llama_launch(
            n_ctx=2048,  # context of the LLM in tokens
            n_batch=8,  # batch size in tokens
            n_gpu_layers=0,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=2048,  # max tokens, -1 == inf

            model_repo="TheBloke/Marcoroni-7B-v3-GGUF",  # Hugging Face repo
            model_filename="marcoroni-7b-v3.Q4_K_M.gguf",  # model file in repo

            system_prompt_type="Alpaca"  # system prompt type
        )
    ])

ros2 launch llama_bringup marcoroni.launch.py

llama_ros (YAML config)

n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "cstr/Spaetzle-v60-7b-GGUF" # Hugging Face repo
model_filename: "Spaetzle-v60-7b-q4-k-m.gguf" # model file in repo

system_prompt_type: "Alpaca" # system prompt type

import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"), "models", "Spaetzle.yaml"))
    ])

ros2 launch llama_bringup spaetzle.launch.py

llama_ros (YAML config + model shards)

n_ctx: 2048 # context of the LLM in tokens
n_batch: 8 # batch size in tokens
n_gpu_layers: 0 # layers to load in GPU
n_threads: 1 # threads
n_predict: 2048 # max tokens, -1 == inf

model_repo: "Qwen/Qwen2.5-Coder-7B-Instruct-GGUF" # Hugging Face repo
model_filename: "qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf" # model shard file in repo

system_prompt_type: "ChatML" # system prompt type

ros2 llama launch Qwen2.yaml
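
The configuration above names only the first shard of a split model. As a sketch of how the full set of shard files can be derived from that one name, the helper below assumes the "-00001-of-0000N.gguf" suffix convention visible in the file name; the function list_shards is hypothetical and is not part of llama_ros.

```python
import re

# matches names such as "model-00001-of-00002.gguf"
SHARD_PATTERN = re.compile(r"^(?P<stem>.+)-(?P<index>\d{5})-of-(?P<total>\d{5})\.gguf$")


def list_shards(first_shard: str) -> list[str]:
    """Return every shard file name implied by one shard name (illustrative helper)."""
    match = SHARD_PATTERN.match(first_shard)
    if match is None:
        # not a sharded file name: the model is a single file
        return [first_shard]
    total = int(match.group("total"))
    stem = match.group("stem")
    return [f"{stem}-{i:05d}-of-{total:05d}.gguf" for i in range(1, total + 1)]


shards = list_shards("qwen2.5-coder-7b-instruct-q4_k_m-00001-of-00002.gguf")
for name in shards:
    print(name)
```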

llava_ros (Python launch)

from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch


def generate_launch_description():

    return LaunchDescription([
        create_llama_launch(
            use_llava=True,  # enable llava

            n_ctx=8192,  # context of the LLM in tokens, use a huge context size to load images
            n_batch=512,  # batch size in tokens
            n_gpu_layers=33,  # layers to load in GPU
            n_threads=1,  # threads
            n_predict=8192,  # max tokens, -1 == inf

            model_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            model_filename="llava-v1.6-mistral-7b.Q4_K_M.gguf",  # model file in repo

            mmproj_repo="cjpais/llava-1.6-mistral-7b-gguf",  # Hugging Face repo
            mmproj_filename="mmproj-model-f16.gguf",  # mmproj file in repo

            system_prompt_type="Mistral"  # system prompt type
        )
    ])

ros2 launch llama_bringup llava.launch.py

llava_ros (YAML config)

use_llava: True # enable llava

n_ctx: 8192 # context of the LLM in tokens, use a huge context size to load images
n_batch: 512 # batch size in tokens
n_gpu_layers: 33 # layers to load in GPU
n_threads: 1 # threads
n_predict: 8192 # max tokens, -1 == inf

model_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
model_filename: "llava-v1.6-mistral-7b.Q4_K_M.gguf" # model file in repo

mmproj_repo: "cjpais/llava-1.6-mistral-7b-gguf" # Hugging Face repo
mmproj_filename: "mmproj-model-f16.gguf" # mmproj file in repo

system_prompt_type: "mistral" # system prompt type

import os
from launch import LaunchDescription
from llama_bringup.utils import create_llama_launch_from_yaml
from ament_index_python.packages import get_package_share_directory


def generate_launch_description():
    return LaunchDescription([
        create_llama_launch_from_yaml(os.path.join(
            get_package_share_directory("llama_bringup"),
            "models", "llava-1.6-mistral-7b-gguf.yaml"))
    ])

ros2 launch llama_bringup llava.launch.py

LoRA Adapters

You can use LoRA adapters when launching LLMs. Using llama.cpp features, you can load multiple adapters and choose the scale to apply to each one. Here is an example of using LoRA adapters with Phi-3. You can list the LoRAs using the /llama/list_loras service and modify their scale values using the /llama/update_loras service. A scale value of 0.0 means the LoRA is not used.

n_ctx: 2048
n_batch: 8
n_gpu_layers: 0
n_threads: 1
n_predict: 2048

model_repo: "bartowski/Phi-3.5-mini-instruct-GGUF"
model_filename: "Phi-3.5-mini-instruct-Q4_K_M.gguf"

lora_adapters:
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_code_writing"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-code_writer.gguf"
    scale: 0.5
  - repo: "zhhan/adapter-Phi-3-mini-4k-instruct_summarization"
    filename: "Phi-3-mini-4k-instruct-adaptor-f16-summarization.gguf"
    scale: 0.5

system_prompt_type: "Phi-3"
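
To show why a scale of 0.0 is equivalent to disabling an adapter, the following sketch applies the usual LoRA formulation, W' = W + scale * (B @ A), to a toy 2x2 weight matrix in pure Python. The matrices and the helper names matmul and apply_lora are invented for illustration; how llama.cpp applies adapters internally may differ.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(weight, lora_b, lora_a, scale):
    """Return weight + scale * (lora_b @ lora_a) as a new matrix."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


weight = [[1.0, 0.0], [0.0, 1.0]]  # toy 2x2 base weight
lora_b = [[1.0], [2.0]]            # 2x1 low-rank factor
lora_a = [[0.5, 0.5]]              # 1x2 low-rank factor

half = apply_lora(weight, lora_b, lora_a, scale=0.5)  # adapter at half strength
off = apply_lora(weight, lora_b, lora_a, scale=0.0)   # adapter effectively disabled

print(half)
print(off)
```

With several adapters loaded, each one would contribute its own scaled update in the same way, which is why the scales can be tuned independently.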

ROS 2 Clients

Both llama_ros and llava_ros provide ROS 2 interfaces to access the main functionalities of the models. Here are some examples of how to use them inside ROS 2 nodes. Also, take a look at the llama_demo_node.py and llava_demo_node.py demos.

Tokenize

from rclpy.node import Node
from llama_msgs.srv import Tokenize


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Tokenize, "/llama/tokenize")

        # create the request
        req = Tokenize.Request()
        req.text = "Example text"

        # call the tokenize service
        self.srv_client.wait_for_service()
        tokens = self.srv_client.call(req).tokens

Detokenize

from rclpy.node import Node
from llama_msgs.srv import Detokenize


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Detokenize, "/llama/detokenize")

        # create the request
        req = Detokenize.Request()
        req.tokens = [123, 123]

        # call the detokenize service
        self.srv_client.wait_for_service()
        text = self.srv_client.call(req).text

Embeddings

Remember to launch llama_ros with embedding set to true to be able to generate embeddings with your LLM.

from rclpy.node import Node
from llama_msgs.srv import Embeddings


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.srv_client = self.create_client(Embeddings, "/llama/generate_embeddings")

        # create the request
        req = Embeddings.Request()
        req.prompt = "Example text"
        req.normalize = True

        # call the embedding service
        self.srv_client.wait_for_service()
        embeddings = self.srv_client.call(req).embeddings
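
The request above sets normalize to True. Assuming this refers to the common L2 (Euclidean) normalization of the embedding vector, the sketch below shows what that operation does and why it is useful: once vectors have unit length, their dot product is their cosine similarity. The helper names are illustrative and are not part of llama_msgs.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine similarity computed as the dot product of the normalized vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


unit = l2_normalize([3.0, 4.0])
print(unit)
print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```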

Generate Response

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = "Do you know ROS 2?"  # example prompt, replace with your own
        goal.sampling_config.temp = 0.2

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait until the goal is accepted
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result

Generate Response (llava)

import cv2
from cv_bridge import CvBridge

import rclpy
from rclpy.node import Node
from rclpy.action import ActionClient
from llama_msgs.action import GenerateResponse


class ExampleNode(Node):
    def __init__(self) -> None:
        super().__init__("example_node")

        # create a cv bridge for the image
        self.cv_bridge = CvBridge()

        # create the client
        self.action_client = ActionClient(
            self, GenerateResponse, "/llama/generate_response")

        # create the goal and set the sampling config
        goal = GenerateResponse.Goal()
        goal.prompt = "Describe the image"  # example prompt, replace with your own
        goal.sampling_config.temp = 0.2

        # add your image to the goal
        image = cv2.imread("/path/to/your/image", cv2.IMREAD_COLOR)
        goal.image = self.cv_bridge.cv2_to_imgmsg(image)

        # wait for the server and send the goal
        self.action_client.wait_for_server()
        send_goal_future = self.action_client.send_goal_async(goal)

        # wait until the goal is accepted
        rclpy.spin_until_future_complete(self, send_goal_future)
        get_result_future = send_goal_future.result().get_result_async()

        # wait again and take the result
        rclpy.spin_until_future_complete(self, get_result_future)
        result: GenerateResponse.Result = get_result_future.result().result

LangChain

There is a llama_ros integration for LangChain, so prompt engineering techniques can be applied. Here is an example of how to use it.

llama_ros (Chain)

import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser


rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain
text = chain.invoke({"topic": "bears"})
print(text)

rclpy.shutdown()
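
The `prompt | llm | StrOutputParser()` expression composes three steps so that the output of each one feeds the next. The toy sketch below imitates that pipe-style composition with a small class that overloads the `|` operator. It is only an illustration of the idea, not LangChain's implementation, and the Step class and the fake model function are invented for this example.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # (self | other) runs self first, then feeds its output to other
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# stand-ins for the prompt template, the LLM, and the output parser
prompt = Step(lambda variables: "tell me a joke about {topic}".format(**variables))
fake_llm = Step(lambda text: f"[model output for: {text}]")
parser = Step(lambda text: text.strip())

chain = prompt | fake_llm | parser
result = chain.invoke({"topic": "bears"})
print(result)
```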

llama_ros (Stream)

import rclpy
from llama_ros.langchain import LlamaROS
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser


rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# create a prompt template
prompt_template = "tell me a joke about {topic}"
prompt = PromptTemplate(
    input_variables=["topic"],
    template=prompt_template
)

# create a chain with the llm and the prompt template
chain = prompt | llm | StrOutputParser()

# run the chain and print each chunk as it arrives
for c in chain.stream({"topic": "bears"}):
    print(c, flush=True, end="")

rclpy.shutdown()

llava_ros

import rclpy
from llama_ros.langchain import LlamaROS

rclpy.init()

# create the llama_ros llm for langchain
llm = LlamaROS()

# define the image URL before binding it
image_url = "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"

# bind the image URL and start streaming
llm = llm.bind(image_url=image_url).stream("Describe the image")

# run the llm
for c in llm:
    print(c, flush=True, end="")

rclpy.shutdown()

llama_ros_embeddings (RAG)

import rclpy
from langchain_chroma import Chroma
from llama_ros.langchain import LlamaROSEmbeddings


rclpy.init()

# create the llama_ros embeddings for langchain
embeddings = LlamaROSEmbeddings()

# create a vector database and assign it
db = Chroma(embedding_function=embeddings)

# create the retriever
retriever = db.as_retriever(search_kwargs={"k": 5})

# add your texts
db.add_texts(texts=["your_texts"])

# retrieve documents
documents = retriever.invoke("your_query")
print(documents)

rclpy.shutdown()
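
Conceptually, the retriever above returns the k stored texts whose embeddings are closest to the query embedding. The sketch below reproduces that step with hand-written 3-dimensional unit vectors standing in for real embeddings, ranking by dot product. The store contents and the top_k helper are made up for illustration, and a real vector database such as Chroma typically uses approximate indexes rather than a full sort.

```python
def dot(a, b):
    """Dot product; equals cosine similarity when both vectors have unit length."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, store, k):
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(store, key=lambda item: dot(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# (text, embedding) pairs; the embeddings are invented unit vectors
store = [
    ("robots navigate with maps", [1.0, 0.0, 0.0]),
    ("llamas live in the Andes", [0.0, 1.0, 0.0]),
    ("ROS 2 nodes exchange messages", [0.6, 0.8, 0.0]),
]

hits = top_k([1.0, 0.0, 0.0], store, k=2)
print(hits)
```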

llama_ros (Reranker)

import rclpy
from llama_ros.langchain import LlamaROSReranker
from llama_ros.langchain import LlamaROSEmbeddings

from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain.retrievers import ContextualCompressionRetriever


rclpy.init()

# load the documents
documents = TextLoader("../state_of_the_union.txt").load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=100)
texts = text_splitter.split_documents(documents)

# create the llama_ros embeddings
embeddings = LlamaROSEmbeddings()

# create the vector database and the retriever
retriever = FAISS.from_documents(
    texts, embeddings).as_retriever(search_kwargs={"k": 20})

# create the compressor using the llama_ros reranker
compressor = LlamaROSReranker()
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

# retrieve the documents
compressed_docs = compression_retriever.invoke(
    "What did the president say about Ketanji Jackson Brown"
)

for doc in compressed_docs:
    print("-" * 50)
    print(doc.page_content)
    print("\n")

rclpy.shutdown()
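
The example above is a two-stage pipeline: a fast retriever collects a wide set of candidates (k = 20), then a reranker scores each candidate against the query and keeps the best ones. The sketch below shows only the second stage, using a deliberately simple word-overlap score as a stand-in for a reranking model. The functions overlap_score and rerank and the parameter top_n are hypothetical and do not reflect the LlamaROSReranker API.

```python
def overlap_score(query: str, document: str) -> float:
    """Stand-in relevance score: fraction of query words found in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, top_n):
    """Sort candidates by relevance to the query and keep the best top_n."""
    scored = sorted(candidates, key=lambda doc: overlap_score(query, doc), reverse=True)
    return scored[:top_n]


# candidates as they might come back from the first-stage retriever
candidates = [
    "the budget was discussed at length",
    "the president praised the judge",
    "the president nominated a judge to the court",
]

best = rerank("president nominated judge", candidates, top_n=2)
print(best)
```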