FlashRank.jl

v0.4.1

FlashRank.jl is inspired by the awesome FlashRank Python package, originally developed by Prithiviraj Damodaran. This package uses model weights from Prithiviraj's HF repo and Svilupp's HF repo to provide a fast, efficient way to rank the documents most relevant to a given query, without GPUs or large dependencies.

This improves Retrieval Augmented Generation (RAG) pipelines by prioritizing the most relevant documents. The smallest model can run on almost any machine.

Features

Four ranking models:
- Tiny (~4 MB, INT8): ms-marco-TinyBERT-L-2-v2 (default) (alias :tiny)
- MiniLM L-4 (~70 MB, FP32): ms-marco-MiniLM-L-4-v2 ONNX (alias :mini4)
- MiniLM L-6 (~83.4 MB, FP32): ms-marco-MiniLM-L-6-v2 ONNX (alias :mini6)
- MiniLM L-12 (~23 MB, INT8): ms-marco-MiniLM-L-12-v2 (alias :mini or :mini12)
Lightweight dependencies: heavy frameworks such as Flux and CUDA are avoided for easy integration.
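For quick reference, the aliases above can be written as a lookup table. This is a hypothetical sketch built only from the list: MODEL_ALIASES and resolve_alias are illustrative names, not FlashRank.jl API, and the package may store this mapping differently. It mainly makes explicit that :mini and :mini12 point to the same MiniLM L-12 model.

```julia
# Hypothetical alias table built from the feature list above
# (not FlashRank.jl's internal data structure).
const MODEL_ALIASES = Dict(
    :tiny   => (name = "ms-marco-TinyBERT-L-2-v2", approx_mb = 4.0,  precision = "INT8"),
    :mini4  => (name = "ms-marco-MiniLM-L-4-v2",   approx_mb = 70.0, precision = "FP32"),
    :mini6  => (name = "ms-marco-MiniLM-L-6-v2",   approx_mb = 83.4, precision = "FP32"),
    :mini12 => (name = "ms-marco-MiniLM-L-12-v2",  approx_mb = 23.0, precision = "INT8"),
)

# `:mini` is listed as a second alias for the L-12 model.
resolve_alias(alias::Symbol) = alias === :mini ? MODEL_ALIASES[:mini12] : MODEL_ALIASES[alias]

info = resolve_alias(:mini)
println(info.name, " (~", info.approx_mb, " MB, ", info.precision, ")")
```

Note that on-disk size does not follow depth: the 12-layer model is INT8-quantized and therefore smaller than the FP32 L-4 and L-6 files.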

How fast is it? With the Tiny model, you can rank 100 documents in ~0.1 seconds on a laptop. With the MiniLM (12-layer) model, ranking 100 documents takes ~0.4 seconds.

Tip: choose the largest (deepest) model your latency budget allows. For example, MiniLM L-12 is the slowest but has the best accuracy.
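The tip above can be turned into a small decision rule. The sketch below is hypothetical (pick_model is not part of FlashRank.jl): it uses only the two laptop timings quoted above and assumes latency grows linearly with the number of documents, which real hardware and passage lengths will only approximate.

```julia
# Approximate laptop timings quoted above: seconds to rank 100 documents.
# Only the Tiny and MiniLM L-12 figures are given, so only those two are listed,
# ordered from largest (most accurate) to smallest (fastest).
const SECONDS_PER_100_DOCS = [
    (:mini12, 0.4),
    (:tiny, 0.1),
]

"""
    pick_model(n_docs, budget_s)

Hypothetical helper: return the largest model whose estimated latency fits
`budget_s`, assuming latency scales linearly with `n_docs`.
Returns `nothing` if even the fastest model is too slow.
"""
function pick_model(n_docs::Integer, budget_s::Real)
    for (alias, t100) in SECONDS_PER_100_DOCS
        estimate = t100 * n_docs / 100
        estimate <= budget_s && return alias
    end
    return nothing
end

pick_model(100, 0.5)   # :mini12 (estimated 0.4 s)
pick_model(200, 0.5)   # :tiny (estimated 0.2 s; :mini12 would need ~0.8 s)
```

Measuring on your own hardware and replacing the two constants is the safer way to use this rule.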

Note that we use BERT models with a maximum chunk size of 512 tokens (anything longer is truncated).
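To make the 512-token limit concrete: a BERT cross-encoder typically sees a single sequence of the form [CLS] query [SEP] passage [SEP], so the query, the passage and three special tokens share the budget. The sketch below is a hypothetical illustration, not FlashRank.jl's code. It assumes the standard BERT uncased ids 101 and 102 for [CLS] and [SEP], and it assumes the passage (not the query) is cut first.

```julia
# Assumed special-token ids (values from the standard BERT uncased vocabulary).
const CLS_ID = 101
const SEP_ID = 102

"""
    build_pair(query_ids, passage_ids; max_len = 512)

Hypothetical sketch: build `[CLS] query [SEP] passage [SEP]` and cut the
passage so the whole sequence fits in `max_len` tokens. If the query alone
is too long, it is cut as well.
"""
function build_pair(query_ids::Vector{Int}, passage_ids::Vector{Int}; max_len::Int = 512)
    budget = max_len - 3                              # room left after CLS + 2x SEP
    q = query_ids[1:min(end, budget)]                 # keep as much of the query as fits
    p = passage_ids[1:min(end, budget - length(q))]   # the passage gets the remainder
    return vcat(CLS_ID, q, SEP_ID, p, SEP_ID)
end

ids = build_pair(collect(1:10), collect(1:1000))
length(ids)  # 512: 1 + 10 + 1 + 499 + 1
```

In practice this means the end of a long passage never influences its score, so splitting long documents into chunks before ranking is usually worthwhile.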

Installation

Add it to your environment with:

using Pkg
Pkg.activate(".")        # activate the project environment in the current directory
Pkg.add("FlashRank")

Usage

Ranking your documents for a given query is as simple as:

ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"  # skip DataDeps' interactive prompt before downloading model weights
using FlashRank

ranker = RankerModel() # Defaults to model = `:tiny`

query = "How to speedup LLMs?"
passages = [
    "Introduce *lookahead decoding*: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.",
    "LLM inference efficiency will be one of the most crucial topics for both industry and academia, simply because the more efficient you are, the more \$\$\$ you will save. vllm project is a must-read for this direction, and now they have just released the paper",
    "There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramatically reduces the model's memory footprint. - AutoGPTQ: Time-consuming but leads to a much smaller model and faster inference. The quantization is a one-time cost that pays off in the long run.",
    "Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.",
    "vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: State-of-the-art serving throughput Efficient management of attention key and value memory with PagedAttention Continuous batching of incoming requests Optimized CUDA kernels",
];

# `$` starts string interpolation in Julia, hence the escaped \$\$\$ above
result = rank(ranker, query, passages)

result is of type RankResult and contains the sorted passages, their scores (0-1, where 1 is the best), and the positions of the sorted documents (indices into the original passages vector).
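The relationship between scores and positions can be sketched without loading a model. In this hypothetical example, raw relevance logits are mapped into (0, 1) with a logistic sigmoid (an assumption; the README only states the 0-1 range), and the field names docs, scores and positions belong to this sketch, not necessarily to RankResult.

```julia
"""
    sort_by_relevance(passages, logits)

Hypothetical sketch of what a rank result holds: squash raw relevance
logits into (0, 1) with a logistic sigmoid (assumed), then sort the
passages from most to least relevant.
"""
function sort_by_relevance(passages::AbstractVector{<:AbstractString}, logits::AbstractVector{<:Real})
    scores = @. 1 / (1 + exp(-logits))        # monotone map into (0, 1)
    positions = sortperm(scores; rev = true)  # indices into the original `passages`
    return (docs = passages[positions], scores = scores[positions], positions = positions)
end

res = sort_by_relevance(["a", "b", "c"], [-1.0, 2.5, 0.3])
res.positions   # [2, 3, 1]
res.docs        # ["b", "c", "a"]
```

Because positions index into the original vector, passages[res.positions] reproduces res.docs, which is how ranked results can be mapped back to any per-document metadata you keep alongside the texts.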

Here is a brief outline of how to integrate FlashRank.jl into your PromptingTools.jl RAG pipeline.

For a full example, see examples/prompting_tools_integration.jl.

using FlashRank
using PromptingTools
using PromptingTools.Experimental.RAGTools
const RT = PromptingTools.Experimental.RAGTools

# Wrap the model to be a valid Ranker recognized by RAGTools
# It will be provided to the airag/rerank function to avoid instantiating it on every call
struct FlashRanker <: RT.AbstractReranker
    model::RankerModel
end
reranker = RankerModel(:tiny) |> FlashRanker

# Define the method for ranking with it
function RT.rerank(
        reranker::FlashRanker, index::RT.AbstractDocumentIndex, question::AbstractString,
        candidates::RT.AbstractCandidateChunks; kwargs...)
    # omitted for brevity
    # see examples/prompting_tools_integration.jl for details
end

# Apply to the pipeline configuration, e.g.,
cfg = RAGConfig(; retriever = RT.AdvancedRetriever(; reranker))
# assumes an existing `index`
question = "Tell me about prehistoric animals"
result = airag(cfg, index; question, return_all = true)
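The rerank body is omitted above. As a self-contained illustration of the same wrapper-plus-dispatch pattern, the sketch below uses hypothetical stand-ins: AbstractReranker, OverlapModel, Wrapped and rerank play the roles of RT.AbstractReranker, RankerModel, FlashRanker and RT.rerank, and a toy word-overlap score replaces the neural cross-encoder. It is not how FlashRank.jl or RAGTools score candidates.

```julia
# Hypothetical stand-ins for RT.AbstractReranker, RankerModel, FlashRanker and RT.rerank.
abstract type AbstractReranker end

# Toy "model": scores a passage by how many query words it contains.
struct OverlapModel end
score(::OverlapModel, query, doc) =
    count(w -> occursin(w, lowercase(doc)), split(lowercase(query)))

# Wrapping the model in a subtype lets generic pipeline code dispatch to it.
struct Wrapped <: AbstractReranker
    model::OverlapModel
end

# Fallback: with no reranker, candidates keep their retrieval order.
rerank(::Nothing, question, chunks; top_n = length(chunks)) = chunks[1:min(top_n, end)]

# Method for the wrapper: score every candidate, keep the best `top_n`.
function rerank(r::Wrapped, question, chunks; top_n = length(chunks))
    scores = [score(r.model, question, c) for c in chunks]
    order = sortperm(scores; rev = true)
    return chunks[order[1:min(top_n, end)]]
end

chunks = ["cats purr", "dinosaurs were prehistoric animals", "prehistoric art"]
rerank(Wrapped(OverlapModel()), "Tell me about prehistoric animals", chunks; top_n = 2)
```

The pipeline code only ever calls rerank(reranker, ...); which scorer runs is decided by the reranker's type, which is why the README wraps RankerModel in its own struct instead of modifying RAGTools.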

Advanced Usage

You can also use "coarse" but fast embeddings with the tiny_embed model (Bert-L4).

embedder = FlashRank.EmbedderModel(:tiny_embed)

passages = ["This is a test", "This is another test"]
result = FlashRank.embed(embedder, passages)
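A common way to use coarse embeddings is as a cheap first pass before the rerankers above: embed the query and the passages, keep the top-k by cosine similarity, then rerank only those. The sketch below shows that first pass on toy vectors. It assumes one embedding per matrix column, which may not match the layout FlashRank.embed returns, and top_k_by_cosine is a hypothetical helper.

```julia
using LinearAlgebra

# Cosine similarity between two embedding vectors.
cosine(a, b) = dot(a, b) / (norm(a) * norm(b))

"""
    top_k_by_cosine(query_vec, doc_mat, k)

Hypothetical sketch of coarse retrieval: `doc_mat` holds one embedding per
column (an assumed layout). Returns the indices of the `k` most similar columns.
"""
function top_k_by_cosine(query_vec::AbstractVector, doc_mat::AbstractMatrix, k::Integer)
    sims = [cosine(query_vec, col) for col in eachcol(doc_mat)]
    return collect(partialsortperm(sims, 1:min(k, length(sims)); rev = true))
end

docs = [1.0 0.0 0.7;
        0.0 1.0 0.7]                  # three 2-d toy embeddings, one per column
top_k_by_cosine([1.0, 0.1], docs, 2)  # [1, 3]
```

partialsortperm only orders the k best entries, which keeps the first pass cheap when the collection is large.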

Acknowledgments

FlashRank and Transformers.jl have been instrumental in the development of this package.
Special thanks to Prithiviraj Damodaran for the original FlashRank and the INT8-quantized model weights,
and to Transformers.jl for the WordPiece implementation and BERT tokenizer, which were forked for this package (to minimize dependencies).

Roadmap

- Provide a package extension for PromptingTools
- Add even smaller models (e.g., Bert-L2-128D)
- Introduce a simple length-based adjustment to the embedding similarity score
- Re-upload the embedding models with mask-based pooling (no real difference, just theoretically correct)
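For context on the last roadmap item: mask-based (mean) pooling averages token embeddings while ignoring padding positions, so a padded sequence yields the same sentence embedding as the unpadded one. The sketch below illustrates the general technique under an assumed dim × seq_len layout; masked_mean_pool is a hypothetical helper, not FlashRank.jl's implementation.

```julia
"""
    masked_mean_pool(hidden, mask)

Sketch of mask-based mean pooling. `hidden` is `dim × seq_len` (one column
per token, an assumed layout) and `mask` is 1 for real tokens and 0 for
padding. Padding columns are excluded from the average.
"""
function masked_mean_pool(hidden::AbstractMatrix{<:Real}, mask::AbstractVector{<:Integer})
    w = reshape(mask, 1, :)                      # broadcast the mask across the hidden dimension
    return vec(sum(hidden .* w; dims = 2)) ./ max(sum(mask), 1)  # guard against an all-zero mask
end

hidden = [1.0 3.0 100.0;
          2.0 4.0 100.0]          # the last column is padding
masked_mean_pool(hidden, [1, 1, 0])  # [2.0, 3.0]
```

A plain mean over all columns would give roughly [34.7, 35.3] here, which is why pooling without the mask is only "theoretically" wrong when padding vectors happen to be small, and noticeably wrong when they are not.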

Additional information

Version: v0.4.1
Type: Other source code
Updated: 2024-12-23
Size: 31.33 KB
Source: GitHub
