FlashRank.jl

FlashRank.jl is inspired by the excellent FlashRank Python package developed by Prithiviraj Damodaran. It uses model weights from Prithiviraj's HF repo and Svilupp's HF repo to provide a fast and efficient way to rank documents by relevance to a given query, without GPUs or heavy dependencies.

This improves Retrieval Augmented Generation (RAG) pipelines by prioritizing the most relevant documents. The smallest model runs on almost any machine.

Features

Four ranking models:
- Tiny (~4MB, INT8): ms-marco-TinyBERT-L-2-v2 (default) (alias :tiny)
- MiniLM L-4 (~70MB, FP32): ms-marco-MiniLM-L-4-v2 ONNX (alias :mini4)
- MiniLM L-6 (~83.4MB, FP32): ms-marco-MiniLM-L-6-v2 ONNX (alias :mini6)
- MiniLM L-12 (~23MB, INT8): ms-marco-MiniLM-L-12-v2 (alias :mini or mini12)

Lightweight dependencies, avoiding heavy frameworks such as Flux and CUDA for ease of integration.

How fast is it? With the Tiny model, you can rank 100 documents in ~0.1 seconds on a laptop. With the MiniLM (12-layer) model, you can rank 100 documents in ~0.4 seconds.

Tip: Pick the largest model that fits your latency budget. MiniLM L-12 is the slowest but the most accurate.
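To check which model fits your latency budget, you can time the ranking call on your own hardware. The sketch below uses only Julia's standard library. `median_latency` is a hypothetical helper (not part of FlashRank.jl) that runs any zero-argument function several times and reports the median wall-clock time, after one warm-up call so that compilation is not counted.

```julia
using Statistics: median

# Hypothetical helper (not part of FlashRank.jl): median wall-clock seconds of `f()`.
function median_latency(f; runs::Int = 5)
    runs > 0 || throw(ArgumentError("runs must be positive"))
    f()  # warm-up call so JIT compilation is not measured
    times = [@elapsed(f()) for _ in 1:runs]
    return median(times)
end

# Stand-in workload so the snippet runs without downloading a model
latency = median_latency(() -> sum(sqrt, 1:100_000); runs = 3)
println("median latency: ", round(latency; digits = 6), " s")
```

With FlashRank loaded, the same helper could be called as `median_latency(() -> rank(ranker, query, passages))`, once with `RankerModel(:tiny)` and once with `RankerModel(:mini)`, to compare the two on your own documents.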

Note that these are BERT models with a maximum chunk size of 512 tokens (anything beyond that is truncated).
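Because anything beyond 512 tokens is truncated, a long document can lose its tail before it is scored. One common workaround is to split long documents into smaller overlapping pieces and rank the pieces instead. The sketch below is only an illustration under stated assumptions: `chunk_words` is a hypothetical helper, and it counts whitespace-separated words as a rough proxy for tokens (WordPiece tokenization usually yields more tokens than words, so choose a word limit well below 512).

```julia
# Hypothetical helper (not part of FlashRank.jl): split text into chunks of at most
# `max_words` whitespace-separated words, with `overlap` words shared between neighbours.
function chunk_words(text::AbstractString; max_words::Int = 300, overlap::Int = 30)
    0 <= overlap < max_words || throw(ArgumentError("need 0 <= overlap < max_words"))
    words = split(text)
    isempty(words) && return String[]
    chunks = String[]
    step = max_words - overlap
    start = 1
    while true
        stop = min(start + max_words - 1, length(words))
        push!(chunks, join(words[start:stop], " "))
        stop == length(words) && break
        start += step
    end
    return chunks
end

# Synthetic 700-word document, used only to exercise the helper
long_doc = join(("word$i" for i in 1:700), " ")
chunks = chunk_words(long_doc; max_words = 300, overlap = 30)
println(length(chunks), " chunks")
```

Each chunk can then be passed to `rank` as its own passage. Keep a mapping from chunk back to source document if you need document-level results.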

Installation

Add it to your environment with:

using Pkg
Pkg.activate(".")
Pkg.add("FlashRank")

Usage

Ranking your documents for a given query is as simple as:

ENV["DATADEPS_ALWAYS_ACCEPT"] = "true"
using FlashRank

ranker = RankerModel() # Defaults to model = `:tiny`

query = "How to speedup LLMs?"
# Note: `$` must be escaped inside Julia string literals
passages = [
        "Introduce *lookahead decoding*: - a parallel decoding algo to accelerate LLM inference - w/o the need for a draft model or a data store - linearly decreases # decoding steps relative to log(FLOPs) used per decoding step.",
        "LLM inference efficiency will be one of the most crucial topics for both industry and academia, simply because the more efficient you are, the more \$\$\$ you will save. vllm project is a must-read for this direction, and now they have just released the paper",
        "There are many ways to increase LLM inference throughput (tokens/second) and decrease memory footprint, sometimes at the same time. Here are a few methods I’ve found effective when working with Llama 2. These methods are all well-integrated with Hugging Face. This list is far from exhaustive; some of these techniques can be used in combination with each other and there are plenty of others to try. - Bettertransformer (Optimum Library): Simply call `model.to_bettertransformer()` on your Hugging Face model for a modest improvement in tokens per second. - Fp4 Mixed-Precision (Bitsandbytes): Requires minimal configuration and dramatically reduces the model's memory footprint. - AutoGPTQ: Time-consuming but leads to a much smaller model and faster inference. The quantization is a one-time cost that pays off in the long run.",
        "Ever want to make your LLM inference go brrrrr but got stuck at implementing speculative decoding and finding the suitable draft model? No more pain! Thrilled to unveil Medusa, a simple framework that removes the annoying draft model while getting 2x speedup.",
        "vLLM is a fast and easy-to-use library for LLM inference and serving. vLLM is fast with: State-of-the-art serving throughput Efficient management of attention key and value memory with PagedAttention Continuous batching of incoming requests Optimized CUDA kernels",
];

result = rank(ranker, query, passages)

result is of type RankResult and contains the sorted passages, their scores (0-1, where 1 is best), and the positions of the sorted documents (referring to the original passages vector).
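A common next step is to keep only the best few passages. The sketch below works on a NamedTuple stand-in that carries the three pieces of information described above. The field names `docs`, `scores`, and `positions` are assumptions made for illustration, so check the RankResult definition in the package before relying on them; `top_results` is likewise a hypothetical helper.

```julia
# Stand-in for a ranking result; field names are assumptions, not the confirmed RankResult API.
mock_result = (
    docs = ["passage C", "passage A", "passage B"],   # sorted best-first
    scores = [0.92, 0.40, 0.05],                      # 0-1, where 1 is best
    positions = [3, 1, 2],                            # indices into the original vector
)

# Hypothetical helper: keep at most `k` results whose score is at least `min_score`.
function top_results(result; k::Int = 2, min_score::Real = 0.0)
    keep = [i for i in eachindex(result.scores) if result.scores[i] >= min_score]
    keep = first(keep, k)
    return [(position = result.positions[i], score = result.scores[i], doc = result.docs[i])
            for i in keep]
end

for r in top_results(mock_result; k = 2, min_score = 0.1)
    println(r.position, " => ", r.score, " : ", r.doc)
end
```

Filtering by a minimum score as well as by count avoids passing clearly irrelevant passages downstream when fewer than `k` good matches exist.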

Below is a brief outline of how to integrate FlashRank.jl into your PromptingTools.jl RAG pipeline.

For a full example, see examples/prompting_tools_integration.jl

using FlashRank
using PromptingTools
using PromptingTools.Experimental.RAGTools
const RT = PromptingTools.Experimental.RAGTools

# Wrap the model to be a valid Ranker recognized by RAGTools
# It will be provided to the airag/rerank function to avoid instantiating it on every call
struct FlashRanker <: RT.AbstractReranker
    model::RankerModel
end
reranker = RankerModel(:tiny) |> FlashRanker

# Define the method for ranking with it
function RT.rerank(
        reranker::FlashRanker, index::RT.AbstractDocumentIndex, question::AbstractString,
        candidates::RT.AbstractCandidateChunks; kwargs...)
    ## omitted for brevity
    ## See examples/prompting_tools_integration.jl for details
end

## Apply to the pipeline configuration, eg,
cfg = RAGConfig(; retriever = RT.AdvancedRetriever(; reranker))
## assumes existing index
question = "Tell me about prehistoric animals"
result = airag(cfg, index; question, return_all = true)

Advanced Usage

You can also generate fairly "coarse" but fast embeddings with the tiny_embed model (Bert-L4).

embedder = FlashRank.EmbedderModel(:tiny_embed)

passages = ["This is a test", "This is another test"]
result = FlashRank.embed(embedder, passages)
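Embeddings are typically compared with cosine similarity. The sketch below assumes the embeddings are available as a matrix with one column per passage. How they are stored inside the value returned by `embed` is not stated here, so the small hand-written `E` matrix is a stand-in, and `cosine_similarity` is a hypothetical helper built only on the LinearAlgebra standard library.

```julia
using LinearAlgebra: norm

# Pairwise cosine similarity between the columns of `E` (one column per passage).
function cosine_similarity(E::AbstractMatrix{<:Real})
    norms = [norm(col) for col in eachcol(E)]
    any(iszero, norms) && throw(ArgumentError("zero-length embedding"))
    normalized = E ./ permutedims(norms)   # scale every column to unit length
    return normalized' * normalized
end

# Stand-in embeddings: 3 passages with 4 dimensions each
E = [1.0 2.0 0.0;
     0.0 0.0 1.0;
     1.0 2.0 0.0;
     0.0 0.0 1.0]
S = cosine_similarity(E)
println(round.(S; digits = 3))
```

Entry `S[i, j]` is the similarity between passages `i` and `j`. Here the first two columns point in the same direction and the third is orthogonal to both.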

Acknowledgments

FlashRank and Transformers.jl were essential to the development of this package.
Special thanks to Prithiviraj Damodaran for the original FlashRank and the INT8 quantized model weights,
and to Transformers.jl for the WordPiece and BERT tokenizer implementation, which was forked for this package (to minimize dependencies).

Roadmap

- Provide a package extension for PromptingTools
- Bring even smaller models (e.g., Bert-L2-128D)
- Introduce a simple length-based adjustment for embedding similarity scores
- Re-upload the embedding models with mask-based pooling (no real difference, just theoretically correct)

More information

Version: v0.4.1
Type: Other source code
Updated: 2024-12-23
Size: 31.33KB
Source: GitHub
