ai21 tokenizer
v0.12.0
A SentencePiece-based tokenizer for production use with AI21's models.
To use the Jamba 1.5 Mini or Jamba 1.5 Large tokenizer, you will need to request access to the relevant model's HuggingFace repository.

Install with pip:
pip install ai21-tokenizer
Or with Poetry:
poetry add ai21-tokenizer
Jamba 1.5 Mini tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_1_5_MINI_TOKENIZER)
# Your code here
Another option is to use our Jamba 1.5 Mini tokenizer directly:
from ai21_tokenizer import Jamba1_5Tokenizer

model_path = "<Path to your vocabs file>"
tokenizer = Jamba1_5Tokenizer(model_path=model_path)
# Your code here
Async Jamba 1.5 Mini tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_1_5_MINI_TOKENIZER)
# Your code here
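The async snippets in this README use await, which only works inside a coroutine; in a plain script you would drive them with asyncio.run. A minimal sketch of that pattern, with the tokenizer call replaced by a stub so it runs standalone:

```python
import asyncio

async def main() -> str:
    # In real code this line would be the actual call, e.g.:
    #   tokenizer = await Tokenizer.get_async_tokenizer(...)
    # Stubbed here so the sketch is self-contained.
    tokenizer = await asyncio.sleep(0, result="stub-tokenizer")
    return tokenizer

result = asyncio.run(main())
print(result)
```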
Jamba 1.5 Large tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_1_5_LARGE_TOKENIZER)
# Your code here
Another option is to use our Jamba 1.5 Large tokenizer directly:
from ai21_tokenizer import Jamba1_5Tokenizer

model_path = "<Path to your vocabs file>"
tokenizer = Jamba1_5Tokenizer(model_path=model_path)
# Your code here
Async Jamba 1.5 Large tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_1_5_LARGE_TOKENIZER)
# Your code here
Jamba Instruct tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_INSTRUCT_TOKENIZER)
# Your code here
Another option is to use our Jamba tokenizer directly:
from ai21_tokenizer import JambaInstructTokenizer

model_path = "<Path to your vocabs file>"
tokenizer = JambaInstructTokenizer(model_path=model_path)
# Your code here
Async Jamba Instruct tokenizer:
from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_INSTRUCT_TOKENIZER)
# Your code here
Another option is to use the create class method of our async Jamba tokenizer:
from ai21_tokenizer import AsyncJambaInstructTokenizer

model_path = "<Path to your vocabs file>"
tokenizer = await AsyncJambaInstructTokenizer.create(model_path=model_path)
# Your code here
Default (Jurassic) tokenizer:
from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()
# Your code here
Another option is to use our Jurassic tokenizer directly:
from ai21_tokenizer import JurassicTokenizer

model_path = "<Path to your vocabs file. This is usually a binary file that ends with .model>"
config = {}  # dictionary parsed from your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
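The config argument above is just the parsed contents of the model's config.json. A sketch of that loading step, using a made-up one-key config file written to a temp directory (the key and value are placeholders, not the real schema):

```python
import json
import tempfile
from pathlib import Path

# Fabricate a tiny stand-in config.json; in practice this file ships with the model.
tmp_dir = Path(tempfile.mkdtemp())
config_path = tmp_dir / "config.json"
config_path.write_text('{"model_type": "jurassic"}')

# The resulting dict is what you would pass as the tokenizer's `config` argument.
config = json.loads(config_path.read_text())
print(config["model_type"])
```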
Async default (Jurassic) tokenizer:
from ai21_tokenizer import Tokenizer

tokenizer = await Tokenizer.get_async_tokenizer()
# Your code here
Another option is to use the create class method of our async Jurassic tokenizer:
from ai21_tokenizer import AsyncJurassicTokenizer

model_path = "<Path to your vocabs file. This is usually a binary file that ends with .model>"
config = {}  # dictionary parsed from your config.json file
tokenizer = await AsyncJurassicTokenizer.create(model_path=model_path, config=config)
# Your code here
These functions let you encode text into a list of token IDs and decode it back into plaintext:
text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
# Assuming you have created an async tokenizer
text_to_encode = "apple orange banana"
encoded_text = await tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = await tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
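To make the round trip concrete, here is a toy whitespace tokenizer (not the real SentencePiece model) that mirrors the encode/decode contract shown above: decoding the encoded IDs returns the original text:

```python
# Toy vocabulary standing in for the real SentencePiece model.
vocab = {"apple": 0, "orange": 1, "banana": 2}
inv_vocab = {i: t for t, i in vocab.items()}

def toy_encode(text: str) -> list[int]:
    # Map each whitespace-separated word to its token ID.
    return [vocab[word] for word in text.split()]

def toy_decode(ids: list[int]) -> str:
    # Map each ID back to its word and rejoin.
    return " ".join(inv_vocab[i] for i in ids)

encoded = toy_encode("apple orange banana")
print(encoded)   # [0, 1, 2]
decoded = toy_decode(encoded)
print(decoded)   # apple orange banana
```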
You can also convert between token IDs and token strings:
tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = tokenizer.convert_tokens_to_ids(tokens)
# Assuming you have created an async tokenizer
tokens = await tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = await tokenizer.convert_tokens_to_ids(tokens)
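A toy sketch of the same two conversions, to show how they differ from decode: they map between IDs and token strings (SentencePiece pieces keep markers such as the "▁" word-boundary prefix) rather than joining everything back into plain text. The three pieces below are made up for illustration:

```python
# Made-up ID-to-piece table; a real model has thousands of entries.
id_to_token = {0: "▁apple", 1: "▁orange", 2: "▁banana"}
token_to_id = {t: i for i, t in id_to_token.items()}

def convert_ids_to_tokens(ids: list[int]) -> list[str]:
    # IDs -> token strings, preserving SentencePiece markers like "▁".
    return [id_to_token[i] for i in ids]

def convert_tokens_to_ids(tokens: list[str]) -> list[int]:
    # Token strings -> IDs; the inverse mapping.
    return [token_to_id[t] for t in tokens]

tokens = convert_ids_to_tokens([0, 1, 2])
print(tokens)   # ['▁apple', '▁orange', '▁banana']
ids = convert_tokens_to_ids(tokens)
print(ids)      # [0, 1, 2]
```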
For more examples, please see our examples folder.