ai21 tokenizer download - ai21 tokenizer source code download

ai21 tokenizer

Other source code

v0.12.0

AI21 Labs Tokenizer

A SentencePiece-based tokenizer for production use with AI21's models

Prerequisites

To use the tokenizers for Jamba 1.5 Mini or Jamba 1.5 Large, you must first request access to the corresponding model's HuggingFace repository:
- Jamba 1.5 Mini
- Jamba 1.5 Large

Installation

pip

pip install ai21-tokenizer

poetry

poetry add ai21-tokenizer

Usage

Tokenizer Creation

Jamba 1.5 Mini Tokenizer

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_1_5_MINI_TOKENIZER)
# Your code here

Alternatively, use the Jamba 1.5 Mini tokenizer class directly:

from ai21_tokenizer import Jamba1_5Tokenizer

model_path = "<Path to your vocabs file>"
tokenizer = Jamba1_5Tokenizer(model_path=model_path)
# Your code here

Async usage

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_1_5_MINI_TOKENIZER)
# Your code here
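
The `await` in the async snippets on this page only works inside a coroutine (or an async-aware REPL or notebook). The following is a minimal sketch of driving such code from a regular script with `asyncio.run`. `StandInAsyncTokenizer` and `get_stand_in_tokenizer` are hypothetical placeholders, not part of ai21_tokenizer; they exist only so the sketch runs without model access. In real code the tokenizer would come from `Tokenizer.get_async_tokenizer(...)`.

```python
import asyncio
from typing import List


class StandInAsyncTokenizer:
    """Hypothetical placeholder, NOT part of ai21_tokenizer.

    It only mimics the awaitable encode/decode shape shown in this document
    by mapping each character to its Unicode code point.
    """

    async def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    async def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


async def get_stand_in_tokenizer() -> StandInAsyncTokenizer:
    # Assumption: stands in for
    # `await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_1_5_MINI_TOKENIZER)`.
    return StandInAsyncTokenizer()


async def main() -> str:
    tokenizer = await get_stand_in_tokenizer()
    ids = await tokenizer.encode("apple orange banana")
    return await tokenizer.decode(ids)


# asyncio.run starts an event loop, runs the coroutine to completion,
# and closes the loop afterwards.
result = asyncio.run(main())
print(result)
```

Inside a framework that already runs an event loop, call `await main()` from an existing coroutine instead of `asyncio.run`.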

Jamba 1.5 Large Tokenizer

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_1_5_LARGE_TOKENIZER)
# Your code here

Alternatively, use the Jamba 1.5 Large tokenizer class directly:

from ai21_tokenizer import Jamba1_5Tokenizer

model_path = "<Path to your vocabs file>"
tokenizer = Jamba1_5Tokenizer(model_path=model_path)
# Your code here

Async usage

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_1_5_LARGE_TOKENIZER)
# Your code here

Jamba Instruct Tokenizer

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = Tokenizer.get_tokenizer(PreTrainedTokenizers.JAMBA_INSTRUCT_TOKENIZER)
# Your code here

Alternatively, use the Jamba Instruct tokenizer class directly:

from ai21_tokenizer import JambaInstructTokenizer

model_path = "<Path to your vocabs file>"
tokenizer = JambaInstructTokenizer(model_path=model_path)
# Your code here

Async usage

from ai21_tokenizer import Tokenizer, PreTrainedTokenizers

tokenizer = await Tokenizer.get_async_tokenizer(PreTrainedTokenizers.JAMBA_INSTRUCT_TOKENIZER)
# Your code here

Alternatively, use the `create` class method of the async Jamba Instruct tokenizer:

from ai21_tokenizer import AsyncJambaInstructTokenizer

model_path = "<Path to your vocabs file>"
tokenizer = AsyncJambaInstructTokenizer.create(model_path=model_path)
# Your code here

J2 Tokenizer

from ai21_tokenizer import Tokenizer

tokenizer = Tokenizer.get_tokenizer()
# Your code here

Alternatively, use the Jurassic tokenizer class directly:

from ai21_tokenizer import JurassicTokenizer

model_path = "<Path to your vocabs file. This is usually a binary file that ends with .model>"
config = {}  # dictionary object of your config.json file
tokenizer = JurassicTokenizer(model_path=model_path, config=config)
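
The `config` argument above is described as the dictionary form of a `config.json` file. Below is a minimal sketch of building that dictionary with the standard library. The helper name `load_tokenizer_config` and the sample key are hypothetical and used only for illustration; the keys in a real file depend on your model.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_tokenizer_config(config_path: str) -> Dict[str, Any]:
    """Read a config.json file and return its contents as a plain dict."""
    with open(config_path, encoding="utf-8") as f:
        loaded = json.load(f)
    if not isinstance(loaded, dict):
        raise ValueError(f"{config_path} must contain a JSON object")
    return loaded


# Demo with a throwaway file; "example_key" is a made-up key, not a real option.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "config.json"
    sample_path.write_text(json.dumps({"example_key": 1}), encoding="utf-8")
    config = load_tokenizer_config(str(sample_path))

print(config)
```

The resulting dictionary can then be passed as `config=config` in the constructor call shown above.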

Async usage

from ai21_tokenizer import Tokenizer

tokenizer = await Tokenizer.get_async_tokenizer()
# Your code here

Alternatively, use the `create` class method of the async Jurassic tokenizer:

from ai21_tokenizer import AsyncJurassicTokenizer

model_path = "<Path to your vocabs file. This is usually a binary file that ends with .model>"
config = {}  # dictionary object of your config.json file
tokenizer = AsyncJurassicTokenizer.create(model_path=model_path, config=config)
# Your code here

Functions

Encode and Decode

These functions encode your text into a list of token IDs and decode a list of token IDs back into plain text.

text_to_encode = "apple orange banana"
encoded_text = tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")
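
Because `encode` returns a list of token IDs, the length of that list is the token count of a text, which is useful for checking a prompt against a length budget. The sketch below relies only on the `encode` and `decode` methods shown above. `count_tokens`, `truncate_to_budget`, and `CharTokenizer` are hypothetical helpers; `CharTokenizer` is a stand-in (one token per character) so the example runs without downloading a model, and real token counts will differ.

```python
from typing import List, Protocol


class SupportsEncodeDecode(Protocol):
    """Anything exposing the encode/decode pair used in this document."""

    def encode(self, text: str) -> List[int]: ...

    def decode(self, ids: List[int]) -> str: ...


def count_tokens(tokenizer: SupportsEncodeDecode, text: str) -> int:
    """Return the number of token IDs the tokenizer produces for `text`."""
    return len(tokenizer.encode(text))


def truncate_to_budget(tokenizer: SupportsEncodeDecode, text: str, max_tokens: int) -> str:
    """Return `text` unchanged if it fits, else decode its first `max_tokens` IDs."""
    if max_tokens < 0:
        raise ValueError("max_tokens must be non-negative")
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text
    return tokenizer.decode(ids[:max_tokens])


class CharTokenizer:
    """Stand-in tokenizer, NOT from ai21_tokenizer: one token per character."""

    def encode(self, text: str) -> List[int]:
        return [ord(ch) for ch in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


toy = CharTokenizer()
print(count_tokens(toy, "apple"))           # 5 with the stand-in
print(truncate_to_budget(toy, "apple", 3))  # "app" with the stand-in
```

With a real tokenizer, truncating at a token boundary and decoding may not reproduce an exact prefix of the original string, so treat the result as approximate.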

Async

# Assuming you have created an async tokenizer
text_to_encode = "apple orange banana"
encoded_text = await tokenizer.encode(text_to_encode)
print(f"Encoded text: {encoded_text}")

decoded_text = await tokenizer.decode(encoded_text)
print(f"Decoded text: {decoded_text}")

To convert token IDs to tokens, or tokens back to IDs:

tokens = tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = tokenizer.convert_tokens_to_ids(tokens)
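
To make the relationship between the two conversions concrete, here is a toy sketch of an ID-to-token lookup and its inverse. The four-entry vocabulary and the `toy_*` functions are made up for illustration and are not the library's implementation; the only real convention borrowed is SentencePiece's `▁` (U+2581) marker for a word boundary.

```python
from typing import Dict, List

# Made-up vocabulary; real vocabularies are far larger and model-specific.
TOY_VOCAB: Dict[str, int] = {"<unk>": 0, "▁apple": 1, "▁orange": 2, "▁banana": 3}
TOY_ID_TO_TOKEN: Dict[int, str] = {i: t for t, i in TOY_VOCAB.items()}


def toy_convert_ids_to_tokens(ids: List[int]) -> List[str]:
    """Look up the token string for each ID."""
    return [TOY_ID_TO_TOKEN[i] for i in ids]


def toy_convert_tokens_to_ids(tokens: List[str]) -> List[int]:
    """Look up the ID for each token, falling back to the unknown-token ID."""
    return [TOY_VOCAB.get(t, TOY_VOCAB["<unk>"]) for t in tokens]


toy_ids = [1, 2, 3]
toy_tokens = toy_convert_ids_to_tokens(toy_ids)
print(toy_tokens)

# Converting back yields the original IDs for tokens that are in the vocabulary.
round_trip_ids = toy_convert_tokens_to_ids(toy_tokens)
print(round_trip_ids)
```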

Async

# Assuming you have created an async tokenizer
tokens = await tokenizer.convert_ids_to_tokens(encoded_text)
print(f"IDs correspond to tokens: {tokens}")

ids = await tokenizer.convert_tokens_to_ids(tokens)

For more examples, see the examples folder.

Additional information

Version: v0.12.0
Type: Other source code
Updated: 2024-12-05
Size: 3.11MB
Source: GitHub
