tokvizダウンロード - tokvizソースコードのダウンロード

tokviz

AI ソースコード

1.0.0

ダウンロード

tokviz

ファイル構造

tokviz/ ├── assets/ │ ├── example-deberta-v3-small.png │ └── example-gpt2.png ├── tokviz / │ ├── __init__.py │ └── visualization.py ├── README.md ├── LICENSE ├── setup.py └── pyproject.toml">

 tokviz /
├── assets/
│   ├── example-deberta-v3-small.png
│   └── example-gpt2.png
├── tokviz /
│   ├── __init__.py
│   └── visualization.py
├── README.md
├── LICENSE
├── setup.py
└── pyproject.toml

tokviz 、さまざまな言語モデルにわたるトークン化パターンを視覚化するための Python ライブラリです。このライブラリは、研究者、データサイエンティスト、NLP 愛好家が、さまざまな言語モデルがテキストをどのように処理してトークン化するかについての洞察を得るために、包括的なプラットフォームを提供します。

主な特徴:

モデルの比較:ビジュアライザーを使用すると、GPT-2、DistilGPT-2、DeBERTa-v3-small などの一般的なモデルを含む複数の言語モデルにわたるトークン化パターンを比較できます。色分けされたトークンを並べて表示することで、ユーザーはトークン化動作の相違点と類似点を簡単に識別できます。

柔軟な入力:ユーザーは任意のテキストを入力できるため、さまざまなテキスト入力にわたるトークン化パターンを動的に探索できます。短い文、段落、文書全体を分析する場合でも、ビジュアライザーはユーザーの入力に適応して包括的な分析を行います。

色分けされた視覚化:トークンはそのプロパティとインデックスに基づいて色分けされ、トークン化パターンを視覚的に直感的に表現できます。これにより、ユーザーはテキスト内の個々のトークンとパターンを迅速に識別できるようになり、より深い分析と解釈が容易になります。

インストール

pip 経由でtokvizインストールできます。

pip install tokviz

使用法

tokviz import token_visualizer # Define input text text = "In this example, the get_color function would need to be adjusted based on the specific properties of your model's tokenizer. You might want to inspect the special tokens, check if a token is part of a special group, or use any other relevant information provided by the tokenizer. Keep in mind that the color logic may vary depending on the model, so you need to tailor it to your specific use case." # Compare tokenization across different language models token_visualizer(text, models=['microsoft/deberta-v3-small', 'openai-community/gpt2'])">

 from tokviz import token_visualizer

# Define input text
text = "In this example, the get_color function would need to be adjusted based on the specific properties of your model's tokenizer. 
You might want to inspect the special tokens, check if a token is part of a special group, 
or use any other relevant information provided by the tokenizer. 
Keep in mind that the color logic may vary depending on the model, 
so you need to tailor it to your specific use case."

# Compare tokenization across different language models
token_visualizer ( text , models = [ 'microsoft/deberta-v3-small' , 'openai-community/gpt2' ])