
Quantifying the Plausibility of Context Reliance in Neural Machine Translation

Gabriele Sarti, Grzegorz Chrupała, Malvina Nissim, Arianna Bisazza

[Figure: PECoRe two-step process]
[Figure: PECoRe example]

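Abstract: Establishing whether language models can use contextual information in a human-plausible way is important to ensure their safe adoption in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, and current plausibility evaluations are practically limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) contrastively identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated generations to identify context-mediated predictions and highlight instances of (im)plausible context usage in model translations.

This repository contains the scripts and notebooks associated with the paper "Quantifying the Plausibility of Context Reliance in Neural Machine Translation". If you use any of the following contents in your work, we kindly ask you to cite our paper: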

@inproceedings{sarti-etal-2023-quantifying,
    title = "Quantifying the Plausibility of Context Reliance in Neural Machine Translation",
    author = "Sarti, Gabriele and
        Chrupa{\l}a, Grzegorz and
        Nissim, Malvina and
        Bisazza, Arianna",
    booktitle = "The Twelfth International Conference on Learning Representations (ICLR 2024)",
    month = may,
    year = "2024",
    address = "Vienna, Austria",
    publisher = "OpenReview",
    url = "https://openreview.net/forum?id=XTHfNGI3zT"
}

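Using PECoRe

Tip

You can try out PECoRe in the online demo on Hugging Face Spaces.

This repository implements the functions used in the experimental evaluation of the paper above, but a new CLI implementation of PECoRe is available through the Inseq interpretability library. We strongly recommend that researchers adopt that implementation: it is more robust and generalizable, and it supports all decoder-only and encoder-decoder models in the Hugging Face library for the detection and attribution of input and output context reliance. See the inseq attribute-context section of the Inseq README for details.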

Artifacts

All the artifacts of the paper, including fine-tuned models and training/evaluation datasets, are available in the PECoRe HuggingFace collection. A demo will be released soon, stay tuned!

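Training context-aware NMT models

Context-aware NMT models are trained with the train_context_aware_mt_model.py script, a modified version of the original run_translation_no_trainer.py. The script adds the following fields for training contextual models:

context_size: The number of context sentences used for training. Defaults to 0 (sentence-level training).
sample_context: If set, the context size for every example is sampled from a uniform distribution between 0 and context_size (inclusive). If not passed and context_size is greater than 0, the context size is always equal to context_size.
context_word_dropout: The probability (between 0 and 1) of dropping a word from the context. Defaults to 0 (no dropout).
use_target_context: If set, the context is also included in the translated text used for the training loss. In that case, an input of the form src_ctx <brk> src has an output of the form tgt_ctx <brk> tgt. Otherwise, the output format is tgt (only src is translated).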
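
The interaction of the four fields above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code of train_context_aware_mt_model.py: the helper names (drop_words, build_example), the choice to keep the sentences closest to the current one, the handling of an empty context, and the decision to apply word dropout to the source-side context only are all assumptions made for the example.

```python
import random

BRK = "<brk>"  # context/current separator used throughout this README


def drop_words(sentence, p, rng):
    """Remove each whitespace-separated word independently with probability p."""
    return " ".join(w for w in sentence.split() if rng.random() >= p)


def build_example(src_ctx, src, tgt_ctx, tgt, context_size=0, sample_context=False,
                  context_word_dropout=0.0, use_target_context=False, rng=None):
    """Return (model_input, training_target) for one example.

    src_ctx and tgt_ctx are lists of preceding sentences, oldest first.
    """
    rng = rng or random.Random()
    # sample_context: draw the context size uniformly from [0, context_size]
    n = rng.randint(0, context_size) if sample_context else context_size
    # Assumption: keep the n sentences closest to the current one
    src_kept = src_ctx[-n:] if n > 0 else []
    tgt_kept = tgt_ctx[-n:] if n > 0 else []
    # Assumption: word dropout is applied to the source-side context only
    src_kept = [drop_words(s, context_word_dropout, rng) for s in src_kept]
    src_kept = [s for s in src_kept if s]
    model_input = f"{' '.join(src_kept)} {BRK} {src}" if src_kept else src
    if use_target_context and tgt_kept:
        target = f"{' '.join(tgt_kept)} {BRK} {tgt}"
    else:
        target = tgt
    return model_input, target


if __name__ == "__main__":
    print(build_example(["A.", "B.", "C."], "D.", ["a.", "b.", "c."], "d.",
                        context_size=2, use_target_context=True))
```

With context_size=2 and use_target_context set, both sides carry the two nearest context sentences followed by the separator; without use_target_context, only the current target sentence is returned.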

Example usage

The following is an example of fine-tuning an mBART 1-to-50 model on the context-augmented IWSLT17 dataset with up to 4 context sentences and 10% context word dropout:

accelerate launch scripts/train_context_aware_mt_model.py \
    --model_name_or_path facebook/mbart-large-50-one-to-many-mmt \
    --source_lang en_XX \
    --target_lang fr_XX \
    --dataset_name gsarti/iwslt2017_context \
    --dataset_config_name iwslt2017-en-fr \
    --output_dir outputs/models/iwslt17-mbart50-1toM-ctx4-cwd1-en-fr \
    --num_beams 5 \
    --max_source_length 512 \
    --max_target_length 128 \
    --num_train_epochs 20 \
    --gradient_accumulation_steps 4 \
    --per_device_train_batch_size 8 \
    --num_warmup_steps 500 \
    --learning_rate 3e-4 \
    --checkpointing_steps epoch \
    --with_tracking \
    --report_to tensorboard \
    --context_size 4 \
    --sample_context \
    --context_word_dropout 0.1

The following is an example of continuing the fine-tuning of a context-aware En->Fr OpusMT model on the training portion of SCAT, with up to 4 context sentences and 10% context word dropout:

accelerate launch scripts/train_context_aware_mt_model.py \
    --model_name_or_path context-mt/iwslt17-marian-big-ctx4-cwd1-en-fr \
    --dataset_name inseq/scat \
    --dataset_config_name sentences \
    --output_dir outputs/models/scat-marian-big-ctx4-cwd1-en-fr \
    --num_beams 5 \
    --max_source_length 512 \
    --max_target_length 128 \
    --num_train_epochs 2 \
    --gradient_accumulation_steps 2 \
    --per_device_train_batch_size 8 \
    --num_warmup_steps 0 \
    --learning_rate 5e-5 \
    --checkpointing_steps 1000 \
    --logging_steps 200 \
    --with_tracking \
    --report_to tensorboard \
    --context_size 4 \
    --sample_context \
    --context_word_dropout 0.1

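Using the PECoRe CLI

The PECoRe CLI is a command-line interface for running the PECoRe steps on a given model and dataset. It is implemented in the pecore/cli.py script and becomes available as the pecore-viz command once the package is installed in editable mode (pip install -e . from the repository root). The current implementation supports context-sensitive target identification (CTI) and contextual cues imputation (CCI) for all encoder-decoder models supported by the Inseq framework, including models with language prefix tags (mBART-50, NLLB, M2M100) and models trained with special context tags (such as the collection of models in the context-mt organization on the HF Hub). The CLI can be used to run the PECoRe steps on a given model and example as follows: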


pecore-viz \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --attributions_aggregate_fns sum \
    --model_use_ctx_break \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."

The example above produces the following output, correctly highlighting the reliance of the pronoun "il" on the nouns "cow" and "animal" in the context:

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Context-aware output:   Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Using ' <brk> ' to separate context and current inputs.

# 1. (CTI |kl_divergence| > 0.14, CCI |saliency| > 0.71)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Current output:  Malheureusement, nous n ' avons pas pu prévoir qu ' il(0.412) disparaîtrait.
Input context:   Did I mention we stole a cow(1.524) ? A beautiful animal(1.472), truly. We brought it to the stable and kept it 
there for ages.
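
In the printed report, salient tokens are followed by their score in parentheses, for example il(0.412) or cow(1.524). If you want to post-process this text, a small regular expression is enough to recover the (token, score) pairs. The following is a minimal sketch that assumes exactly the token(score) layout shown above; the function name extract_scores is hypothetical and is not part of the PECoRe CLI.

```python
import re

# A run of non-whitespace characters immediately followed by "(<float>)"
SCORED_TOKEN = re.compile(r"(\S+?)\((\d+\.\d+)\)")


def extract_scores(line):
    """Return the (token, score) pairs annotated in one line of the report."""
    return [(m.group(1), float(m.group(2))) for m in SCORED_TOKEN.finditer(line)]


if __name__ == "__main__":
    context = "Did I mention we stole a cow(1.524) ? A beautiful animal(1.472), truly."
    print(extract_scores(context))
```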

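When running the CLI with a regular model, an additional step is required to specify the position of the context break in the model's generation, unless the output is forced by the user. The following is an example using the regular mBART-50 model from the HF Hub: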


pecore-viz \
    --model_name facebook/mbart-large-50-one-to-many-mmt \
    --input_lang eng --output_lang fra --model_type mbart50-1toM \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."

The user is then prompted with the following message:

The following output was generate by the model: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Rewrite it here by adding ' <brk> ' wherever appropriate to mark context break:

The user can rewrite the output, adding <brk> where appropriate to mark the context break:

J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. <brk> Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
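
The <brk> convention can be illustrated with a short helper that separates the context from the current sentence. This is a sketch of the convention only, not the CLI's parsing code; the function name split_context and the choice to split at the last separator are assumptions.

```python
def split_context(text, sep="<brk>"):
    """Split text into (context, current) at the last occurrence of sep.

    If the separator is absent, the whole text is treated as the current sentence.
    """
    context, found, current = text.rpartition(sep)
    if not found:
        return "", text.strip()
    return context.strip(), current.strip()


if __name__ == "__main__":
    source = "We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."
    context, current = split_context(source)
    print("context:", context)
    print("current:", current)
```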

The final output is the following:

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Output context: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant 
des époques.
Context-aware output:   J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée 
pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Using language tags for model type 'mbart50-1toM' (eng -> fra).

# 1. (CTI |kl_divergence| > 1.08, CCI |saliency| > 0.00)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir sa disparition.
Current output:  Malheureusement, nous n’(3.505)avons pas pu prévoir qu’elle disparaîtrait.
Input context:   Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable(0.002) and kept it there for ages.
Output context:  J’(0.004)ai mentionné que nous avons volé une vache, c’(0.002)est vraiment un beau animal, que nous avons emmené à l’(0.003)élevage et que nous 
l’(0.007)avons gardée pendant des époques.

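In this case, the model generates the typographic apostrophe ’ rather than the straight apostrophe ' used by default in the contextless output, in order to match the style of the output context, where ’ appears several times (those occurrences are identified as contextual cues by PECoRe).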

Customizing the attribution method

This example uses the attention weights of head 8 in layer 5 to reveal context dependence. This head was empirically found to align well with human intuition.


pecore-viz \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --attributions_aggregate_fns mean mean \
    --model_use_ctx_break \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear." \
    --attribution_method attention \
    --select_attributions_idx 7 4
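
Under zero-based indexing, head 8 of layer 5 corresponds to indices 7 and 4, the two values passed to --select_attributions_idx. The toy sketch below only illustrates that index arithmetic. It assumes a hypothetical nested-list layout attn[layer][head][target][source] with 6 layers and 8 heads; it does not reproduce Inseq's actual tensor layout or the behaviour of the two mean aggregation functions.

```python
def make_toy_attention(n_layers=6, n_heads=8, n_tgt=2, n_src=3):
    """Toy weights: every cell of head h in layer l holds l * 10 + h (zero-based)."""
    return [
        [[[float(layer * 10 + head)] * n_src for _ in range(n_tgt)] for head in range(n_heads)]
        for layer in range(n_layers)
    ]


def select_head(attn, layer, head):
    """Return the target-by-source matrix for a 1-based layer and head number."""
    return attn[layer - 1][head - 1]


if __name__ == "__main__":
    attn = make_toy_attention()
    weights = select_head(attn, layer=5, head=8)
    print(weights[0])  # [47.0, 47.0, 47.0]: zero-based layer 4, head 7
```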

Reproducing the paper results

Translating with context-aware NMT models

python scripts/translate.py \
    --model_type mbart50-1toM \
    --model_id mbart50-1toM-scat \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset scat \
    --context_word_dropout 1

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat-target \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config anaphora

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat-target \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config lexical-choice

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 1 \
    --dataset_config anaphora

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 1 \
    --dataset_config lexical-choice

python scripts/translate.py \
    --model_type mbart50-1toM \
    --model_id mbart50-1toM-scat \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --context_size 0 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config lexical-choice

Evaluating context-aware NMT models

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-marian-small-scat-target.txt \
    --model_id marian-small-scat-target \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy flip \
    --has_target_context \
    --max_idx 250

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/disc_eval_mt-anaphora-marian-small-scat-target.txt \
    --model_id marian-small-scat-target \
    --dataset disc_eval_mt \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy flip \
    --has_target_context \
    --max_idx 250

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt \
    --model_id mbart50-1toM-scat \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt \
    --model_id mbart50-1toM-scat \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics comet accuracy

Generating examples for running the PECoRe steps

python scripts/generate_examples.py \
    --dataset scat \
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small-scat-target \
    --model_type marian-small \
    --has_context \
    --has_contrast \
    --has_target_context

python scripts/generate_examples.py \
    --dataset scat \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id mbart50-1toM-scat-target \
    --model_type mbart50-1toM \
    --has_context \
    --has_target_context \
    --has_contrast

python scripts/generate_examples.py \
    --dataset disc_eval_mt \
    --dataset_config anaphora \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small-scat \
    --model_type marian-small \
    --has_context \
    --has_contrast

python scripts/generate_examples.py \
    --dataset scat \
    --model_name Helsinki-NLP/opus-mt-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small \
    --model_type marian-small \
    --has_contrast

PECoRe step 1: Context-sensitive Target Identification (CTI)

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --model_type marian-small

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-big-scat.tsv \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --model_type marian-big

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat.tsv \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --model_type mbart50-1toM

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat-target.tsv \
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr \
    --model_type marian-small

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-big-scat-target.tsv \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --model_type marian-big

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --model_type mbart50-1toM
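
CTI scores each generated token by contrasting the model's prediction with and without context. Two of the metric names used in this README, kl_divergence and pcxmi, can be sketched with toy probabilities as follows. This is an illustration of the general formulas, not the code of tag_cti_metrics.py: the function names, the direction of the KL divergence (contextual distribution first), and the smoothing constant eps are assumptions.

```python
import math


def kl_divergence(p_ctx, p_noctx, eps=1e-12):
    """KL(p_ctx || p_noctx) between two next-token distributions over the same vocabulary."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_ctx, p_noctx)
        if p > 0
    )


def pcxmi(p_ctx_token, p_noctx_token):
    """Pointwise contextual mutual information of the generated token:
    log-probability with context minus log-probability without context."""
    return math.log(p_ctx_token) - math.log(p_noctx_token)


if __name__ == "__main__":
    # Toy 3-word vocabulary: context shifts probability mass towards the first word
    with_context = [0.8, 0.1, 0.1]
    without_context = [0.4, 0.5, 0.1]
    print("KL divergence:", kl_divergence(with_context, without_context))
    print("P-CXMI:", pcxmi(with_context[0], without_context[0]))
```

Both scores are zero when context does not change the prediction and grow as the contextual and contextless predictions diverge.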

PECoRe step 2: Contextual Cues Imputation (CCI)

python scripts/tag_cci_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --model_type marian-small

python scripts/tag_cci_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --model_type mbart50-1toM
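
The CLI report shown earlier in this README labels its thresholds as "std λ=1.00" and prints cut-offs such as "CCI |saliency| > 0.71". One plausible reading, sketched below, is that a token is selected when its absolute score exceeds the mean plus λ standard deviations of the scores. The exact rule used by the scripts is not documented here, so the formula, the use of the population standard deviation, and the function name select_salient are assumptions; the scores in the example are made up.

```python
from statistics import mean, pstdev


def select_salient(tokens, scores, lam=1.0):
    """Select tokens whose |score| exceeds mean + lam * std of the |scores|."""
    magnitudes = [abs(s) for s in scores]
    threshold = mean(magnitudes) + lam * pstdev(magnitudes)
    selected = [t for t, m in zip(tokens, magnitudes) if m > threshold]
    return threshold, selected


if __name__ == "__main__":
    tokens = ["a", "cow", "the", "animal", "it"]
    scores = [0.1, 1.5, 0.1, 1.4, 0.2]  # made-up attribution scores
    threshold, cues = select_salient(tokens, scores)
    print(round(threshold, 3), cues)
```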

Evaluating PECoRe metrics

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv \
    --eval_mode cti \
    --use_trained_model

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv \
    --eval_mode cti \
    --average_example_scores \
    --metrics random pcxmi kl_divergence \
    --save_preds

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cci.tsv \
    --eval_mode cci \
    --example_target_column is_supporting_context \
    --average_example_scores \
    --metrics random saliency_contrast_prob_diff attention_default attention_best

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-target-cti.tsv \
    --eval_mode cti \
    --average_example_scores \
    --metrics random pcxmi kl_divergence \
    --save_preds
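
Plausibility is assessed by comparing the tokens selected by a metric with human annotations. A common way to summarize such a comparison is token-level precision, recall, and F1, sketched below. This is a generic illustration and makes no claim about what evaluate_tagged_metrics.py computes internally; the function name and the binary-label input format are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Token-level precision, recall and F1 for two aligned lists of binary labels."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    model_selected = [True, False, True, False]   # tokens selected by a metric
    human_annotated = [True, True, False, False]  # tokens marked by annotators
    print(precision_recall_f1(model_selected, human_annotated))
```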
