Загрузка pecore - Загрузка исходного кода pecore

Количественная оценка правдоподобия зависимости от контекста при нейронном машинном переводе

Габриэле Сарти • Гжегож Хрупала • Мальвина Ниссим • Арианна Бисацца

<класс диапазона= двухэтапный процесс pecore" width="300" style="max-width: 100%;"> примеры pecore" width="500" style="max-width: 100%;">

Аннотация: Установление того, могут ли языковые модели использовать контекстную информацию понятным для человека способом, важно для обеспечения их безопасного внедрения в реальных условиях. Однако вопросы о том, когда и какие части контекста влияют на создание моделей, обычно решаются отдельно, а текущие оценки правдоподобия практически ограничиваются несколькими искусственными критериями. Чтобы решить эту проблему, мы представляем оценку достоверности контекстной зависимости ( pecore ), комплексную структуру интерпретируемости, предназначенную для количественной оценки использования контекста в поколениях языковых моделей. Наш подход использует внутренние возможности модели, чтобы (i) контрастно идентифицировать контекстно-зависимые целевые токены в сгенерированных текстах и (ii) связать их с контекстными сигналами, подтверждающими их прогноз. Мы используем pecore для количественной оценки правдоподобия моделей контекстно-зависимого машинного перевода, сравнивая обоснования моделей с человеческими аннотациями в отношении нескольких явлений на уровне дискурса. Наконец, мы применяем наш метод к неаннотированным поколениям, чтобы идентифицировать контекстно-опосредованные прогнозы и выделять случаи (не)правдоподобного использования контекста в переводах моделей.

Этот репозиторий содержит сценарии и блокноты, связанные с статьей «Количественная оценка достоверности контекстной зависимости в нейронном машинном переводе». Если вы используете что-либо из следующего в своей работе, мы просим вас цитировать нашу статью:

 @inproceedings { sarti-etal-2023-quantifying ,
    title = " Quantifying the Plausibility of Context Reliance in Neural Machine Translation " ,
    author = " Sarti, Gabriele and 
        Chrupa{l}a, Grzegorz and 
        Nissim, Malvina and
        Bisazza, Arianna " ,
    booktitle = " The Twelfth International Conference on Learning Representations (ICLR 2024) " ,
    month = may,
    year = " 2024 " ,
    address = " Vienna, Austria " ,
    publisher = " OpenReview " ,
    url = " https://openreview.net/forum?id=XTHfNGI3zT "
}

Использование pecore

Кончик

Вы можете попробовать pecore из нашей онлайн-демоверсии Hugging Face Spaces.

Хотя этот репозиторий реализует функции, использованные в экспериментальной оценке вышеупомянутой статьи, мы предоставляем новую реализацию pecore с интерфейсом командной строки через библиотеку интерпретируемости Inseq. Мы настоятельно советуем исследователям принять эту реализацию, поскольку она более надежна и обобщаема, поддерживая все модели «только декодер» и «кодировщик-декодер» из библиотеки Huggingface для обнаружения и атрибуции контекстных зависимостей ввода и вывода. Дополнительные сведения см. в разделе inseq attribute-context в файле README Inseq.

Артефакты

Все артефакты для статьи, включая точно настроенные модели и наборы данных для обучения/оценки, доступны в коллекции pecore HuggingFace Collection. Скоро будет доступна демо-версия, следите за обновлениями!

Обучение контекстно-зависимой модели NMT

Контекстно-зависимые модели NMT обучаются с помощью сценария train_context_aware_mt_model.py . Скрипт является модификацией оригинального run_translation_no_trainer.py . Скрипт добавляет следующие поля для обучения контекстной модели:

context_size : количество контекстных предложений, используемых для обучения. Значение по умолчанию — 0 (обучение на уровне предложения).
sample_context : Если установлено, размер контекста для каждого примера выбирается из равномерного распределения между 0 и context_size (включительно). Если он не передан и context_size больше 0, размер контекста всегда равен context_size .
context_word_dropout : вероятность от 0 до 1 выпадения слова из контекста. Значение по умолчанию — 0 (без исключения).
use_target_context : если установлено, контекст также включается в переведенный текст для потери обучения. В этом случае выходной формат для входных данных src_ctx <brk> src становится tgt_ctx <brk> tgt . В противном случае формат вывода — tgt (переводится только src ).

Пример использования

Вот пример точной настройки модели mBART от 1 до 50 на наборе данных IWSLT17 с дополненным контекстом, содержащим до 4 контекстных предложений и 10% пропуском контекстных слов:

accelerate launch scripts/train_context_aware_mt_model.py 
    --model_name_or_path facebook/mbart-large-50-one-to-many-mmt 
    --source_lang en_XX 
    --target_lang fr_XX 
    --dataset_name gsarti/iwslt2017_context 
    --dataset_config_name iwslt2017-en-fr 
    --output_dir outputs/models/iwslt17-mbart50-1toM-ctx4-cwd1-en-fr 
    --num_beams 5 
    --max_source_length 512 
    --max_target_length 128 
    --num_train_epochs 20 
    --gradient_accumulation_steps 4 
    --per_device_train_batch_size 8 
    --num_warmup_steps 500 
    --learning_rate 3e-4 
    --checkpointing_steps epoch 
    --with_tracking 
    --report_to tensorboard 
    --context_size 4 
    --sample_context 
    --context_word_dropout 0.1

Вот пример продолжения тонкой настройки контекстно-зависимой модели En->Fr OpusMT на обучающей части SCAT с использованием до 4 контекстных предложений и 10%-м пропуском контекстных слов:

accelerate launch scripts/train_context_aware_mt_model.py 
    --model_name_or_path context-mt/iwslt17-marian-big-ctx4-cwd1-en-fr 
    --dataset_name inseq/scat 
    --dataset_config_name sentences 
    --output_dir outputs/models/scat-marian-big-ctx4-cwd1-en-fr 
    --num_beams 5 
    --max_source_length 512 
    --max_target_length 128 
    --num_train_epochs 2 
    --gradient_accumulation_steps 2 
    --per_device_train_batch_size 8 
    --num_warmup_steps 0 
    --learning_rate 5e-5 
    --checkpointing_steps 1000 
    --logging_steps 200 
    --with_tracking 
    --report_to tensorboard 
    --context_size 4 
    --sample_context 
    --context_word_dropout 0.1

Использование интерфейса командной строки pecore

pecore CLI — это интерфейс командной строки для выполнения шагов pecore для заданной модели и набора данных. CLI реализован в сценарии pecore /cli.py и может использоваться как pecore -viz после установки пакета с помощью pip install -e . . Текущая реализация поддерживает идентификацию контекстно-зависимых целей (CTI) и вменение контекстных сигналов (CCI) для всех моделей кодировщика-декодера, поддерживаемых платформой Inseq, включая модели с тегами языковых префиксов (mBART-50, NLLB, M2M100). и модели, обученные с помощью специальных тегов контекста (например, коллекция моделей, найденная в организации context-mt в HF Hub). Интерфейс командной строки можно использовать для выполнения шагов pecore для данной модели и примера следующим образом:

pecore-viz --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr --attributions_aggregate_fns sum --model_use_ctx_break --impute_with_contextless_output --force_context_aware_output_prefix --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."">

 pecore -viz 
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr 
    --attributions_aggregate_fns sum 
    --model_use_ctx_break 
    --impute_with_contextless_output 
    --force_context_aware_output_prefix 
    --input " Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear. "

В приведенном выше примере выводится следующий результат, правильно подчеркивающий зависимость местоимения «il» от существительных «корова» и «животное» в контексте.

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Context-aware output:   Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Using ' <brk> ' to separate context and current inputs.

# 1. (CTI |kl_divergence| > 0.14, CCI |saliency| > 0.71)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Current output:  Malheureusement, nous n ' avons pas pu prévoir qu ' il(0.412) disparaîtrait.
Input context:   Did I mention we stole a cow(1.524) ? A beautiful animal(1.472), truly. We brought it to the stable and kept it 
there for ages.

При использовании CLI для запуска обычной модели потребуется дополнительный шаг для указания положения разрыва контекста при создании модели, если вывод не задается пользователем. Вот пример использования штатной модели mBART-50 от HF Hub:

pecore-viz --model_name facebook/mbart-large-50-one-to-many-mmt --input_lang eng --output_lang fra --model_type mbart50-1toM --impute_with_contextless_output --force_context_aware_output_prefix --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."">

 pecore -viz 
    --model_name facebook/mbart-large-50-one-to-many-mmt 
    --input_lang eng --output_lang fra --model_type mbart50-1toM 
    --impute_with_contextless_output 
    --force_context_aware_output_prefix 
    --input " Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear. "

Пользователю будет предложено следующее сообщение:

The following output was generate by the model: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Rewrite it here by adding ' <brk> ' wherever appropriate to mark context break:

Затем пользователь может переписать вывод, добавив <brk> там, где это необходимо, чтобы отметить разрыв контекста:

J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. < brk > Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.

Окончательный результат будет:

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Output context: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant 
des époques.
Context-aware output:   J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée 
pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Using language tags for model type ' mbart50-1toM ' (eng - > fra).

# 1. (CTI |kl_divergence| > 1.08, CCI |saliency| > 0.00)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir sa disparition.
Current output:  Malheureusement, nous n’(3.505)avons pas pu prévoir qu’elle disparaîtrait.
Input context:   Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable(0.002) and kept it there for ages.
Output context:  J’(0.004)ai mentionné que nous avons volé une vache, c’(0.002)est vraiment un beau animal, que nous avons emmené à l’(0.003)élevage et que nous 
l’(0.007)avons gardée pendant des époques.

В этом случае мы видим, что модель предпочитает генерировать изогнутый апостроф ' а не прямой ' используемый по умолчанию в бесконтекстном выводе, чтобы придерживаться стиля выходного контекста, используя этот символ несколько раз (определяемых pecore как контекстные подсказки). .

Настройка метода атрибуции

В этом примере мы используем вес внимания главы 8 в слое 5 для определения контекстной зависимости. Эмпирически было обнаружено, что эта голова хорошо соответствует человеческой интуиции.

pecore-viz --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr --attributions_aggregate_fns mean mean --model_use_ctx_break --impute_with_contextless_output --force_context_aware_output_prefix --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear." --attribution_method attention --select_attributions_idx 7 4">

 pecore -viz 
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr 
    --attributions_aggregate_fns mean mean 
    --model_use_ctx_break 
    --impute_with_contextless_output 
    --force_context_aware_output_prefix 
    --input " Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear. " 
    --attribution_method attention 
    --select_attributions_idx 7 4

Воспроизведение бумажных результатов

Перевод с помощью контекстно-зависимой модели NMT

python scripts/translate.py 
    --model_type mbart50-1toM 
    --model_id mbart50-1toM-scat 
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr 
    --context_size 4  
    --dataset scat 
    --context_word_dropout 1

python scripts/translate.py 
    --model_type marian-big 
    --model_id marian-big-scat-target 
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr 
    --context_size 4 
    --dataset disc_eval_mt 
    --context_word_dropout 0 
    --dataset_config anaphora

python scripts/translate.py 
    --model_type marian-big 
    --model_id marian-big-scat-target 
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr 
    --context_size 4 
    --dataset disc_eval_mt 
    --context_word_dropout 0 
    --dataset_config lexical-choice

python scripts/translate.py 
    --model_type marian-big 
    --model_id marian-big-scat 
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr 
    --context_size 4 
    --dataset disc_eval_mt 
    --context_word_dropout 1 
    --dataset_config anaphora

python scripts/translate.py 
    --model_type marian-big 
    --model_id marian-big-scat 
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr 
    --context_size 4 
    --dataset disc_eval_mt 
    --context_word_dropout 1 
    --dataset_config lexical-choice

python scripts/translate.py 
    --model_type mbart50-1toM 
    --model_id mbart50-1toM-scat 
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr 
    --context_size 0 
    --dataset disc_eval_mt 
    --context_word_dropout 0 
    --dataset_config lexical-choice

Оценка контекстно-зависимой модели NMT

python scripts/evaluate_mt_outputs.py 
    --filepath outputs/translations/ctx/scat-marian-small-scat-target.txt 
    --model_id marian-small-scat-target 
    --dataset scat 
    --src_lang eng 
    --tgt_lang fra 
    --metrics bleu comet accuracy flip 
    --has_target_context 
    --max_idx 250

python scripts/evaluate_mt_outputs.py 
    --filepath outputs/translations/ctx/disc_eval_mt-anaphora-marian-small-scat-target.txt 
    --model_id marian-small-scat-target 
    --dataset disc_eval_mt 
    --src_lang eng 
    --tgt_lang fra 
    --metrics bleu comet accuracy flip 
    --has_target_context 
    --max_idx 250

python scripts/evaluate_mt_outputs.py 
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt 
    --model_id mbart50-1toM-scat 
    --dataset scat 
    --src_lang eng 
    --tgt_lang fra 
    --metrics bleu comet accuracy

python scripts/evaluate_mt_outputs.py 
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt 
    --model_id mbart50-1toM-scat 
    --dataset scat 
    --src_lang eng 
    --tgt_lang fra 
    --metrics comet accuracy

Создание примеров выполнения шагов pecore

python scripts/generate_examples.py 
    --dataset scat 
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr 
    --src_lang eng 
    --tgt_lang fra 
    --model_id marian-small-scat-target 
    --model_type marian-small 
    --has_context 
    --has_contrast 
    --has_target_context

python scripts/generate_examples.py 
    --dataset scat 
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr 
    --src_lang eng 
    --tgt_lang fra 
    --model_id mbart50-1toM-scat-target 
    --model_type mbart50-1toM 
    --has_context 
    --has_target_context 
    --has_contrast

python scripts/generate_examples.py 
    --dataset disc_eval_mt 
    --dataset_config anaphora 
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr 
    --src_lang eng 
    --tgt_lang fra 
    --model_id marian-small-scat 
    --model_type marian-small 
    --has_context 
    --has_contrast

python scripts/generate_examples.py 
    --dataset scat 
    --model_name Helsinki-NLP/opus-mt-en-fr 
    --src_lang eng 
    --tgt_lang fra 
    --model_id marian-small 
    --model_type marian-small 
    --has_contrast

pecore Шаг 1: Контекстно-зависимая идентификация цели (CTI)

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv 
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr 
    --model_type marian-small

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-marian-big-scat.tsv 
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr 
    --model_type marian-big

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat.tsv 
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr 
    --model_type mbart50-1toM

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-marian-small-scat-target.tsv 
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr 
    --model_type marian-small

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-marian-big-scat-target.tsv 
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr 
    --model_type marian-big

python scripts/tag_cti_metrics.py 
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv 
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr 
    --model_type mbart50-1toM

pecore Шаг 2: Вменение контекстных сигналов (CCI)

python scripts/tag_cci_metrics.py 
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv 
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr 
    --model_type marian-small

python scripts/tag_cci_metrics.py 
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv 
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr 
    --model_type mbart50-1toM

Оцените показатели pecore

python scripts/evaluate_tagged_metrics.py 
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv 
    --eval_mode cti 
    --use_trained_model

python scripts/evaluate_tagged_metrics.py 
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv 
    --eval_mode cti 
    --average_example_scores 
    --metrics random pcxmi kl_divergence 
    --save_preds

python scripts/evaluate_tagged_metrics.py 
    --scores_path outputs/scores/scat-marian-small-scat-cci.tsv 
    --eval_mode cci 
    --example_target_column is_supporting_context 
    --average_example_scores 
    --metrics random saliency_contrast_prob_diff attention_default attention_best

python scripts/evaluate_tagged_metrics.py 
    --scores_path outputs/scores/scat-marian-small-scat-target-cti.tsv 
    --eval_mode cti 
    --average_example_scores 
    --metrics random pcxmi kl_divergence 
    --save_preds

Расширять