Quantifying the Plausibility of Context Reliance in Neural Machine Translation

Gabriele Sarti • Grzegorz Chrupała • Malvina Nissim • Arianna Bisazza

[Figures: the PECoRe two-step procedure; a PECoRe example]

Abstract: Establishing whether language models can use contextual information in a human-plausible way is important to ensure their safe adoption in real-world settings. However, the questions of when and which parts of the context affect model generations are typically tackled separately, and current plausibility evaluations are limited to a handful of artificial benchmarks. To address this, we introduce Plausibility Evaluation of Context Reliance (PECoRe), an end-to-end interpretability framework designed to quantify context usage in language models' generations. Our approach leverages model internals to (i) identify context-sensitive target tokens in generated texts and (ii) link them to contextual cues justifying their prediction. We use PECoRe to quantify the plausibility of context-aware machine translation models, comparing model rationales with human annotations across several discourse-level phenomena. Finally, we apply our method to unannotated generations to identify context-mediated predictions and highlight instances of (im)plausible context usage in model translations.

This repository contains scripts and notebooks associated with the paper "Quantifying the Plausibility of Context Reliance in Neural Machine Translation". If you use any of the following contents for your work, we kindly ask you to cite our paper:

@inproceedings{sarti-etal-2023-quantifying,
    title = "Quantifying the Plausibility of Context Reliance in Neural Machine Translation",
    author = "Sarti, Gabriele and
        Chrupa{\l}a, Grzegorz and
        Nissim, Malvina and
        Bisazza, Arianna",
    booktitle = "The Twelfth International Conference on Learning Representations (ICLR 2024)",
    month = may,
    year = "2024",
    address = "Vienna, Austria",
    publisher = "OpenReview",
    url = "https://openreview.net/forum?id=XTHfNGI3zT"
}

Using PECoRe

Tip

You can try out PECoRe through our online demo on Hugging Face Spaces.

While this repository implements the functions used in the experimental evaluation of the paper above, we also provide a new CLI implementation of PECoRe through the Inseq interpretability library. We strongly recommend that researchers adopt that implementation, since it is more efficient and more general: it supports all decoder-only and encoder-decoder models from the Hugging Face library for detecting and attributing reliance on input and output context. See the inseq attribute-context section of the Inseq README for more details.

Artifacts

All artifacts for the paper, including fine-tuned models and training/evaluation datasets, are available in the PECoRe Hugging Face collection. A demo will be made available soon, stay tuned!

Training Context-Aware NMT Models

Context-aware NMT models are trained using the train_context_aware_mt_model.py script, a modified version of the original run_translation_no_trainer.py. The script adds the following fields for context-aware model training:

context_size : The number of context sentences to use for training. Default is 0 (sentence-level training).
sample_context : If set, the context size for every example is sampled from a uniform distribution between 0 and context_size (inclusive). If not passed and context_size is greater than 0, the context size is always equal to context_size.
context_word_dropout : The probability, between 0 and 1, of dropping a word from the context. Default is 0 (no dropout).
use_target_context : If set, the context is also included in the translated text used for the training loss. In that case, the output format for an input src_ctx <brk> src becomes tgt_ctx <brk> tgt. Otherwise, the output format is tgt (only src is translated).
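
To make the interaction between these four fields concrete, here is a minimal sketch of how a single training pair could be assembled. It is not the code of train_context_aware_mt_model.py: the helper name build_example is hypothetical, and keeping the context sentences closest to the current one and applying word dropout to the target context as well are assumptions made for illustration.

```python
import random

BREAK_TAG = "<brk>"  # separator shown in the formats above


def build_example(src_ctx, src, tgt_ctx, tgt, context_size=0, sample_context=False,
                  context_word_dropout=0.0, use_target_context=False, rng=None):
    """Hypothetical helper: format one (input, output) training pair."""
    rng = rng or random.Random()
    # Sample the number of context sentences uniformly in [0, context_size] if requested.
    n_ctx = rng.randint(0, context_size) if sample_context else context_size
    if n_ctx == 0:
        return src, tgt  # sentence-level example, no context

    def drop_words(sentences):
        # Assumption: keep the n_ctx sentences closest to the current one,
        # then drop each word independently with the given probability.
        words = " ".join(sentences[-n_ctx:]).split()
        return " ".join(w for w in words if rng.random() >= context_word_dropout)

    source = f"{drop_words(src_ctx)} {BREAK_TAG} {src}".strip()
    # Assumption: the same dropout is applied to the target-side context.
    target = f"{drop_words(tgt_ctx)} {BREAK_TAG} {tgt}".strip() if use_target_context else tgt
    return source, target


# Placeholder strings stand in for real sentences.
print(build_example(["ctx one.", "ctx two."], "current.", ["CTX ONE.", "CTX TWO."], "CURRENT.",
                    context_size=2, use_target_context=True))
```

With context_size left at 0 the helper returns the plain sentence pair, which mirrors the sentence-level default described above.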

Example Usage

Here is an example of fine-tuning an mBART 1-to-50 model on the context-augmented IWSLT17 dataset with up to 4 context sentences and 10% context word dropout:

accelerate launch scripts/train_context_aware_mt_model.py \
    --model_name_or_path facebook/mbart-large-50-one-to-many-mmt \
    --source_lang en_XX \
    --target_lang fr_XX \
    --dataset_name gsarti/iwslt2017_context \
    --dataset_config_name iwslt2017-en-fr \
    --output_dir outputs/models/iwslt17-mbart50-1toM-ctx4-cwd1-en-fr \
    --num_beams 5 \
    --max_source_length 512 \
    --max_target_length 128 \
    --num_train_epochs 20 \
    --gradient_accumulation_steps 4 \
    --per_device_train_batch_size 8 \
    --num_warmup_steps 500 \
    --learning_rate 3e-4 \
    --checkpointing_steps epoch \
    --with_tracking \
    --report_to tensorboard \
    --context_size 4 \
    --sample_context \
    --context_word_dropout 0.1

Here is an example of continuing the fine-tuning of a context-aware En->Fr OpusMT model on the SCAT training split with up to 4 context sentences and 10% context word dropout:

accelerate launch scripts/train_context_aware_mt_model.py \
    --model_name_or_path context-mt/iwslt17-marian-big-ctx4-cwd1-en-fr \
    --dataset_name inseq/scat \
    --dataset_config_name sentences \
    --output_dir outputs/models/scat-marian-big-ctx4-cwd1-en-fr \
    --num_beams 5 \
    --max_source_length 512 \
    --max_target_length 128 \
    --num_train_epochs 2 \
    --gradient_accumulation_steps 2 \
    --per_device_train_batch_size 8 \
    --num_warmup_steps 0 \
    --learning_rate 5e-5 \
    --checkpointing_steps 1000 \
    --logging_steps 200 \
    --with_tracking \
    --report_to tensorboard \
    --context_size 4 \
    --sample_context \
    --context_word_dropout 0.1

Using the PECoRe CLI

The PECoRe CLI is a command-line interface for running the PECoRe procedure on a given model and dataset. The CLI is implemented in the pecore/cli.py script and can be used as pecore-viz once the package is installed with pip install -e . The current implementation supports context-sensitive target identification (CTI) and contextual cues imputation (CCI) for all encoder-decoder models supported by the Inseq framework, including models with language prefix tags (mBART-50, NLLB, M2M100) and models trained with special context tags (e.g. the collection of models found in the context-mt organization on the HF Hub). The CLI can be used to run the PECoRe procedure on a given model and example as follows:

pecore-viz \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --attributions_aggregate_fns sum \
    --model_use_ctx_break \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."

The example above produces the following output, correctly highlighting that the pronoun "il" relies on the nouns "cow" and "animal" in the context.

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Context-aware output:   Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Using ' <brk> ' to separate context and current inputs.

# 1. (CTI |kl_divergence| > 0.14, CCI |saliency| > 0.71)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir qu ' il disparaîtrait.
Current output:  Malheureusement, nous n ' avons pas pu prévoir qu ' il(0.412) disparaîtrait.
Input context:   Did I mention we stole a cow(1.524) ? A beautiful animal(1.472), truly. We brought it to the stable and kept it 
there for ages.
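
In the textual output above, each highlighted token is followed by its score in parentheses, e.g. cow(1.524). If you want to post-process such output, a small regular expression is enough to recover the (token, score) pairs. This is only a convenience sketch: the helper name extract_scored_tokens is hypothetical and not part of the CLI, and it assumes the token(score) layout shown above.

```python
import re

# A run of non-space characters followed by a parenthesised decimal score, e.g. "cow(1.524)".
SCORED_TOKEN = re.compile(r"(\S+?)\((\d+\.\d+)\)")


def extract_scored_tokens(line):
    """Return (token, score) pairs found in one line of the textual output."""
    return [(tok, float(score)) for tok, score in SCORED_TOKEN.findall(line)]


line = "Did I mention we stole a cow(1.524) ? A beautiful animal(1.472), truly."
print(extract_scored_tokens(line))  # [('cow', 1.524), ('animal', 1.472)]
```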

When using the CLI with a regular model, an additional step is needed to specify the position of the context break in the model generation, unless the user forces an output. Here is an example using the regular mBART-50 model from the HF Hub:

pecore-viz \
    --model_name facebook/mbart-large-50-one-to-many-mmt \
    --input_lang eng --output_lang fra --model_type mbart50-1toM \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear."

The user will then see the following prompt:

The following output was generate by the model: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Rewrite it here by adding ' <brk> ' wherever appropriate to mark context break:

The user can then rewrite the output, adding <brk> where appropriate to mark the context break:

J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant des époques. <brk> Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
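
Once the break is marked, separating the output context from the current output is a simple split on the tag. The sketch below illustrates the idea; the function name split_on_break is hypothetical and this is not the CLI's own code.

```python
def split_on_break(text, tag="<brk>"):
    """Split a generation into (context, current) at the first context break tag."""
    if tag not in text:
        raise ValueError(f"No {tag!r} found: the context break must be marked first.")
    context, current = text.split(tag, 1)
    return context.strip(), current.strip()


context, current = split_on_break("Output context sentences. <brk> Current output sentence.")
print(context)  # Output context sentences.
print(current)  # Current output sentence.
```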

The final output will be:

Context with contextual cues (std λ=1.00) followed by output sentence
with context-sensitive target spans (std λ=1.00):

Input context:  Did I mention we stole a cow ? A beautiful animal, truly. We brought it to the stable and kept it there for ages.
Input current:  Sadly, we could not foresee it would disappear.
Output context: J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée pendant 
des époques.
Context-aware output:   J’ai mentionné que nous avons volé une vache, c’est vraiment un beau animal, que nous avons emmené à l’élevage et que nous l’avons gardée 
pendant des époques. Malheureusement, nous n’avons pas pu prévoir qu’elle disparaîtrait.
Using language tags for model type ' mbart50-1toM ' (eng -> fra).

# 1. (CTI |kl_divergence| > 1.08, CCI |saliency| > 0.00)
Contextless output:     Malheureusement, nous n ' avons pas pu prévoir sa disparition.
Current output:  Malheureusement, nous n’(3.505)avons pas pu prévoir qu’elle disparaîtrait.
Input context:   Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable(0.002) and kept it there for ages.
Output context:  J’(0.004)ai mentionné que nous avons volé une vache, c’(0.002)est vraiment un beau animal, que nous avons emmené à l’(0.003)élevage et que nous 
l’(0.007)avons gardée pendant des époques.

In this case, we see that the model chose to generate the curly apostrophe ’ instead of the straight apostrophe ' used by default in the contextless output, in order to adhere to the style of the output context, where that character is used several times (and identified as a contextual cue by PECoRe).
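
The CTI score in the outputs above is labeled kl_divergence, and the header mentions std λ=1.00. As a rough illustration of what such a score measures, the sketch below computes the KL divergence between a next-token distribution obtained with context and one obtained without it, then keeps the positions whose score exceeds the mean by λ standard deviations. The direction of the divergence, the use of the population standard deviation, and the mean + λ·std rule are assumptions made for this example; refer to the paper and the Inseq documentation for the actual definitions.

```python
import math
from statistics import mean, pstdev


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def select_context_sensitive(scores, lam=1.0):
    """Indices whose score exceeds mean + lam * std (an assumed selection rule)."""
    threshold = mean(scores) + lam * pstdev(scores)
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy distributions over a 3-word vocabulary for 4 generated positions.
with_context = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
without_context = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1], [0.6, 0.3, 0.1]]

scores = [kl_divergence(p, q) for p, q in zip(with_context, without_context)]
print(select_context_sensitive(scores))  # only position 2 changes when context is added
```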

Customizing the attribution method

In this example, we use the attention weights of head 8 in layer 5 to attribute context reliance. This head was found empirically to align with human intuition.

pecore-viz \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --attributions_aggregate_fns mean mean \
    --model_use_ctx_break \
    --impute_with_contextless_output \
    --force_context_aware_output_prefix \
    --input "Did I mention we stole a cow? A beautiful animal, truly. We brought it to the stable and kept it there for ages.<brk> Sadly, we could not foresee it would disappear." \
    --attribution_method attention \
    --select_attributions_idx 7 4
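
Head 8 and layer 5, counted from one, correspond to the zero-based indices 7 and 4 passed to --select_attributions_idx. The sketch below shows what picking a single head means compared with averaging all heads of a layer. The nested-list layout attention[layer][head][target][source] and both helper names are assumptions made for illustration; they do not reflect Inseq's tensor format or the axis order expected by the CLI.

```python
def select_head(attention, layer, head):
    """Pick one head's weights from attention[layer][head][target][source]."""
    return attention[layer][head]


def average_heads(attention, layer):
    """Alternative: average all heads of one layer, entry by entry."""
    heads = attention[layer]
    n = len(heads)
    return [[sum(h[t][s] for h in heads) / n for s in range(len(heads[0][t]))]
            for t in range(len(heads[0]))]


# Toy weights: 1 layer, 2 heads, 1 target token, 2 source tokens.
attention = [[[[1.0, 0.0]], [[0.0, 1.0]]]]
print(select_head(attention, layer=0, head=1))  # [[0.0, 1.0]]
print(average_heads(attention, layer=0))        # [[0.5, 0.5]]
```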

Reproducing the paper results

Translate with context-aware NMT models

python scripts/translate.py \
    --model_type mbart50-1toM \
    --model_id mbart50-1toM-scat \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset scat \
    --context_word_dropout 1

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat-target \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config anaphora

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat-target \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config lexical-choice

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 1 \
    --dataset_config anaphora

python scripts/translate.py \
    --model_type marian-big \
    --model_id marian-big-scat \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --context_size 4 \
    --dataset disc_eval_mt \
    --context_word_dropout 1 \
    --dataset_config lexical-choice

python scripts/translate.py \
    --model_type mbart50-1toM \
    --model_id mbart50-1toM-scat \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --context_size 0 \
    --dataset disc_eval_mt \
    --context_word_dropout 0 \
    --dataset_config lexical-choice

Evaluate context-aware NMT models

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-marian-small-scat-target.txt \
    --model_id marian-small-scat-target \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy flip \
    --has_target_context \
    --max_idx 250

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/disc_eval_mt-anaphora-marian-small-scat-target.txt \
    --model_id marian-small-scat-target \
    --dataset disc_eval_mt \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy flip \
    --has_target_context \
    --max_idx 250

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt \
    --model_id mbart50-1toM-scat \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics bleu comet accuracy

python scripts/evaluate_mt_outputs.py \
    --filepath outputs/translations/ctx/scat-mbart50-1toM-scat.txt \
    --model_id mbart50-1toM-scat \
    --dataset scat \
    --src_lang eng \
    --tgt_lang fra \
    --metrics comet accuracy

Generate examples for running the PECoRe procedure

python scripts/generate_examples.py \
    --dataset scat \
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small-scat-target \
    --model_type marian-small \
    --has_context \
    --has_contrast \
    --has_target_context

python scripts/generate_examples.py \
    --dataset scat \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id mbart50-1toM-scat-target \
    --model_type mbart50-1toM \
    --has_context \
    --has_target_context \
    --has_contrast

python scripts/generate_examples.py \
    --dataset disc_eval_mt \
    --dataset_config anaphora \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small-scat \
    --model_type marian-small \
    --has_context \
    --has_contrast

python scripts/generate_examples.py \
    --dataset scat \
    --model_name Helsinki-NLP/opus-mt-en-fr \
    --src_lang eng \
    --tgt_lang fra \
    --model_id marian-small \
    --model_type marian-small \
    --has_contrast

PECoRe Step 1: Context-sensitive Target Identification (CTI)

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --model_type marian-small

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-big-scat.tsv \
    --model_name context-mt/scat-marian-big-ctx4-cwd1-en-fr \
    --model_type marian-big

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat.tsv \
    --model_name context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr \
    --model_type mbart50-1toM

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat-target.tsv \
    --model_name context-mt/scat-marian-small-target-ctx4-cwd0-en-fr \
    --model_type marian-small

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-big-scat-target.tsv \
    --model_name context-mt/scat-marian-big-target-ctx4-cwd0-en-fr \
    --model_type marian-big

python scripts/tag_cti_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --model_type mbart50-1toM
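
The six CTI commands above differ only in the examples file, model name, and model type, so they can also be generated from a small table. The sketch below only builds and prints the commands; the names CTI_RUNS and cti_command are hypothetical, and every path and model identifier is copied from the commands above.

```python
import shlex

# (examples file stem, model name, model type), copied from the commands above.
CTI_RUNS = [
    ("scat-marian-small-scat", "context-mt/scat-marian-small-ctx4-cwd1-en-fr", "marian-small"),
    ("scat-marian-big-scat", "context-mt/scat-marian-big-ctx4-cwd1-en-fr", "marian-big"),
    ("scat-mbart50-1toM-scat", "context-mt/scat-mbart50-1toM-ctx4-cwd1-en-fr", "mbart50-1toM"),
    ("scat-marian-small-scat-target", "context-mt/scat-marian-small-target-ctx4-cwd0-en-fr", "marian-small"),
    ("scat-marian-big-scat-target", "context-mt/scat-marian-big-target-ctx4-cwd0-en-fr", "marian-big"),
    ("scat-mbart50-1toM-scat-target", "context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr", "mbart50-1toM"),
]


def cti_command(stem, model_name, model_type):
    """Build the argument list for one tag_cti_metrics.py run."""
    return [
        "python", "scripts/tag_cti_metrics.py",
        "--examples_path", f"outputs/processed_examples/{stem}.tsv",
        "--model_name", model_name,
        "--model_type", model_type,
    ]


for run in CTI_RUNS:
    print(shlex.join(cti_command(*run)))
```

To execute the runs instead of printing them, each argument list can be passed to subprocess.run(cmd, check=True) from the repository root.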

PECoRe Step 2: Contextual Cues Imputation (CCI)

python scripts/tag_cci_metrics.py \
    --examples_path outputs/processed_examples/scat-marian-small-scat.tsv \
    --model_name context-mt/scat-marian-small-ctx4-cwd1-en-fr \
    --model_type marian-small

python scripts/tag_cci_metrics.py \
    --examples_path outputs/processed_examples/scat-mbart50-1toM-scat-target.tsv \
    --model_name context-mt/scat-mbart50-1toM-target-ctx4-cwd0-en-fr \
    --model_type mbart50-1toM

Evaluate PECoRe metrics

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv \
    --eval_mode cti \
    --use_trained_model

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cti.tsv \
    --eval_mode cti \
    --average_example_scores \
    --metrics random pcxmi kl_divergence \
    --save_preds

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-cci.tsv \
    --eval_mode cci \
    --example_target_column is_supporting_context \
    --average_example_scores \
    --metrics random saliency_contrast_prob_diff attention_default attention_best

python scripts/evaluate_tagged_metrics.py \
    --scores_path outputs/scores/scat-marian-small-scat-target-cti.tsv \
    --eval_mode cti \
    --average_example_scores \
    --metrics random pcxmi kl_divergence \
    --save_preds
