Highway Transformer
This repo contains the demo code for Transformer-XL equipped with the Self-Dependency Unit (SDU). This work is closely related to gating-enhanced Transformer variants, such as Google's Switch Transformers.

Yekun Chai et al., *Highway Transformer: Self-Gating Enhanced Self-Attentive Networks* (ACL 2020)
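For context, the core idea is a self-gating unit attached to the Transformer's sublayers, so that each position can modulate its own representation. Below is a minimal PyTorch sketch of such a unit; the exact gate form here (a sigmoid gate times a tanh transform, merged back through a highway-style additive connection) and all module/parameter names are illustrative assumptions, not the repo's actual implementation — see the paper and the code under `pytorch/` for the real details.

```python
import torch
import torch.nn as nn

class SelfDependencyUnit(nn.Module):
    """Minimal sketch of a self-gating unit in the spirit of the
    paper's Self-Dependency Unit (SDU). Formulation is assumed for
    illustration, not taken from the repo."""

    def __init__(self, d_model: int):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)       # sigmoid gate branch
        self.transform = nn.Linear(d_model, d_model)  # tanh content branch

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gate each dimension of x by a function of x itself ("self-gating").
        g = torch.sigmoid(self.gate(x))
        h = torch.tanh(self.transform(x))
        # Highway-style additive connection (assumed wiring).
        return x + g * h

# Usage: wrap a sublayer output of shape (seq_len, batch, d_model).
sdu = SelfDependencyUnit(d_model=512)
y = sdu(torch.randn(10, 2, 512))
```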
Download the datasets:

```bash
bash getdata.sh
```

Train a 6-layer Transformer-XL model:

```bash
cd pytorch/xl_L6_scripts && bash <script-name>.sh train --work_dir "PATH_TO_WORK_DIR"
```

Visualize the training curves with TensorBoard:

```bash
cd XL-L6-results && tensorboard --logdir=.
```
[Plots: training bpc, training loss, eval bpc, eval loss]
For attribution in academic contexts, please cite this work as:
```bibtex
@inproceedings{chai-etal-2020-highway,
    title = "Highway Transformer: Self-Gating Enhanced Self-Attentive Networks",
    author = "Chai, Yekun and
      Jin, Shuo and
      Hou, Xinwen",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.616",
    pages = "6887--6900"
}
```