basis embedding

1.0.0

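Code for structured word embeddings for low-memory neural network language models.

This repository contains the code for basis embedding, which reduces model size and memory consumption. It is built on the pytorch/examples repository on GitHub.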

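Parameters

Parameters related to basis embedding:

--basis <0>: number of bases used to decompose the embedding matrix; 0 selects normal mode
--num_clusters: number of clusters over the whole vocabulary
--load_input_embedding: path to a pretrained embedding matrix for the input embedding
--load_output_embedding: path to a pretrained embedding matrix for the output embedding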
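
To make the memory saving concrete, here is a rough back-of-the-envelope sketch. It assumes a product-quantization-style layout that this README does not spell out: each embedding vector is split into `basis` sub-vectors, and each sub-vector is replaced by one of `num_clusters` shared centroids. The function name `embedding_param_counts` and the example sizes are hypothetical, not taken from the repository.

```python
def embedding_param_counts(vocab_size, emsize, basis, num_clusters):
    """Compare stored values for a dense vs. a basis-decomposed embedding.

    Assumed layout (not confirmed by this repository): the emsize-dim vector
    is split into `basis` chunks, and each chunk picks one of `num_clusters`
    shared centroids of dimension emsize // basis.
    """
    dense = vocab_size * emsize
    if basis == 0:
        # 0 means normal mode: the full matrix is kept.
        return dense, dense
    if emsize % basis != 0:
        raise ValueError("emsize must be divisible by basis in this sketch")
    sub_dim = emsize // basis
    codebooks = basis * num_clusters * sub_dim  # shared float centroids
    codes = vocab_size * basis                  # one integer index per chunk per word
    return dense, codebooks + codes


# Illustrative sizes only.
dense, compact = embedding_param_counts(
    vocab_size=10000, emsize=200, basis=4, num_clusters=400)
print(f"dense: {dense} values, decomposed: {compact} values")
```

Under these assumptions the dense matrix grows with `vocab_size * emsize`, while the decomposed form only adds `basis` small integers per word on top of fixed-size codebooks.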

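Other options:

-c or --config: path to a configuration file; its values override the argument parser's defaults and are in turn overridden by command-line options
--train: train a model, or only evaluate an existing one
--dict <None>: vocabulary file to use if specified; otherwise the words in train.txt are used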
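
The precedence described for -c/--config (parser defaults, then config file, then command line) can be sketched with `argparse` as below. This is a minimal illustration, not the code in main.py: the `load_config` helper, the `key = value` file format, and the default values shown are all assumptions.

```python
import argparse


def load_config(path):
    """Hypothetical loader: one `key = value` pair per line, `#` starts a comment."""
    values = {}
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if not line:
                continue
            key, value = line.split("=", 1)
            values[key.strip()] = value.strip()
    return values


def parse_args(argv):
    """Resolve options with precedence: defaults < config file < command line."""
    # Stage 1: find -c/--config only, ignoring every other option.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("-c", "--config", metavar="PATH", default=None)
    known, _ = pre.parse_known_args(argv)

    # Stage 2: the full parser; the defaults here are illustrative only.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--lr", type=float, default=20.0)
    parser.add_argument("--basis", type=int, default=0)

    if known.config:
        # Config values replace the defaults; argparse converts string
        # defaults with each argument's `type`.
        parser.set_defaults(**load_config(known.config))

    # Options given explicitly on the command line win over both.
    return parser.parse_args(argv)
```

With a config file containing `lr = 5`, calling `parse_args(["-c", path])` yields `lr == 5.0`, while `parse_args(["-c", path, "--lr", "1.5"])` yields `lr == 1.5`.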

Examples

python main.py -c config/default.conf  # train a cross-entropy baseline
python main.py -c config/ptb_basis_tied.conf  # basis embedding initialized from a tied embedding, on PTB

During training, if a keyboard interrupt (Ctrl-C) is received, training stops and the current model is evaluated on the test dataset.
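
The interrupt behavior can be sketched as a `try`/`except KeyboardInterrupt` around the epoch loop. The function and callback names below (`run_training`, `train_one_epoch`, `evaluate_on_test`) are hypothetical stand-ins, not the names used in main.py.

```python
def run_training(epochs, train_one_epoch, evaluate_on_test):
    """Train for up to `epochs` epochs; on Ctrl-C, stop early and still evaluate."""
    completed = 0
    try:
        for epoch in range(1, epochs + 1):
            train_one_epoch(epoch)
            completed = epoch
    except KeyboardInterrupt:
        print("Exiting from training early")
    # Reached after a normal finish and after an interrupt alike.
    return completed, evaluate_on_test()


def fake_train(epoch):
    # Stand-in for a real training epoch; simulates Ctrl-C during epoch 3.
    if epoch == 3:
        raise KeyboardInterrupt


def fake_evaluate():
    # Stand-in for evaluating the current model on the test set.
    return "evaluated"


print(run_training(5, fake_train, fake_evaluate))
```

Because the evaluation call sits after the `try` block rather than inside it, the model is evaluated whether training ran to the epoch limit or was interrupted.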

The main.py script accepts the following arguments:

optional arguments:
  -h, --help         show this help message and exit
  -c, --config PATH  preset configurations to load
  --data DATA        location of the data corpus
  --model MODEL      type of recurrent net (RNN_TANH, RNN_RELU, LSTM, GRU)
  --emsize EMSIZE    size of word embeddings
  --nhid NHID        number of hidden units per layer
  --nlayers NLAYERS  number of layers
  --lr LR            initial learning rate
  --clip CLIP        gradient clipping
  --epochs EPOCHS    upper epoch limit
  --batch-size N     batch size
  --dropout DROPOUT  dropout applied to layers (0 = no dropout)
  --tied             tie the word embedding and softmax weights
  --seed SEED        random seed
  --cuda             use CUDA
  --log-interval N   report interval
  --save SAVE        path to save the final model
  ... more from previous basis embedding related parameters