NeuralDialog CVAE

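Knowledge-Guided CVAE for Dialog Generation

We provide a TensorFlow implementation of the CVAE-based dialog model described in "Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders", published as a long paper at ACL 2017. Please refer to the paper for more details.

References

If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below: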

 [Zhao et al, 2017]:
 @inproceedings{zhao2017learning,
   title={Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders},
   author={Zhao, Tiancheng and Zhao, Ran and Eskenazi, Maxine},
   booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
   volume={1},
   pages={654--664},
   year={2017}
 }

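External Implementation

The baseline method HRED, using the same SwitchBoard dataset, is also implemented in Texar, a general-purpose text generation toolkit.

Prerequisites

TensorFlow 1.3.0
cuDNN 6
Python 2.7
NumPy
NLTK
If the module is missing, you may need to run pip install beeprint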

Usage

Train a new model

 python kgcvae_swda.py

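This runs the default training and saves the model to ./working

Test an existing model

To run an existing model, modify the TF flags at the top of kgcvae_swda.py as follows:

 forward_only: False -> True
 test_path: set to the folder that contains the model, e.g. runxxxx

You can then run the model with: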

 python kgcvae_swda.py

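The output is printed to stdout, and the generated responses are saved to test.txt in test_path.

Use pre-trained Word2vec

Download the GloVe word embeddings from https://nlp.stanford.edu/projects/glove/. The default setting uses the 200-dimensional word embeddings trained on Twitter.

Finally, set word2vec_path at line 15 of kgcvae_swda.py.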
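
As a rough illustration of what a GloVe text file contains, the sketch below parses lines of the form "word v1 v2 ... vN" into a dictionary. The function name load_glove and the tiny 3-dimensional sample are assumptions made for illustration; the loader inside kgcvae_swda.py may work differently, and this snippet is written as a standalone Python 3 example.

```python
import io


def load_glove(lines, dim=200):
    """Parse GloVe text-format lines into {word: [float, ...]}.

    `load_glove` is a hypothetical helper, not a function from the repository.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip lines that do not have exactly `dim` values
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny in-memory stand-in for a GloVe file (3 dimensions, made-up numbers).
sample = io.StringIO(u"hello 0.1 0.2 0.3\nworld -0.4 0.5 0.6\n")
embeddings = load_glove(sample, dim=3)
print(sorted(embeddings.keys()))
```

For a real file, the same function could be called on a file object opened with io.open(path, encoding="utf-8"), keeping dim at 200 for the default Twitter embeddings.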

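Datasets

We release two datasets:

full_swda_clean_42da_sentiment_dialog_corpus.p: a binary dump created with the Python pickle library. It contains the raw data and is used for training. The same dialog data is also provided in JSONL format (json_format) in the data directory.
test_mutl_ref.json: a test-only dataset with multiple reference responses annotated with dialog acts. The multiple references were collected following the method described in the appendix of the paper.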
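
JSONL stores one JSON object per line, so the JSONL copy of the data can be inspected with a few lines of Python. The sketch below is only an illustration: iter_jsonl is a hypothetical helper, and the keys in the sample records are placeholders rather than the actual schema of the released files.

```python
import io
import json


def iter_jsonl(lines):
    """Yield one parsed JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# In-memory stand-in for a JSONL file; the keys are placeholders,
# not the real field names of the released data.
sample = io.StringIO(u'{"id": 0, "utts": []}\n\n{"id": 1, "utts": []}\n')
records = list(iter_jsonl(sample))
print(len(records))
```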

Data Format

If you want to train the model on your own data, create a pickle file with the following format:

 # The top-level object is a Python dictionary
 type(data) = dict
 data.keys() = ['train', 'valid', 'test']

 # train/valid/test are lists; each element is one dialog
 train = data['train']
 type(train) = list

 # Each dialog is a dict
 dialog = train[0]
 type(dialog) = dict
 dialog.keys() = ['A', 'B', 'topic', 'utts']

 # A and B contain meta info about speakers A and B.
 # topic defines the dialog prompt topic in the Switchboard Corpus.

 # utts is a list; each element is a tuple that contains info about one utterance
 utts = dialog['utts']
 type(utts) = list
 utts[0] = ("A" or "B", "utterance in string", [dialog_act, other_meta_info])

 # For example, an utterance looks like this:
 ('B', 'especially your foreign cars', ['statement-non-opinion'])

Place the resulting file in ./data and set data_dir in kgcvae_swda.py.
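
A minimal sketch of building and checking such a pickle file follows. The helper names make_dialog and validate_corpus, the empty speaker metadata, the placeholder topic and first utterance, and the output file name are all assumptions made for illustration; they do not come from the repository. The snippet is written for Python 3 and uses pickle protocol 2, the highest protocol that Python 2.7 can read.

```python
import os
import pickle
import tempfile


def make_dialog(topic, turns):
    """Build one dialog dict; `turns` is a list of (speaker, text, dialog_act)."""
    return {
        "A": {},  # meta info about speaker A (contents are an assumption)
        "B": {},  # meta info about speaker B (contents are an assumption)
        "topic": topic,
        "utts": [(spk, text, [act]) for spk, text, act in turns],
    }


def validate_corpus(data):
    """Raise AssertionError if `data` does not match the documented layout."""
    assert isinstance(data, dict)
    assert set(data.keys()) == {"train", "valid", "test"}
    for split in data.values():
        assert isinstance(split, list)
        for dialog in split:
            assert set(dialog.keys()) == {"A", "B", "topic", "utts"}
            for speaker, text, meta in dialog["utts"]:
                assert speaker in ("A", "B")
                assert isinstance(text, str)
                assert isinstance(meta, list) and len(meta) >= 1


# The first turn is placeholder data; the second is the example given above.
dialog = make_dialog("placeholder topic", [
    ("A", "placeholder utterance", "statement-non-opinion"),
    ("B", "especially your foreign cars", "statement-non-opinion"),
])
corpus = {"train": [dialog], "valid": [dialog], "test": [dialog]}
validate_corpus(corpus)

# Write to a temporary directory here; a real corpus would go under ./data.
out_path = os.path.join(tempfile.mkdtemp(), "my_corpus.p")
with open(out_path, "wb") as f:
    pickle.dump(corpus, f, protocol=2)

# Reload and re-check to confirm the file round-trips.
with open(out_path, "rb") as f:
    reloaded = pickle.load(f)
validate_corpus(reloaded)
```

Running a check like validate_corpus before training can catch layout mistakes early, for example a missing 'valid' split or an utterance stored as a list instead of a tuple of three items.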
