NeuralDialog CVAE

Knowledge-Guided CVAE for Dialog Generation

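We provide a TensorFlow implementation of the CVAE-based dialog model described in "Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders", published as a long paper at ACL 2017. See the paper for more details.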

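References

If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below: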

 [Zhao et al, 2017]:
 @inproceedings{zhao2017learning,
   title={Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders},
   author={Zhao, Tiancheng and Zhao, Ran and Eskenazi, Maxine},
   booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
   volume={1},
   pages={654--664},
   year={2017}
 }

External Implementation

The baseline method HRED, using the same SwitchBoard dataset, is also implemented in Texar, a general-purpose text generation toolkit.

Prerequisites

TensorFlow 1.3.0
cuDNN 6
Python 2.7
NumPy
NLTK
You may need to pip install beeprint if the module is missing.

Usage

Train a new model

 python kgcvae_swda.py

This runs the default training and saves the model to ./working.

Test an existing model

Modify the TF flags at the top of kgcvae_swda.py as follows to run an existing model:

 forward_only: False -> True
 test_path: set to the folder that contains the model, e.g. runxxxx

Then you can run the model with:

 python kgcvae_swda.py

The outputs are printed to stdout, and the generated responses are saved to test.txt in test_path.

Use pre-trained Word2vec

Download the GloVe word embeddings from https://nlp.stanford.edu/projects/glove/. The default setting uses 200-dimensional word embeddings trained on Twitter.

Finally, set word2vec_path at line 15 of kgcvae_swda.py.
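
For orientation, GloVe text files store one token per line followed by its space-separated vector components. The sketch below parses that layout into a dictionary. The helper name load_glove and the sample values are illustrative assumptions and are not taken from kgcvae_swda.py.

```python
import io


def load_glove(lines, dim=200):
    """Parse GloVe-style text lines into {word: [float, ...]}.

    `lines` is any iterable of strings, e.g. an open text file.
    Lines whose vector length differs from `dim` are skipped.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # malformed line or unexpected dimensionality
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny 3-dimensional stand-in for a real GloVe file (made-up values).
sample = io.StringIO(u"hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\nbroken 1.0\n")
glove = load_glove(sample, dim=3)
print(sorted(glove))
```

With a real download, the same function could be given an open file handle (for example one returned by io.open with encoding="utf-8") and dim=200 to match the default setting.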

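Dataset

We release two datasets:

full_swda_clean_42da_sentiment_dialog_corpus.p is a binary dump created with the Python pickle library. It contains the raw data and is used for training.
json_format: the same dialog data is also provided in JSONL format in the data directory.
test_mutl_ref.json contains only the test set, with multiple reference responses annotated with dialog acts. The multiple references were collected following the method described in the appendix of the paper.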
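
JSONL stores one JSON object per line, so a file in that format can be streamed record by record. A minimal reading sketch follows. The helper name read_jsonl and the sample records are made up for illustration, and the field names in the released file may differ.

```python
import io
import json


def read_jsonl(lines):
    """Yield one decoded JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Hypothetical two-record sample; the real corpus fields are not shown here.
sample = io.StringIO(
    u'{"topic": "cars", "utts": []}\n\n{"topic": "music", "utts": []}\n'
)
records = list(read_jsonl(sample))
print(len(records))
```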

Data Format

If you want to train the model on your own data, create a pickle file with the following format:

 # The top-level object is a Python dictionary
 type(data) = dict
 data.keys() = ['train', 'valid', 'test']

 # train/valid/test are lists; each element is one dialog
 train = data['train']
 type(train) = list

 # Each dialog is a dict
 dialog = train[0]
 type(dialog) = dict
 dialog.keys() = ['A', 'B', 'topic', 'utts']

 # A, B contain meta info about speakers A and B.
 # topic defines the dialog prompt topic in the Switchboard Corpus.

 # utts is a list; each element is a tuple that contains info about an utterance
 utts = dialog['utts']
 type(utts) = list
 utts[0] = ("A" or "B", "utterance in string", [dialog_act, other_meta_info])

 # For example, an utterance looks like this:
 ('B','especially your foreign cars',['statement-non-opinion'])

Put the resulting file into ./data and set data_dir in kgcvae_swda.py.
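
As a rough illustration of the layout described above, the following sketch builds a toy corpus, checks its structure, and writes it with pickle. The helper check_corpus, the file name my_corpus.p, the topic value, and the empty speaker-meta dicts are all assumptions made for this example. The source does not specify what the A and B entries contain.

```python
import os
import pickle
import tempfile


def check_corpus(data):
    """Raise AssertionError if `data` does not match the described layout."""
    assert isinstance(data, dict)
    assert set(data.keys()) == {"train", "valid", "test"}
    for split in data.values():
        assert isinstance(split, list)
        for dialog in split:
            assert set(dialog.keys()) == {"A", "B", "topic", "utts"}
            for speaker, text, meta in dialog["utts"]:
                assert speaker in ("A", "B")
                assert isinstance(text, str)
                assert isinstance(meta, list)


# One toy dialog; speaker meta and topic are placeholders (assumptions).
dialog = {
    "A": {},
    "B": {},
    "topic": "CARS",
    "utts": [
        ("A", "i like small cars", ["statement-non-opinion"]),
        ("B", "especially your foreign cars", ["statement-non-opinion"]),
    ],
}
data = {"train": [dialog], "valid": [dialog], "test": [dialog]}
check_corpus(data)

# protocol=2 keeps the file readable from Python 2.7.
path = os.path.join(tempfile.mkdtemp(), "my_corpus.p")
with open(path, "wb") as f:
    pickle.dump(data, f, protocol=2)

# Read the file back to confirm it round-trips.
with open(path, "rb") as f:
    restored = pickle.load(f)
print(restored["train"][0]["utts"][1])
```

Running a structural check like this before training can surface format mistakes early, instead of partway through data loading.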
