NeuralDialog CVAE

Knowledge-Guided CVAE for Dialog Generation

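We provide a TensorFlow implementation of the CVAE-based dialog model described in "Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders", published as a long paper at ACL 2017. See the paper for details.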

References

If you use any source code or datasets included in this toolkit in your work, please cite the following paper. The BibTeX entry is listed below:

 [Zhao et al, 2017]:
 @inproceedings{zhao2017learning,
   title={Learning Discourse-level Diversity for Neural Dialog Models using Conditional Variational Autoencoders},
   author={Zhao, Tiancheng and Zhao, Ran and Eskenazi, Maxine},
   booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
   volume={1},
   pages={654--664},
   year={2017}
 }

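External Implementation

The baseline method HRED, using the same SwitchBoard dataset, is also implemented in Texar, a general-purpose text generation toolkit. See the Texar project for that implementation.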

Prerequisites

TensorFlow 1.3.0
cuDNN 6
Python 2.7
NumPy
NLTK
You may need to run pip install beeprint if the module is missing.

Usage

Train a new model

 python kgcvae_swda.py

This runs the default training and saves the model to ./working.

Test an existing model

To run an existing model, modify the TF flags at the top of kgcvae_swda.py as follows:

 forward_only: False -> True
 test_path: set to the folder that contains the model, e.g. runxxxx

Then run the model with:

 python kgcvae_swda.py

The output is printed to stdout, and the generated responses are saved to test.txt in test_path.

Use pre-trained Word2vec

Download the GloVe word embeddings from https://nlp.stanford.edu/projects/glove/. The default setting uses the 200-dimensional word embeddings trained on Twitter.

Finally, set word2vec_path at line 15 of kgcvae_swda.py.
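
As a rough illustration of what the embedding file contains, the sketch below parses a GloVe-style text file, in which each line holds a token followed by its vector components separated by spaces. The helper name load_glove and the tiny sample file are hypothetical and are not part of this toolkit; kgcvae_swda.py has its own loading code, which may differ.

```python
import io
import os
import tempfile


def load_glove(path, dim):
    """Parse a GloVe-style text file into a {token: [float, ...]} dict.

    Hypothetical helper for illustration only; not part of this toolkit.
    Lines whose vector length differs from `dim` are skipped.
    """
    vectors = {}
    with io.open(path, "r", encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip("\n").split(" ")
            if len(parts) != dim + 1:
                continue  # malformed line or wrong dimensionality
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Tiny made-up file with 3-dimensional vectors, just to exercise the parser.
sample = u"hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\nbroken 1.0\n"
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_glove.txt")
with io.open(sample_path, "w", encoding="utf-8") as f:
    f.write(sample)

embeddings = load_glove(sample_path, dim=3)
print(sorted(embeddings))
```

For the default setting, dim would be 200 and the path would point at the downloaded Twitter embedding file.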

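Dataset

We release two datasets; the first is available in two formats:

full_swda_clean_42da_sentiment_dialog_corpus.p is a binary dump, created with Python's pickle library, that contains the raw data and is used for training.
json_format: the same dialog data is also provided in JSONL format in the data directory.
test_mutl_ref.json contains only the test set, with multiple reference responses annotated with dialog acts. The multiple references were collected according to the method described in the appendix of the paper.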
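
A minimal sketch of reading the two formats follows. It assumes the pickle holds a dictionary with train, valid and test splits, and it assumes nothing about the JSONL records beyond one JSON object per line. The function names and the toy files are hypothetical stand-ins, written so the example runs on its own.

```python
import json
import os
import pickle
import tempfile


def load_pickle_corpus(path):
    """Load a pickled corpus dict (pickle files must be opened in binary mode)."""
    with open(path, "rb") as f:
        return pickle.load(f)


def load_jsonl(path):
    """Read a JSONL file: one JSON object per non-empty line."""
    records = []
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


# Made-up stand-ins for the released files, so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
toy = {"train": [{"A": {}, "B": {}, "topic": "cars",
                  "utts": [("B", "especially your foreign cars",
                            ["statement-non-opinion"])]}],
       "valid": [], "test": []}

pkl_path = os.path.join(tmp_dir, "toy_corpus.p")
with open(pkl_path, "wb") as f:
    pickle.dump(toy, f, protocol=2)  # protocol 2 is readable by Python 2.7

jsonl_path = os.path.join(tmp_dir, "toy.jsonl")
with open(jsonl_path, "w") as f:
    for dialog in toy["train"]:
        f.write(json.dumps(dialog) + "\n")

corpus = load_pickle_corpus(pkl_path)
records = load_jsonl(jsonl_path)
print(sorted(corpus.keys()), len(records))
```

If the released pickle was written under Python 2.7 and is read under Python 3, pickle.load may need an encoding argument such as encoding="latin1".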

Data Format

If you want to train the model on your own data, create a pickle file in the following format:

 # The top-level object is a Python dictionary
type(data) = dict
data.keys() = ['train', 'valid', 'test']

# Train/valid/test is a list, each element is one dialog
train = data['train']
type(train) = list

# Each dialog is a dict
dialog = train[0]
type(dialog)= dict
dialog.keys() = ['A', 'B', 'topic', 'utts']

# A, B contain meta info about speaker A and B.
# topic defines the dialog prompt topic in Switchboard Corpus.

# utts is a list; each element is a tuple that contains info about one utterance
utts = dialog['utts']
type(utts) = list
utts[0] = ("A" or "B", "utterance in string", [dialog_act, other_meta_info])

# For example, an utterance looks like this:
('B','especially your foreign cars',['statement-non-opinion'])

Put the resulting file in ./data and set data_dir in kgcvae_swda.py.
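
The layout above can be sketched as a small script that checks a corpus and writes it out as a pickle. This is only an illustration under stated assumptions: validate_corpus is a hypothetical helper, the speaker metadata is left as empty placeholder dictionaries because its fields are not specified here, and the first utterance text is made up.

```python
import os
import pickle
import tempfile


def validate_corpus(data):
    """Check that `data` follows the layout described above; raise ValueError if not."""
    if not isinstance(data, dict) or set(data) != {"train", "valid", "test"}:
        raise ValueError("top level must be a dict with train/valid/test")
    for split, dialogs in data.items():
        if not isinstance(dialogs, list):
            raise ValueError("%s must be a list of dialogs" % split)
        for dialog in dialogs:
            if not isinstance(dialog, dict):
                raise ValueError("each dialog must be a dict")
            missing = {"A", "B", "topic", "utts"} - set(dialog)
            if missing:
                raise ValueError("dialog is missing keys: %s" % sorted(missing))
            for speaker, text, meta in dialog["utts"]:
                if speaker not in ("A", "B"):
                    raise ValueError("speaker must be 'A' or 'B'")
                if not isinstance(meta, list) or not meta:
                    raise ValueError("meta must be a list starting with the dialog act")
    return True


# Placeholder speaker metadata and topic; the first utterance text is made up.
dialog = {
    "A": {}, "B": {}, "topic": "cars",
    "utts": [("A", "i like older cars", ["statement-non-opinion"]),
             ("B", "especially your foreign cars", ["statement-non-opinion"])],
}
data = {"train": [dialog], "valid": [], "test": []}
validate_corpus(data)

# Written to a temporary folder here; for training it would go under ./data.
out_path = os.path.join(tempfile.mkdtemp(), "my_corpus.p")
with open(out_path, "wb") as f:
    pickle.dump(data, f, protocol=2)  # protocol 2 is readable by Python 2.7
print(out_path)
```

Validating before pickling means a malformed dialog fails early with a readable message rather than partway through training.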
