PCPM Download - PCPM Source Code Download

PCPM

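A corpus of pretrained models: links to pretrained models in NLP and speech, along with their training scripts.

With rapid progress in NLP, it is becoming easier to bootstrap a machine learning project involving text. Instead of starting from base code, one can now start from a base pretrained model and reach SOTA performance within a few iterations. This repository is built on the view that pretrained models minimize collective human effort and resource cost, and so accelerate development in the field.

The models listed are curated for either pytorch or tensorflow because of their wide usage.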

Note: pytorch-transformers is an awesome library that can be used to quickly run inference on, or fine-tune, many pretrained models in NLP. The pretrained models it provides are not included here.
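As a rough illustration of the workflow the note describes, the sketch below loads a pretrained checkpoint and encodes one sentence. It assumes the library is installed under its later package name, `transformers`, together with `torch`; the checkpoint id `bert-base-uncased` and the helper names `load_pretrained` and `embed` are illustrative choices, not something this repository prescribes.

```python
def load_pretrained(model_name="bert-base-uncased"):
    """Return (tokenizer, model) for a pretrained checkpoint.

    The import is deferred so this file can be imported without the
    library installed. The first call downloads the weights.
    """
    from transformers import AutoModel, AutoTokenizer  # assumed installed

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


def embed(text, tokenizer, model):
    """Encode one sentence and return the last hidden states."""
    import torch  # assumed installed

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(**inputs)
    return outputs[0]  # shape: (1, sequence_length, hidden_size)


# Example usage (requires network access on first run):
#   tok, mdl = load_pretrained()
#   hidden = embed("Pretrained models save time.", tok, mdl)
```

Fine-tuning follows the same loading step, with a task-specific head and a training loop on top.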

Contents

Text ML models
Speech to text models
Datasets
Hall of Shame
Non-English models
Other collections

Text ML

Language Models

Name	Link	Trained On	Training script
Transformer-XL	https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models	`enwik8`, `lm1b`, `wt103`, `text8`	https://github.com/kimiyoung/transformer-xl
GPT-2	https://github.com/openai/gpt-2/blob/master/download_model.py	`webtext`	https://github.com/nshepperd/gpt-2/
Adaptive Inputs (fairseq)	https://github.com/pytorch/fairseq/blob/master/examples/language_model/README.md#pre-trained-models	`lm1b`	https://github.com/pytorch/fairseq/blob/master/examples/language_model/README.md

Permutation language modeling based - XLNet

Name	Link	Trained On	Training script
XLNet	https://github.com/zihangdai/xlnet/#released-models	`booksCorpus` + `English Wikipedia` + `Giga5` + `ClueWeb 2012-B` + `Common Crawl`	https://github.com/zihangdai/xlnet/

Masked language modeling based - BERT

Name	Link	Trained On	Training script
RoBERTa	https://github.com/pytorch/fairseq/tree/master/examples/roberta#pre-trained-models	booksCorpus+CC-NEWS+OpenWebText+CommonCrawl-Stories	https://github.com/huggingface/transformers
BERT	https://github.com/google-research/bert/	BooksCorpus+English Wikipedia	https://github.com/huggingface/transformers
MT-DNN	https://mrc.blob.core.windows.net/mt-dnn-model/mt_dnn_base.pt (https://github.com/namisan/mt-dnn/blob/master/download.sh)	GLUE	https://github.com/namisan/mt-dnn

Machine Translation

Name	Link	Trained On	Training script
OpenNMT	http://opennmt.net/Models-py/ (pytorch), http://opennmt.net/Models-tf/ (tensorflow)	English-German	https://github.com/OpenNMT/OpenNMT-py
Fairseq (multiple models)	https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md#pre-trained-models	WMT14 English-French, WMT16 English-German	https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md

Sentiment

Name	Link	Trained On	Training script
Nvidia sentiment-discovery	https://github.com/NVIDIA/sentiment-discovery#pretrained-models	SST, imdb, Semeval-2018-tweet-emotion	https://github.com/NVIDIA/sentiment-discovery
MT-DNN Sentiment	https://drive.google.com/open?id=1-ld8_WpdQVDjPeYhb3AK8XYLGlZEbs-l	SST	https://github.com/namisan/mt-dnn

Reading Comprehension

SQuAD 1.1

Rank	Name	Link	Training script
49	BiDAF	https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz	https://github.com/allenai/allennlp

Summarization

Models for English summarization

Name	Link	Trained On	Training script
OpenNMT	http://opennmt.net/Models-py/	Gigaword standard	https://github.com/OpenNMT/OpenNMT-py

Speech to Text

Name	Link	Trained On	Training script
NeMo-QuartzNet	https://ngc.nvidia.com/catalog/models/nvidia:quartznet15x5	librispeech, mozilla-common-voice	https://github.com/NVIDIA/NeMo
OpenSeq2Seq-Jasper	https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#models	librispeech	https://github.com/NVIDIA/OpenSeq2Seq
Espnet	https://github.com/espnet/espnet#asr-results	librispeech, Aishell, HKUST, TEDLIUM2	https://github.com/espnet/espnet
wav2letter++	https://talonvoice.com/research/	librispeech	https://github.com/facebookresearch/wav2letter
Deepspeech2 pytorch	SeanNaren/deepspeech.pytorch#299 (comment)	librispeech	https://github.com/SeanNaren/deepspeech.pytorch
Deepspeech	https://github.com/mozilla/DeepSpeech#getting-the-pre-trained-model	mozilla-common-voice, librispeech, fisher, switchboard	https://github.com/mozilla/DeepSpeech
speech-to-text-wavenet	https://github.com/buriburisuri/speech-to-text-wavenet#pre-trained-models	vctk	https://github.com/buriburisuri/speech-to-text-wavenet
at16k	https://github.com/at16k/at16k#download-models	NA	NA
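
Most of the tables above share four tab-separated columns: name, link, training data, and training script. For readers who want to consume the list programmatically, here is a minimal sketch of turning one such row into a record. The `ModelEntry` class and `parse_row` helper are hypothetical names introduced for illustration, and the sketch only splits the training-data column on commas (some rows join datasets with `+` instead).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelEntry:
    """One row of a four-column model table (hypothetical record type)."""
    name: str
    link: str
    trained_on: List[str]
    training_script: str


def parse_row(row: str) -> ModelEntry:
    """Split a tab-separated row and clean up the dataset list."""
    name, link, trained_on, script = [cell.strip() for cell in row.split("\t")]
    # Dataset names may be wrapped in backticks and separated by commas.
    datasets = [d.strip(" `") for d in trained_on.split(",") if d.strip(" `")]
    return ModelEntry(name, link, datasets, script)


row = (
    "GPT-2\thttps://github.com/openai/gpt-2/blob/master/download_model.py"
    "\t`webtext`\thttps://github.com/nshepperd/gpt-2/"
)
entry = parse_row(row)
print(entry.name, entry.trained_on)
```

The SQuAD table uses a different column layout (rank first, no training data), so it would need its own parser.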

Datasets

Datasets referenced in this document

Language model data

Common crawl

http://commoncrawl.org/

enwik8

Wikipedia data dump (Large Text Compression Benchmark) http://mattmahoney.net/dc/textdata.html

text8

Wikipedia cleaned text (Large Text Compression Benchmark) http://mattmahoney.net/dc/textdata.html

lm1b

1 Billion Word Language Model Benchmark https://www.statmt.org/lm-benchmark/

wt103

Wikitext 103 https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/

webtext

The original dataset was not released by the authors. An open source collection is available at https://skylion007.github.io/OpenWebTextCorpus/

English Wikipedia

https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-language_Wikipedia

BooksCorpus

https://yknzhu.wixsite.com/mbweb https://github.com/soskek/bookcorpus

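Sentiment

SST

Stanford Sentiment Treebank https://nlp.stanford.edu/sentiment/index.html. One of the GLUE tasks.

IMDB

IMDB movie review dataset used for sentiment classification http://ai.stanford.edu/~amaas/data/sentiment

Semeval2018te

Semeval 2018 tweet emotion dataset https://competitions.codalab.org/competitions/17751

GLUE

GLUE is a collection of resources for benchmarking natural language systems. https://gluebenchmark.com/ It contains datasets for natural language inference, sentiment classification, paraphrase detection, similarity matching, and linguistic acceptability.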

Speech to text data

fisher

https://pdfs.semanticscholar.org/a723/97679079439b075de815553c7b687ccfa886.pdf

librispeech

www.danielpovey.com/files/2015_icassp_librispeech.pdf

switchboard

https://ieeexplore.ieee.org/document/225858/

mozilla-common-voice

https://github.com/mozilla/voice-web

vctk

https://datashare.is.ed.ac.uk/handle/10283/2651

Hall of Shame

High quality research that does not include pretrained models and/or code for public use.

KERMIT https://arxiv.org/abs/1906.01604 Generative Insertion-Based Modeling for Sequences. No code.

Non-English

Other Collections

Allen NLP

Built on pytorch, Allen NLP has produced SOTA models and open sourced them. https://github.com/allenai/allennlp/blob/master/MODELS.md

They have neat interactive demos of various tasks at https://demo.allennlp.org/

GluonNLP

Based on MXNet, this library has an extensive list of pretrained models for various NLP tasks. http://gluon-nlp.mxnet.io/master/index.html#model-zoo

Additional information

Version 1.0.0
Type AI source code
Updated 2024-12-31
Size 50MB
Source GitHub