PCPM Download - Download the PCPM source code

PCPM

AI source code

1.0.0

PCPM

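A collection of pre-trained models: links to pre-trained models for NLP and speech, together with their training scripts.

With the rapid progress in NLP, it is becoming easier to bootstrap a machine learning project that involves text. Instead of starting from base code, you can now start from a base pre-trained model and reach SOTA performance within a few iterations. This repository was created with the view that pre-trained models minimize collective human effort and resource cost, and so accelerate development in the field.

The models listed are curated for either PyTorch or TensorFlow because of their wide usage.

Note: pytorch-transformers is an excellent library for quickly running inference with, or fine-tuning, many pre-trained NLP models. The pre-trained models it provides are not included here.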

Contents

Text ML models
Speech-to-text models
Datasets
Hall of Shame
Non-English models
Other collections

Text ML

Language models

Name	Link	Trained on	Training script
Transformer-XL	https://github.com/kimiyoung/transformer-xl/tree/master/tf#obtain-and-evaluate-pretrained-sota-models	`enwik8`, `lm1b`, `wt103`, `text8`	https://github.com/kimiyoung/transformer-xl
GPT-2	https://github.com/openai/gpt-2/blob/master/download_model.py	`webtext`	https://github.com/nshepperd/gpt-2/
Adaptive Inputs (fairseq)	https://github.com/pytorch/fairseq/blob/master/examples/language_model/README.md#pre-trained-models	`lm1b`	https://github.com/pytorch/fairseq/blob/master/examples/language_model/README.md

Permutation language modeling based - XLNet

Name	Link	Trained on	Training script
XLNet	https://github.com/zihangdai/xlnet/#published-models	`booksCorpus` + `English Wikipedia` + `Giga5` + `ClueWeb 2012-B` + `Common Crawl`	https://github.com/zihangdai/xlnet/
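
To make the "permutation language modeling" label concrete, here is a minimal sketch of the underlying idea: sample one factorization order over the token positions, then predict each token from only the positions that come earlier in that order. The function name `permutation_contexts` and the example sentence are illustrative assumptions. This is not XLNet's implementation, which realizes the same idea with attention masks inside a Transformer.

```python
import random


def permutation_contexts(tokens, seed=None):
    """Sample one factorization order and the context visible at each step.

    Returns (order, contexts): `order` is a random permutation of positions,
    and contexts[pos] lists the positions visible when predicting `pos`.
    """
    rng = random.Random(seed)
    order = list(range(len(tokens)))
    rng.shuffle(order)

    contexts = {}
    for step, pos in enumerate(order):
        # Only positions earlier in the sampled order may be conditioned on.
        contexts[pos] = sorted(order[:step])
    return order, contexts


if __name__ == "__main__":
    tokens = ["New", "York", "is", "a", "city"]  # hypothetical example input
    order, contexts = permutation_contexts(tokens, seed=0)
    for pos in order:
        visible = [tokens[i] for i in contexts[pos]]
        print(f"predict {tokens[pos]!r} given {visible}")
```

Because the order is resampled for every training sequence, each token is eventually predicted from contexts on both its left and its right, while every individual prediction remains autoregressive.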

Masked language modeling based - BERT

Name	Link	Trained on	Training script
RoBERTa	https://github.com/pytorch/fairseq/tree/master/examples/roberta#pre-trained-models	BookCorpus + CC-News + OpenWebText + CommonCrawl-Stories	https://github.com/huggingface/transformers
BERT	https://github.com/google-research/bert/	BookCorpus + English Wikipedia	https://github.com/huggingface/transformers
MT-DNN	https://mrc.blob.core.windows.net/mt-dnn-model/mt_dnn_base.pt (https://github.com/namisan/mt-dnn/blob/master/download.sh)	GLUE	https://github.com/namisan/mt-dnn
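
As a rough illustration of what "masked language modeling" means for the models in this table, the sketch below hides a random subset of tokens and keeps the originals as prediction targets. The helper `mask_tokens`, the `[MASK]` string, and the 15% default rate are assumptions made for the example. The sketch is deliberately simplified: BERT's published procedure also leaves some selected tokens unchanged or swaps them for random tokens, which is omitted here.

```python
import random

MASK = "[MASK]"  # placeholder symbol; real tokenizers define their own


def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Return (inputs, labels) for a simplified masked-LM training example.

    inputs: the tokens with a random subset replaced by MASK.
    labels: the original token at each masked position, None elsewhere.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            labels.append(token)  # the model is trained to recover this
        else:
            inputs.append(token)
            labels.append(None)   # no loss is computed at this position
    return inputs, labels


if __name__ == "__main__":
    sentence = "the quick brown fox jumps over the lazy dog".split()
    inputs, labels = mask_tokens(sentence, mask_prob=0.3, seed=0)
    print(inputs)
    print(labels)
```

The loss is computed only at the masked positions, so the model can attend to context on both sides of each hidden token.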

Machine translation

Name	Link	Trained on	Training script
OpenNMT	http://opennmt.net/Models-py/ (pytorch) http://opennmt.net/Models-tf/ (tensorflow)	English-German	https://github.com/OpenNMT/OpenNMT-py
Fairseq (multiple models)	https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md#pre-trained-models	WMT14 English-French, WMT16 English-German	https://github.com/pytorch/fairseq/blob/master/examples/translation/README.md

Sentiment

Name	Link	Trained on	Training script
Nvidia sentiment-discovery	https://github.com/NVIDIA/sentiment-discovery#pretrained-models	SST, imdb, Semeval-2018-tweet-emotion	https://github.com/NVIDIA/sentiment-discovery
MT-DNN Sentiment	https://drive.google.com/open?id=1-ld8_WpdQVDjPeYhb3AK8XYLGlZEbs-l	SST	https://github.com/namisan/mt-dnn

Reading comprehension

SQuAD 1.1

Rank	Name	Link	Training script
49	BiDAF	https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz	https://github.com/allenai/allennlp
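
Several entries, like the one above, link straight to a model archive. The following is a minimal sketch of fetching such a file using only the Python standard library. The helper name `download`, the chunk size, and the destination path are assumptions for illustration, and the linked file may have moved since this list was compiled.

```python
import hashlib
import urllib.request
from pathlib import Path


def download(url, dest, chunk_size=1 << 20):
    """Stream `url` to `dest`; return the SHA-256 hex digest of the bytes written."""
    dest = Path(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256()
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            # Write and hash chunk by chunk so large archives never sit in memory.
            out.write(chunk)
            digest.update(chunk)
    return digest.hexdigest()


# Example call (destination path is hypothetical):
# download(
#     "https://s3-us-west-2.amazonaws.com/allennlp/models/bidaf-model-2017.09.15-charpad.tar.gz",
#     "models/bidaf-model-2017.09.15-charpad.tar.gz",
# )
```

Recording the returned digest alongside the file makes it possible to notice later if a re-download produces different contents.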

Summarization

Models for English summarization

Name	Link	Trained on	Training script
OpenNMT	http://opennmt.net/Models-py/	Gigaword standard	https://github.com/OpenNMT/OpenNMT-py

Speech to text

Name	Link	Trained on	Training script
NeMo-quartznet	https://ngc.nvidia.com/catalog/models/nvidia:quartznet15x5	librispeech, mozilla-common-voice	https://github.com/NVIDIA/NeMo
OpenSeq2Seq-Jasper	https://nvidia.github.io/OpenSeq2Seq/html/speech-recognition.html#models	librispeech	https://github.com/NVIDIA/OpenSeq2Seq
Espnet	https://github.com/espnet/espnet#asr-results	librispeech, Aishell, HKUST, TEDLIUM2	https://github.com/espnet/espnet
wav2letter++	https://talonvoice.com/research/	librispeech	https://github.com/facebookresearch/wav2letter
Deepspeech2 pytorch	SeanNaren/deepspeech.pytorch#299 (comment)	librispeech	https://github.com/SeanNaren/deepspeech.pytorch
Deepspeech	https://github.com/mozilla/DeepSpeech#getting-the-pre-trained-model	mozilla-common-voice, librispeech, fisher, switchboard	https://github.com/mozilla/DeepSpeech
speech-to-text-wavenet	https://github.com/buriburisuri/speech-to-text-wavenet#pre-trained-models	vctk	https://github.com/buriburisuri/speech-to-text-wavenet
at16k	https://github.com/at16k/at16k#download-models	NA	NA

Datasets

Datasets referenced in this document

Language model data

Common Crawl

http://commoncrawl.org/

enwik8

Wikipedia data dump (Large Text Compression Benchmark) http://mattmahoney.net/dc/textdata.html

text8

Wikipedia clean text (Large Text Compression Benchmark) http://mattmahoney.net/dc/textdata.html

lm1b

1 Billion Word Language Model Benchmark https://www.statmt.org/lm-benchmark/

wt103

WikiText-103 https://blog.einstein.ai/the-wikitext-long-term-dependency-language-modeling-dataset/

webtext

The original dataset has not been released by its authors. An open-source collection is available at https://skylion007.github.io/OpenWebTextCorpus/

English Wikipedia

https://en.wikipedia.org/wiki/Wikipedia:Database_download#English-lang_Wikipedia

BookCorpus

https://yknzhu.wixsite.com/mbweb https://github.com/soskek/bookcorpus

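Sentiment

SST

Stanford Sentiment Treebank https://nlp.stanford.edu/sentiment/index.html One of the GLUE tasks.

IMDB

IMDB movie review dataset used for sentiment classification http://ai.stanford.edu/~amaas/data/sentiment

Semeval2018te

SemEval 2018 tweet emotion dataset https://competitions.codalab.org/competitions/17751

GLUE

GLUE is a collection of resources for benchmarking natural language systems. https://gluebenchmark.com/ It contains datasets for natural language inference, sentiment classification, paraphrase detection, similarity matching, and linguistic acceptability.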

Speech-to-text data

Fisher

https://pdfs.semanticscholar.org/a723/97679079439b075de815553c7b687ccfa886.pdf

LibriSpeech

www.danielpovey.com/files/2015_icassp_librispeech.pdf

Switchboard

https://ieeexplore.ieee.org/document/225858/

Mozilla Common Voice

https://github.com/mozilla/voice-web

vctk

https://datashare.is.ed.ac.uk/handle/10283/2651

Hall of Shame

High-quality research that does not include pre-trained models and/or code for public use.

KERMIT https://arxiv.org/abs/1906.01604 Generative insertion-based modeling for sequences. No code.

Non-English

Other collections

AllenNLP

Built on PyTorch, AllenNLP has produced SOTA models and open-sourced them. https://github.com/allenai/allennlp/blob/master/MODELS.md

There is a neat interactive demo covering various tasks at https://demo.allennlp.org/

GluonNLP

Based on MXNet, this library has an extensive list of pre-trained models for various NLP tasks. http://gluon-nlp.mxnet.io/master/index.html#model-zoo

Additional information

Version 1.0.0
Type AI source code
Updated 2024-12-31
Size 50MB
Source GitHub