MinimalGPT

AI Source Code

v2.0.0

All support for MinimalGPT has ended and the project is deprecated! Use Corpus2GPT going forward!

https://github.com/abhaskumarsinha/Corpus2GPT

MinimalGPT: the "smallest and simplest GPT model"

[ GPT-1 Paper ] [ 1002 short stories from Project Gutenberg ] [ logo.com ] [ Transformer - Paper ] [ Huggingface Transformers ] [ TensorFlow ] [ BPE Tokenizer: subword-nmt ]

MinimalGPT is a concise, adaptable, and streamlined code framework that covers the essential components needed to build, train, run inference on, and fine-tune a GPT model. The framework is implemented exclusively with Keras and TensorFlow, ensuring compatibility and consistency within the broader deep learning ecosystem.

NEW: CPU/GPU/TPU support and support for loading large file datasets!

Code Specifications

The repository contains two files that make up the framework. The first file, GPT.py, is the foundation and contains the core components such as blocks and layers. These include multi-head attention, feed-forward mechanisms, scaled dot-product attention, positional encoding, softmaxed output, and an inference function for model prediction. The second file, MinimalGPT.py, streamlines use of the framework by offering a concise command-line interface. This interface lets users carry out the essential operations, including model creation, training, saving, loading, fine-tuning, and inference, all in a single command-line invocation. The files can also be imported into Python code, so users can integrate them into their own projects with a simple function call.
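To make one of the components listed above concrete, here is a minimal sketch of scaled dot-product attention in plain Python. It is not taken from GPT.py: the function names are hypothetical, and the causal mask (each position attends only to itself and earlier positions) is an assumption based on how GPT-style decoders are usually built.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def scaled_dot_product_attention(queries, keys, values, causal=True):
    """Hypothetical helper: softmax(Q K^T / sqrt(d_k)) V on lists of vectors.

    With causal=True, position i ignores every position j > i
    (an assumption about the decoder, not confirmed by the source).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: contributes weight 0
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        row = softmax(scores)
        weights.append(row)
        # Weighted sum of the value vectors, one output column at a time.
        outputs.append([
            sum(w * v[c] for w, v in zip(row, values))
            for c in range(len(values[0]))
        ])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)
print(weights[0])  # the first position can only attend to itself
```

Multi-head attention runs several such attention computations in parallel on different learned projections of the input and concatenates the results.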

Requirements

Run the following command to install the required dependencies from the requirements.txt file:


pip install -r requirements.txt

Usage

The model architecture is governed by several critical parameters: GPT_INPUT, D_MODEL, MULTI_HEAD, and DECODER_STACKS. These parameters must stay consistent between runs; otherwise the model cannot be reloaded for later retraining or inference. When in doubt, refer to the configuration file generated by the previous run. The VOCABULARY_START and VOCABULARY_END parameters define the window markers for the corpus. These markers are used to generate the Vectorizer layer, which extracts the vocabulary from the corpus between the specified START and END token counts. Note that tokens in the corpus are separated by whitespace, and that VOCABULARY_START and VOCABULARY_END matter chiefly when a tokenizer file is not explicitly specified.
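The vocabulary window described above can be pictured with a small sketch. This is not the framework's Vectorizer layer: `build_vocabulary` is a hypothetical helper, and treating the end marker as exclusive is an assumption.

```python
def build_vocabulary(corpus_text, vocabulary_start, vocabulary_end):
    """Hypothetical helper: map each distinct token in the window to an id.

    Tokens are split on whitespace, as the text above describes. The end
    marker is treated as exclusive here, which is an assumption.
    """
    tokens = corpus_text.split()
    window = tokens[vocabulary_start:vocabulary_end]
    vocabulary = {}
    for token in window:
        if token not in vocabulary:
            vocabulary[token] = len(vocabulary)
    return vocabulary

corpus = "the cat saw the dog and the bird"
vocab = build_vocabulary(corpus, vocabulary_start=0, vocabulary_end=5)
print(vocab)
```

Tokens that fall outside the window ("and", "bird" in this toy corpus) never enter the vocabulary, which is why the vocabulary window is usually chosen at least as wide as the training window.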

Also note that BOTH the tokenizer file and the model weights are saved/loaded together. The code currently does not support saving or loading these two files separately.

Inference mode (-i) requires the model parameters and the saved tokenizer and weights files to generate inference data, and it must also be used with the -ol switch.

usage: MinimalGPT.py [-h] [-d DATA_PATH] [-l LEARNING_RATE]
                     [-ol OUTPUT_LENGTH] [-e EPOCHS] [-b BATCH_SIZE]
                     [-s GPT_INPUT] [-dm D_MODEL] [-p MULTI_HEAD]
                     [-ds DECODER_STACKS] [-ts TOKEN_START] [-te TOKEN_END]
                     [-vs VOCABULARY_START] [-ve VOCABULARY_END] [-sd]
                     [-lt LOAD_TOKENIZER] [-lw LOAD_WEIGHTS]
                     [-st SAVE_TOKENIZER] [-sw SAVE_WEIGHTS] [-ot OPTIMIZER]
                     [-i] [-mv] [-mvo]

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_PATH, --data-path DATA_PATH
                        File: Corresponding to corpus or training text
                        [String]
  -l LEARNING_RATE, --learning-rate LEARNING_RATE
                        Float: Learning Rate. The model will train ONLY IF the
                        rate is > 0, skip otherwise [Float]
  -ol OUTPUT_LENGTH, --output-length OUTPUT_LENGTH
                        Length of the output sequence to be generated
  -e EPOCHS, --epochs EPOCHS
                        Number of training Epochs [Int]
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Size of each batch [Int]
  -s GPT_INPUT, --gpt-input GPT_INPUT
                        Number of Tokens of text the model inputs at a time
                        [Int]
  -dm D_MODEL, --d-model D_MODEL
                        Embedding layer output dimensions [Int]
  -p MULTI_HEAD, --multi-head MULTI_HEAD
                        Number of Multi-head Attention layer in parallel [Int]
  -ds DECODER_STACKS, --decoder-stacks DECODER_STACKS
                        Number of stacked Decoder layer [Int]
  -ts TOKEN_START, --token-start TOKEN_START
                        The token number in the corpus to mark it as the
                        starting point of the training [Int]
  -te TOKEN_END, --token-end TOKEN_END
                        The token number in the corpus to mark it as the end
                        point of the training [Int]
  -vs VOCABULARY_START, --vocabulary-start VOCABULARY_START
                        Token number from the corpus to mark the starting
                        point of vocabulary data [Int]
  -ve VOCABULARY_END, --vocabulary-end VOCABULARY_END
                        Token number from the corpus to mark the end point of
                        vocabulary data [Int]
  -sd, --save           Save the Model and Vectorizer data to disk
                        [True/False]
  -lt LOAD_TOKENIZER, --load-tokenizer LOAD_TOKENIZER
                        File: Vectorization layer [File]
  -lw LOAD_WEIGHTS, --load-weights LOAD_WEIGHTS
                        File: Model Weights [File]
  -st SAVE_TOKENIZER, --save-tokenizer SAVE_TOKENIZER
                        File: Saving Vectorizer File [File]
  -sw SAVE_WEIGHTS, --save-weights SAVE_WEIGHTS
                        File: Saving Model Weights[File]
  -ot OPTIMIZER, --optimizer OPTIMIZER
                        Optimizer consistent to TensorFlow optimizer class
                        [tf.keras.optimizers]
  -i, --inference-only  Only Print the output of the model in Inference Mode
                        [True/False]
  -mv, --model-vectorizer
                        Return Model, Vectorizer Tuple [True/False]
  -mvo, --model-vectorizer-output
                        Return Model, Vectorizer, Output Tuple [True/False]
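As a rough illustration of how a few of these switches interact, the sketch below declares a handful of them with `argparse` and applies two rules stated in this document: training happens only when the learning rate is greater than 0, and `-i` is combined with `-ol`. The parser in MinimalGPT.py may be written differently; `build_parser`, `plan_run`, and the default learning rate of 0.0 are assumptions made for this example.

```python
import argparse

def build_parser():
    """Declare a small subset of the flags from the help text above."""
    parser = argparse.ArgumentParser(prog="MinimalGPT.py")
    parser.add_argument("-d", "--data-path", type=str)
    # Default of 0.0 is an assumption for this sketch.
    parser.add_argument("-l", "--learning-rate", type=float, default=0.0)
    parser.add_argument("-ol", "--output-length", type=int)
    parser.add_argument("-s", "--gpt-input", type=int)
    parser.add_argument("-i", "--inference-only", action="store_true")
    return parser

def plan_run(argv):
    """Hypothetical helper: decide what a run would do for given arguments."""
    args = build_parser().parse_args(argv)
    if args.inference_only and args.output_length is None:
        raise ValueError("-i should be combined with -ol")
    # The help text says training runs only if the learning rate is > 0.
    train = (not args.inference_only) and args.learning_rate > 0
    return {"train": train, "generate": args.output_length}

print(plan_run(["-l", "0.001", "-ol", "200", "-s", "10"]))
print(plan_run(["-i", "-ol", "500", "-s", "10"]))
```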

Examples

Model creation and training example

Assume the desired model specification is GPT_INPUT = 10, D_MODEL = 128, MULTI_HEAD = 8, and DECODER_STACKS = 1, that training uses the corpus token range TOKEN_START = 0 to TOKEN_END = 40000, and that the vectorizer layer is generated from the corpus span VOCABULARY_START = 0 to VOCABULARY_END = 200000. The following command starts model training. The resulting weights and tokenizer data are saved to the designated folder. The output below shows the result of running this command.

PS C:gpt> python MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.001 -ol 200 -e 4 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 0 -te 40000 -vs 0 -ve 200000 -sd -st './models/tokenizer.mgt' -sw './models/weights.mgw'
Total tokens: 40000
100%|██████████████████████████████████████████████████████████████████████████████| 200000/200000 [02:02<00:00, 1636.38it/s]
New Vectorizer created successfully...
Vocabulary Size: 14270
100%|██████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 302926.25it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 1289942.19it/s]
(None, 10, 128)
Epoch 1/4
79/79 [==============================] - 88s 1s/step - loss: 7.8692
Epoch 2/4
79/79 [==============================] - 92s 1s/step - loss: 3.8066
Epoch 3/4
79/79 [==============================] - 93s 1s/step - loss: 1.1487
Epoch 4/4
79/79 [==============================] - 92s 1s/step - loss: 0.2900
100%|██████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:05<00:00, 34.70it/s]
Vocabulary size saved: 14270
         and her eyes in the library. She was the rather large woman, although not fat, and when she wore high heels--which sh
e was not prone to do, because although Cutter would not have cared, she kept trying to project into other people's minds and
trying, as she said, "Not to do anything to them, that I wouldn't want them to do you me."--she rose a good inch above Cutter.
 She was pleasant humored, and cooperative, and the one great irritant about her that annoyed Cutter, was the fact that she wa
s not capable of meeting life wholeheartedly and with strength. She steadily worried about other people's feelings and thought
s, so that Cutter wondered if she were capable of the slightest personal conviction. Yet that weakness was an advantage at the
 same time, to him, because she worked constantly toward making him happy. The house was run to his minutest liking, and the s
ervants liked her, so that while she did not use a strong enough
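The numbers in this log can be related to the command-line arguments with a short sketch. It assumes that training examples are sliding windows of GPT_INPUT tokens, each paired with the token that follows; the log is consistent with that reading but the source does not confirm it, and both function names are hypothetical. The log reports 39989 windows for 40000 tokens, one fewer than this simple scheme yields, so the real boundary handling presumably differs slightly.

```python
import math

def sliding_windows(tokens, gpt_input):
    """Pair each run of gpt_input tokens with the token that follows it."""
    return [
        (tokens[i:i + gpt_input], tokens[i + gpt_input])
        for i in range(len(tokens) - gpt_input)
    ]

def steps_per_epoch(num_windows, batch_size):
    """Number of batches needed to cover every window once."""
    return math.ceil(num_windows / batch_size)

pairs = sliding_windows(list(range(15)), gpt_input=10)
print(len(pairs), pairs[0])

# 39989 windows (from the log) in batches of 512 (-b 512).
print(steps_per_epoch(39989, 512))
```

The last call gives 79, matching the `79/79` step counter shown for each epoch.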

Fine-tuning

Suppose we want to fine-tune (or retrain) the model above. The command below reloads the tokenizer and weights and retrains on new text from a specified window of the corpus:

PS C:gpt> python MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.00005 -ol 200 -e 1 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 80000 -te 120000 -sd -st './models/tokenizer2.mgt' -sw './models/weights2.mgw' -lt './models/tokenizer.mgt' -lw './models/weights.mgw'
Total tokens: 40000
100%|██████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 302923.51it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 1428099.68it/s]
(None, 10, 128)
79/79 [==============================] - 81s 993ms/step - loss: 7.9725
100%|██████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:06<00:00, 30.29it/s]
Vocabulary size saved: 14270
         of her own the black of my own and my wife had could seen the house at the same moment her mind caught the first sugg
estion of the folded paper. “But he must have a name! Where is the paper?” She moved to the desk, and began to turn over the s
cattered documents that littered it. The first that caught her eye was an unfinished letter in her husband’s hand, with his pe
n lying across it, as though dropped there at a sudden summons. “My dear Parvis,”--who was Parvis?--“I have just received your
 letter announcing Elwell’s death, and while I suppose there is now no farther risk of trouble, it might be safer--” That was
all. The “risk of trouble” was easily explained by the newspaper clipping which had apprised Mary of the suit brought against
her husband by one of his associates in the Blue Star enterprise. The only new information conveyed in the letter was the fact
 of its showing Boyne,

Inference mode

Inference mode loads pre-trained weights and a vectorizer. These components are then used to run the model and generate output of the specified length.

PS C:gpt> python MinimalGPT.py -i -ol 500 -e 6 -b 512 -s 10 -dm 128 -p 8 -ds 1 -lt './models/tokenizer2.mgt' -lw './models/weights2.mgw'
(None, 10, 128)
100%|██████████████████████████████████████████████████████████████████████████████████████| 490/490 [00:13<00:00, 35.93it/s]
of her own “on the other from the inel’--a little sensational, of course. But I guess you’d better look it over.” He
held out a newspaper to Mary, who unfolded it slowly, remembering, as she did so, the evening when, in that same room, the per
usal of a clipping from the “Sentinel” had first shaken the depths of her security. As she opened the paper, her eyes, shrinki
ng from the glaring head-lines, “Widow of Boyne’s Victim Forced to Appeal for Aid,” ran down the column of text to two portrai
ts inserted in it. The first was her husband’s, taken from a photograph made the year they had come to England. It was the pic
ture of him that she liked best, the one that stood on the writing-table up-stairs in her bedroom. As the eyes in the photogra
ph met hers, she felt it would be impossible to read what was said of him, and closed her lids with the sharpness of the pain.
“I thought if you felt disposed to put your name down--” she heard Parvis continue. She opened her eyes with an effort, and t
hey fell on the other portrait. It was that of a youngish man, slightly built, in rough clothes, with features somewhat blurre
d by the shadow of a projecting hat-brim. Where had she seen that outline before? She stared at it confusedly, her heart hamme
ring in her throat and ears. Then she gave a cry. “This is the man--the man who came for my husband!” She heard Parvis start t
o his feet, and was dimly aware that she had slipped backward into the corner of the sofa, and that he was bending above her i
n alarm. With an intense effort she straightened herself, and reached out for the paper, which she had dropped. “It’s the man!
I should know him anywhere!” she cried in a voice that sounded in her own ears like a scream. Parvis’s voice seemed to come t
o her from far off, down endless, fog-muffled windings. “Mrs. Boyne, you’re not very well. Shall I call somebody? Shall I get
a glass of water?” “No, no, no!” She threw herself toward him, her hand frantically clenching the newspaper. “I tell you, it’s
the man! I KNOW him! He spoke to me in the garden!” Parvis took the journal from her, directing his glasses to the portrait.
“It can’t be, Mrs. Boyne. It’s Robert Elwell.” “Robert Elwell?” Her white
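The generation loop behind this output can be sketched as follows. This is an illustration, not the inference function from GPT.py: `generate` and the toy `next_token` callable are hypothetical, and the idea that generation starts from a seed of GPT_INPUT tokens is inferred from the progress bars (490 steps for -ol 500 and 190 steps for -ol 200, both with -s 10).

```python
def generate(next_token, seed, output_length):
    """Hypothetical autoregressive loop.

    next_token: callable mapping the last gpt_input tokens to one new token
                (a stand-in for the trained model).
    seed:       initial tokens; its length plays the role of GPT_INPUT.
    """
    gpt_input = len(seed)
    tokens = list(seed)
    steps = 0
    while len(tokens) < output_length:
        context = tokens[-gpt_input:]       # fixed-size window the model sees
        tokens.append(next_token(context))  # one token per step
        steps += 1
    return tokens, steps

# Toy "model": predicts the previous token id plus one.
tokens, steps = generate(lambda context: context[-1] + 1,
                         seed=list(range(10)), output_length=500)
print(len(tokens), steps)
```

Under these assumptions the loop runs 490 times, the same count as the progress bar in the log above.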

Importing the model into a project

Trained models produced with MinimalGPT.py can be integrated into your project by importing the MinimalGPT function and configuring it as needed. To do so, set return_model_and_vectorizer = True or return_model_and_vectorizer_and_output = True together with inference_only = True (inference mode). Model training, creation, and export can be carried out in the same way, mirroring the command-line mode. The provided Jupyter Notebook contains a full demonstration of these procedures.

from MinimalGPT import MinimalGPT

# With return_model_and_vectorizer_and_output = True the call returns a
# (model, vectorizer, output) tuple, so model[0] is the Keras model.
model = MinimalGPT(output_length = 200,
                   gpt_input = 10,
                   d_model = 128,
                   h = 8,
                   decoder_stacks = 1,
                   load_tokenizer = './models/tokenizer3.mgt',
                   load_weights = './models/weights3.mgw',
                   inference_only = True,
                   return_model_and_vectorizer_and_output = True)
model[0].summary()

Model: "model"
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 10)]              0
 embedding (Embedding)        (None, 10, 128)           1826816
 positional_embedding (Posit  (None, 10, 128)           0
 ionalEmbedding)
 decoder (Decoder)            (None, 10, 128)           37160
 flatten (Flatten)            (None, 1280)              0
 dense (Dense)                (None, 14273)             18283713
 tf.nn.softmax (TFOpLambda)   (None, 14273)             0
=================================================================
Total params: 20,147,689
Trainable params: 20,147,689
Non-trainable params: 0
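The parameter counts in this summary can be checked with a little arithmetic. The sketch below only uses figures printed in the summary; the decoder count is copied as is because the source does not give enough detail to derive it, and `dense_params` is a hypothetical helper.

```python
GPT_INPUT = 10
D_MODEL = 128
EMBEDDING_PARAMS = 1826816   # copied from the summary
DECODER_PARAMS = 37160       # copied from the summary, not derived here
DENSE_UNITS = 14273          # output width of the dense layer

def dense_params(in_features, out_features):
    """Weights plus one bias per output unit of a fully connected layer."""
    return in_features * out_features + out_features

# An embedding table has one row of D_MODEL values per token id.
embedding_rows = EMBEDDING_PARAMS // D_MODEL

# Flatten turns the (GPT_INPUT, D_MODEL) decoder output into one vector.
flatten_width = GPT_INPUT * D_MODEL

dense = dense_params(flatten_width, DENSE_UNITS)
total = EMBEDDING_PARAMS + DECODER_PARAMS + dense

print(embedding_rows, flatten_width, dense, total)
```

The dense layer alone accounts for roughly 18.3 of the 20.1 million parameters, because it maps the flattened 1280-wide vector directly to a score for every vocabulary entry.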

Implementation Specifications

The model implemented here differs slightly from the implementation in the original paper. The matrix formed by concatenating the heads of the scaled dot-product output is multiplied by a parameter matrix of size key dimension x d_model. In practice, this small adjustment reduces the number of parameters and should yield a slight performance improvement through the optimization of trainable parameters.

Results

See the examples folder for notebooks containing the examples.

Troubleshooting

Feel free to open tickets in the Issues tab if you encounter an error or have a specific feature request in mind.

References/Further Reading

Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
Brown, Tom, et al. "Language models are few-shot learners." Advances in neural information processing systems 33 (2020): 1877-1901.
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
Petroni, Fabio, et al. "Language models as knowledge bases?" arXiv preprint arXiv:1909.01066 (2019).

Additional information

Version v2.0.0
Type AI Source Code
Update date 2024-12-30
Size 50MB
Source GitHub
