MinimalGPT

v2.0.0

⚠️ MinimalGPT is deprecated and all support for it has ended! Please try Corpus2GPT instead.

https://github.com/abhaskumarsinha/Corpus2GPT

MinimalGPT: 'The tiniest and simplest GPT model'

[ GPT-1 Paper ] [ 1002 short stories from Project Gutenberg ] [ logo.com ] [ Transformer - Paper ] [ Huggingface Transformers ] [ TensorFlow ] [ BPE Tokenizer: subword-nmt ]

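MinimalGPT is a concise, adaptable, and streamlined code framework that contains the essential components for building, training, running inference with, and fine-tuning a GPT model. It is implemented exclusively with Keras and TensorFlow, which keeps it compatible and consistent with the wider deep-learning ecosystem.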

New: CPU/GPU/TPU support and support for loading large dataset files!

Code Specification

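The repository introduces two integrated files that together make up the framework. The first, GPT.py, is the core framework and contains the essential blocks and layers: multi-head attention, the feed-forward mechanism, scaled dot-product attention, positional encoding, the softmax output, and an inference function for model prediction. The second, MinimalGPT.py, provides a compact command-line interface that simplifies use of the framework. With it, users can create, train, save, load, fine-tune, and run inference with a model, all in a single command-line invocation. The file can also be imported into Python code, so it can be integrated into a project through simple function calls.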
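To make the attention component concrete, here is a minimal pure-Python sketch of causal (masked) scaled dot-product attention, softmax(QK^T / sqrt(d_k) + mask) V, for a single head. It is not the code in GPT.py, which uses Keras/TensorFlow layers; the function names `softmax` and `causal_scaled_dot_product_attention` are hypothetical and chosen for illustration only.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_scaled_dot_product_attention(q, k, v):
    """Single-head attention: softmax(Q K^T / sqrt(d_k) + mask) V.

    q, k, v are lists of rows (seq_len x d_k). The causal mask prevents
    position i from attending to any later position j > i, as in a GPT
    decoder block.
    """
    d_k = len(k[0])
    out = []
    for i, q_row in enumerate(q):
        # Score only positions 0..i; skipping later positions is equivalent
        # to adding -inf to their scores before the softmax.
        scores = [sum(a * b for a, b in zip(q_row, k[j])) / math.sqrt(d_k)
                  for j in range(i + 1)]
        weights = softmax(scores)
        # Weighted sum of the value rows the query is allowed to see.
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out

q = k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(causal_scaled_dot_product_attention(q, k, v))
```

The first position can only attend to itself, so its output is exactly its own value row; multi-head attention runs several such heads in parallel on learned projections and concatenates the results.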

Requirements

To install the required dependencies from the requirements.txt file, run the following command:


pip install -r requirements.txt

Usage

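The model architecture is governed by several key parameters, including GPT_INPUT, D_MODEL, MULTI_HEAD, and DECODER_STACKS. Keep these parameters consistent across runs; otherwise, loading the model for later retraining or inference will run into problems. If you are unsure of the values, consult the configuration file generated during the previous run. In addition, the VOCABULARY_START and VOCABULARY_END parameters define window markers over the corpus. These markers are used to build the vectorizer layer, which extracts the vocabulary from the corpus tokens lying between the START and END token positions. Note that tokens in the corpus are separated by whitespace, and that VOCABULARY_START and VOCABULARY_END matter mainly when a tokenizer file is not explicitly specified.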
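The vocabulary window can be illustrated with a short sketch. It assumes, for illustration only, that the window is half-open ([VOCABULARY_START, VOCABULARY_END), like Python slicing) and that tokens are split purely on whitespace; the hypothetical `build_vocabulary` helper does none of the lowercasing, punctuation stripping, or special-token handling that the real Keras vectorizer layer may perform.

```python
def build_vocabulary(corpus_text, vocabulary_start, vocabulary_end):
    """Hypothetical sketch: collect the vocabulary from a window of tokens.

    Tokens are split on whitespace, and only tokens whose position lies in
    [vocabulary_start, vocabulary_end) contribute to the vocabulary.
    """
    tokens = corpus_text.split()
    window = tokens[vocabulary_start:vocabulary_end]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(window))

corpus = "the cat sat on the mat the end"
print(build_vocabulary(corpus, 0, 6))  # ['the', 'cat', 'sat', 'on', 'mat']
```

A token that appears only outside the window (here, "end") never enters the vocabulary, which is why the window should be wide enough to cover the text you later train on.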

Also note that the tokenizer file and the model weights are always saved and loaded together. The current code does not support saving or loading these two files separately.

Inference mode (-i) needs more than the model parameters and the saved tokenizer and weights files to generate output: it must also be used together with the -ol switch.


usage: MinimalGPT.py [-h] [-d DATA_PATH] [-l LEARNING_RATE]
                     [-ol OUTPUT_LENGTH] [-e EPOCHS] [-b BATCH_SIZE]
                     [-s GPT_INPUT] [-dm D_MODEL] [-p MULTI_HEAD]
                     [-ds DECODER_STACKS] [-ts TOKEN_START] [-te TOKEN_END]
                     [-vs VOCABULARY_START] [-ve VOCABULARY_END] [-sd]
                     [-lt LOAD_TOKENIZER] [-lw LOAD_WEIGHTS]
                     [-st SAVE_TOKENIZER] [-sw SAVE_WEIGHTS] [-ot OPTIMIZER]
                     [-i] [-mv] [-mvo]

optional arguments:
  -h, --help            show this help message and exit
  -d DATA_PATH, --data-path DATA_PATH
                        File: Corresponding to corpus or training text
                        [String]
  -l LEARNING_RATE, --learning-rate LEARNING_RATE
                        Float: Learning Rate. The model will train ONLY IF the
                        rate is > 0, skip otherwise [Float]
  -ol OUTPUT_LENGTH, --output-length OUTPUT_LENGTH
                        Length of the output sequence to be generated
  -e EPOCHS, --epochs EPOCHS
                        Number of training Epochs [Int]
  -b BATCH_SIZE, --batch-size BATCH_SIZE
                        Size of each batch [Int]
  -s GPT_INPUT, --gpt-input GPT_INPUT
                        Number of Tokens of text the model inputs at a time
                        [Int]
  -dm D_MODEL, --d-model D_MODEL
                        Embedding layer output dimensions [Int]
  -p MULTI_HEAD, --multi-head MULTI_HEAD
                        Number of Multi-head Attention layer in parallel [Int]
  -ds DECODER_STACKS, --decoder-stacks DECODER_STACKS
                        Number of stacked Decoder layer [Int]
  -ts TOKEN_START, --token-start TOKEN_START
                        The token number in the corpus to mark it as the
                        starting point of the training [Int]
  -te TOKEN_END, --token-end TOKEN_END
                        The token number in the corpus to mark it as the end
                        point of the training [Int]
  -vs VOCABULARY_START, --vocabulary-start VOCABULARY_START
                        Token number from the corpus to mark the starting
                        point of vocabulary data [Int]
  -ve VOCABULARY_END, --vocabulary-end VOCABULARY_END
                        Token number from the corpus to mark the end point of
                        vocabulary data [Int]
  -sd, --save           Save the Model and Vectorizer data to disk
                        [True/False]
  -lt LOAD_TOKENIZER, --load-tokenizer LOAD_TOKENIZER
                        File: Vectorization layer [File]
  -lw LOAD_WEIGHTS, --load-weights LOAD_WEIGHTS
                        File: Model Weights [File]
  -st SAVE_TOKENIZER, --save-tokenizer SAVE_TOKENIZER
                        File: Saving Vectorizer File [File]
  -sw SAVE_WEIGHTS, --save-weights SAVE_WEIGHTS
                        File: Saving Model Weights[File]
  -ot OPTIMIZER, --optimizer OPTIMIZER
                        Optimizer consistent to TensorFlow optimizer class
                        [tf.keras.optimizers]
  -i, --inference-only  Only Print the output of the model in Inference Mode
                        [True/False]
  -mv, --model-vectorizer
                        Return Model, Vectorizer Tuple [True/False]
  -mvo, --model-vectorizer-output
                        Return Model, Vectorizer, Output Tuple [True/False]

Examples

Example of model creation and training

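Suppose the desired model has GPT_INPUT = 10, D_MODEL = 128, MULTI_HEAD = 8, and DECODER_STACKS = 1, that training uses the corpus tokens from TOKEN_START = 0 to TOKEN_END = 40000, and that the vectorizer layer is built from the corpus range VOCABULARY_START = 0 to VOCABULARY_END = 200000. The following command then starts the training process, and the resulting weights and tokenizer data are saved to the specified folder. The output below shows the result of running this command.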


PS C:\gpt> python MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.001 -ol 200 -e 4 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 0 -te 40000 -vs 0 -ve 200000 -sd -st './models/tokenizer.mgt' -sw './models/weights.mgw'
Total tokens: 40000
100%|██████████████████████████████████████████████████████████████████████████████| 200000/200000 [02:02<00:00, 1636.38it/s]
New Vectorizer created successfully...
Vocabulary Size: 14270
100%|██████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 302926.25it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 1289942.19it/s]
(None, 10, 128)
Epoch 1/4
79/79 [==============================] - 88s 1s/step - loss: 7.8692
Epoch 2/4
79/79 [==============================] - 92s 1s/step - loss: 3.8066
Epoch 3/4
79/79 [==============================] - 93s 1s/step - loss: 1.1487
Epoch 4/4
79/79 [==============================] - 92s 1s/step - loss: 0.2900
100%|██████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:05<00:00, 34.70it/s]
Vocabulary size saved: 14270
         and her eyes in the library. She was the rather large woman, although not fat, and when she wore high heels--which sh
e was not prone to do, because although Cutter would not have cared, she kept trying to project into other people's minds and
trying, as she said, "Not to do anything to them, that I wouldn't want them to do you me."--she rose a good inch above Cutter.
 She was pleasant humored, and cooperative, and the one great irritant about her that annoyed Cutter, was the fact that she wa
s not capable of meeting life wholeheartedly and with strength. She steadily worried about other people's feelings and thought
s, so that Cutter wondered if she were capable of the slightest personal conviction. Yet that weakness was an advantage at the
 same time, to him, because she worked constantly toward making him happy. The house was run to his minutest liking, and the s
ervants liked her, so that while she did not use a strong enough
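The sample and step counts in the log above follow from simple arithmetic. The sketch below assumes next-token training pairs: each input is a window of GPT_INPUT consecutive tokens and the target is the token that follows it. The window count len(tokens) - GPT_INPUT - 1 is an assumption chosen because it reproduces the 39989 samples in the progress bars; the real code may build its windows differently, and `make_training_pairs` is a hypothetical name.

```python
import math

def make_training_pairs(tokens, gpt_input):
    """Hypothetical sketch of next-token training pairs.

    Input i is tokens[i : i + gpt_input]; its target is tokens[i + gpt_input].
    The count len(tokens) - gpt_input - 1 is assumed so that it matches the
    39989 samples reported in the log; it is not taken from the source code.
    """
    n = len(tokens) - gpt_input - 1
    inputs = [tokens[i:i + gpt_input] for i in range(n)]
    targets = [tokens[i + gpt_input] for i in range(n)]
    return inputs, targets

tokens = list(range(40000))       # stand-in for tokens 0..40000 (-ts 0 -te 40000)
x, y = make_training_pairs(tokens, gpt_input=10)
print(len(x))                     # 39989, as in the progress bars
print(math.ceil(len(x) / 512))    # 79 steps per epoch with -b 512, as in "79/79"
```

With a batch size of 512, 78 full batches cover 39936 samples and the remaining 53 form a final partial batch, giving the 79 steps per epoch shown in the training log.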

Fine-tuning

To fine-tune (or retrain) the model above, the following command reloads the tokenizer and weights and retrains them on new text from the specified window of the corpus:


PS C:\gpt> python MinimalGPT.py -d './dataset/output_dataset.txt' -l 0.00005 -ol 200 -e 1 -b 512 -s 10 -dm 128 -p 8 -ds 1 -ts 80000 -te 120000 -sd -st './models/tokenizer2.mgt' -sw './models/weights2.mgw' -lt './models/tokenizer.mgt' -lw './models/weights.mgw'
Total tokens: 40000
100%|██████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 302923.51it/s]
100%|█████████████████████████████████████████████████████████████████████████████| 39989/39989 [00:00<00:00, 1428099.68it/s]
(None, 10, 128)
79/79 [==============================] - 81s 993ms/step - loss: 7.9725
100%|██████████████████████████████████████████████████████████████████████████████████████| 190/190 [00:06<00:00, 30.29it/s]
Vocabulary size saved: 14270
         of her own the black of my own and my wife had could seen the house at the same moment her mind caught the first sugg
estion of the folded paper. “But he must have a name! Where is the paper?” She moved to the desk, and began to turn over the s
cattered documents that littered it. The first that caught her eye was an unfinished letter in her husband’s hand, with his pe
n lying across it, as though dropped there at a sudden summons. “My dear Parvis,”--who was Parvis?--“I have just received your
 letter announcing Elwell’s death, and while I suppose there is now no farther risk of trouble, it might be safer--” That was
all. The “risk of trouble” was easily explained by the newspaper clipping which had apprised Mary of the suit brought against
her husband by one of his associates in the Blue Star enterprise. The only new information conveyed in the letter was the fact
 of its showing Boyne,
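Because reloaded weights only fit a model with the same architecture, it can help to compare the architecture flags (-s, -dm, -p, -ds) of the original training command and the fine-tuning command before running the latter. The helper below is a hypothetical convenience script, not part of MinimalGPT.py, and the commands in it are abridged versions of the ones above.

```python
import shlex

# Flags that fix the model architecture, per the Usage section.
ARCHITECTURE_FLAGS = ("-s", "-dm", "-p", "-ds")

def architecture_of(command_line):
    """Return {flag: value} for the architecture flags found in a command line."""
    args = shlex.split(command_line)
    return {flag: args[args.index(flag) + 1]
            for flag in ARCHITECTURE_FLAGS if flag in args}

def mismatched_flags(original_cmd, finetune_cmd):
    """Architecture flags whose values differ, including flags present in only one command."""
    a, b = architecture_of(original_cmd), architecture_of(finetune_cmd)
    return {flag for flag in ARCHITECTURE_FLAGS if a.get(flag) != b.get(flag)}

# Abridged versions of the training and fine-tuning commands shown above.
train_cmd = ("python MinimalGPT.py -d './dataset/output_dataset.txt' -s 10 -dm 128 "
             "-p 8 -ds 1 -sd -st './models/tokenizer.mgt' -sw './models/weights.mgw'")
tune_cmd = ("python MinimalGPT.py -d './dataset/output_dataset.txt' -s 10 -dm 128 "
            "-p 8 -ds 1 -lt './models/tokenizer.mgt' -lw './models/weights.mgw'")

print(mismatched_flags(train_cmd, tune_cmd) or "architecture flags match")
```

Flags such as -l, -e, -ts, and -te may change freely between runs; only the architecture flags have to match the saved weights.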

Inference mode

Inference mode loads the pretrained weights and vectorizer, then runs the model with them to generate output of the length specified with -ol.

PS C:\gpt> python MinimalGPT.py -i -ol 500 -e 6 -b 512 -s 10 -dm 128 -p 8 -ds 1 -lt './models/tokenizer2.mgt' -lw './models/weights2.mgw'
(None, 10, 128)
100%|██████████████████████████████████████████████████████████████████████████████████████| 490/490 [00:13<00:00, 35.93it/s]
of her own “on the other from the inel’--a little sensational, of course. But I guess you’d better look it over.” He
held out a newspaper to Mary, who unfolded it slowly, remembering, as she did so, the evening when, in that same room, the per
usal of a clipping from the “Sentinel” had first shaken the depths of her security. As she opened the paper, her eyes, shrinki
ng from the glaring head-lines, “Widow of Boyne’s Victim Forced to Appeal for Aid,” ran down the column of text to two portrai
ts inserted in it. The first was her husband’s, taken from a photograph made the year they had come to England. It was the pic
ture of him that she liked best, the one that stood on the writing-table up-stairs in her bedroom. As the eyes in the photogra
ph met hers, she felt it would be impossible to read what was said of him, and closed her lids with the sharpness of the pain.
“I thought if you felt disposed to put your name down--” she heard Parvis continue. She opened her eyes with an effort, and t
hey fell on the other portrait. It was that of a youngish man, slightly built, in rough clothes, with features somewhat blurre
d by the shadow of a projecting hat-brim. Where had she seen that outline before? She stared at it confusedly, her heart hamme
ring in her throat and ears. Then she gave a cry. “This is the man--the man who came for my husband!” She heard Parvis start t
o his feet, and was dimly aware that she had slipped backward into the corner of the sofa, and that he was bending above her i
n alarm. With an intense effort she straightened herself, and reached out for the paper, which she had dropped. “It’s the man!
I should know him anywhere!” she cried in a voice that sounded in her own ears like a scream. Parvis’s voice seemed to come t
o her from far off, down endless, fog-muffled windings. “Mrs. Boyne, you’re not very well. Shall I call somebody? Shall I get
a glass of water?” “No, no, no!” She threw herself toward him, her hand frantically clenching the newspaper. “I tell you, it’s
the man! I KNOW him! He spoke to me in the garden!” Parvis took the journal from her, directing his glasses to the portrait.
“It can’t be, Mrs. Boyne. It’s Robert Elwell.” “Robert Elwell?” Her white
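The generation progress bars in these logs run for 190 steps with -ol 200 and 490 steps with -ol 500, which is consistent with starting from a GPT_INPUT-token seed (here 10 tokens) and predicting one token per step until OUTPUT_LENGTH tokens exist. The sketch below shows greedy autoregressive generation under that assumption; `generate` and the toy "model" are hypothetical, and how the real script chooses its seed window is not documented here.

```python
def generate(predict_next, prompt_tokens, gpt_input, output_length):
    """Hypothetical sketch of greedy autoregressive generation.

    predict_next maps the last `gpt_input` tokens to the next token.
    Starting from a gpt_input-token prompt, output_length - gpt_input steps
    produce output_length tokens in total.
    """
    tokens = list(prompt_tokens[-gpt_input:])
    for _ in range(output_length - gpt_input):
        window = tokens[-gpt_input:]  # the model only ever sees gpt_input tokens
        tokens.append(predict_next(window))
    return tokens

# Toy stand-in for the model: always predicts "last token + 1".
def toy_model(window):
    return window[-1] + 1

out = generate(toy_model, list(range(10)), gpt_input=10, output_length=200)
print(len(out), out[-1])  # 200 199
```

Because the context is a fixed window of GPT_INPUT tokens, the model never sees text older than that window, which is one reason the generated passages drift between scenes.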

Importing the model into a project

Integrating a model trained with MinimalGPT.py into a project is straightforward: import the MinimalGPT function and configure it with the desired specifications. To get the model back, set return_model_and_vectorizer = True or return_model_and_vectorizer_and_output = True together with inference_only = True (inference mode). Training, creating, and exporting a model can likewise be done in the same way as in command-line mode. The accompanying Jupyter Notebook demonstrates these procedures with examples.

from MinimalGPT import MinimalGPT

# With return_model_and_vectorizer_and_output = True the call returns a
# (model, vectorizer, output) tuple, so model[0] is the Keras model.
model = MinimalGPT(output_length = 200, gpt_input = 10, d_model = 128, h = 8,
                   decoder_stacks = 1,
                   load_tokenizer = './models/tokenizer3.mgt',
                   load_weights = './models/weights3.mgw',
                   inference_only = True,
                   return_model_and_vectorizer_and_output = True)
model[0].summary()
Model: "model"
 Layer (type)                 Output Shape              Param #
=================================================================
 input_1 (InputLayer)         [(None, 10)]              0
 embedding (Embedding)        (None, 10, 128)           1826816
 positional_embedding (Posit  (None, 10, 128)           0
 ionalEmbedding)
 decoder (Decoder)            (None, 10, 128)           37160
 flatten (Flatten)            (None, 1280)              0
 dense (Dense)                (None, 14273)             18283713
 tf.nn.softmax (TFOpLambda)   (None, 14273)             0
=================================================================
Total params: 20,147,689
Trainable params: 20,147,689
Non-trainable params: 0
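The parameter counts in this summary can be checked by hand, which also shows where the model's size comes from. The embedding input dimension (14272) is inferred by dividing the embedding's 1826816 parameters by D_MODEL = 128, and the decoder's 37160 parameters are taken as-is from the summary rather than derived; the variable names are illustrative.

```python
d_model, gpt_input = 128, 10
embedding_input_dim = 14272   # inferred: 1826816 / 128 from the summary above
dense_units = 14273           # output width of the Dense layer in the summary
decoder = 37160               # taken as-is from the summary, not derived here

embedding = embedding_input_dim * d_model                   # embedding matrix only
dense = (gpt_input * d_model) * dense_units + dense_units   # kernel + bias after Flatten
total = embedding + dense + decoder

print(embedding, dense, total)  # 1826816 18283713 20147689
```

The Dense layer after Flatten holds over 90% of the parameters, and its kernel grows with GPT_INPUT × D_MODEL × vocabulary size, so raising GPT_INPUT enlarges the model far more than adding another decoder stack would.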

Implementation Specification

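The model implemented here differs slightly from the implementation in the original paper. After the heads of the scaled dot-product output are concatenated, the resulting matrix is multiplied by a parameter matrix of size key dimension × d_model. This small adjustment, made for practical purposes to reduce the number of parameters, may yield a slight performance improvement thanks to the reduced number of trainable parameters.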

Results

See the examples folder for notebooks with sample outputs.

Troubleshooting

If you run into an error or have a specific feature request in mind, feel free to open a ticket in the Issues tab.

References / Further Reading

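Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).
Radford, Alec, et al. "Improving language understanding by generative pre-training." (2018).
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
Brown, Tom, et al. "Language models are few-shot learners." Advances in Neural Information Processing Systems 33 (2020): 1877-1901.
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
Petroni, Fabio, et al. "Language models as knowledge bases?" arXiv preprint arXiv:1909.01066 (2019).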
