Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively used and maintained in the Google Brain team. This notebook (run it in Colab) shows how to use Trax and where you can find more information.
We welcome contributions to Trax! We welcome PRs with code for new models and layers as well as improvements to our code and documentation. We especially love notebooks that explain how models work and show how to use them to solve problems!
Here are a few example notebooks:
- trax.data API explained: the major functions in the trax.data API

General Setup
Execute the following cell (once) before running any of the code samples.
import os
import numpy as np
!pip install -q -U trax
import trax
Here is how you create an English-German translator in a few lines of code:
# Create a Transformer model.
# Pre-trained model config in gs://trax-ml/models/translation/ende_wmt32k.gin
model = trax.models.Transformer(
    input_vocab_size=33300,
    d_model=512, d_ff=2048,
    n_heads=8, n_encoder_layers=6, n_decoder_layers=6,
    max_len=2048, mode='predict')

# Initialize using pre-trained weights.
model.init_from_file('gs://trax-ml/models/translation/ende_wmt32k.pkl.gz',
                     weights_only=True)

# Tokenize a sentence.
sentence = 'It is nice to learn new things today!'
tokenized = list(trax.data.tokenize(iter([sentence]),  # Operates on streams.
                                    vocab_dir='gs://trax-ml/vocabs/',
                                    vocab_file='ende_32k.subword'))[0]

# Decode from the Transformer.
tokenized = tokenized[None, :]  # Add batch dimension.
tokenized_translation = trax.supervised.decoding.autoregressive_sample(
    model, tokenized, temperature=0.0)  # Higher temperature: more diverse results.

# De-tokenize.
tokenized_translation = tokenized_translation[0][:-1]  # Remove batch and EOS.
translation = trax.data.detokenize(tokenized_translation,
                                   vocab_dir='gs://trax-ml/vocabs/',
                                   vocab_file='ende_32k.subword')
print(translation)
Es ist schön, heute neue Dinge zu lernen!
Trax includes basic models (like ResNet, LSTM, Transformer) and RL algorithms (like REINFORCE, A2C, PPO). It is also actively used for research and includes new models like the Reformer and new RL algorithms like AWR. Trax has bindings to a large number of deep learning datasets, including Tensor2Tensor and TensorFlow datasets.
You can use Trax either as a library from your own Python scripts and notebooks or as a binary from the shell, which can be more convenient for training large models; a rough sketch of such an invocation is shown below. It runs without any changes on CPUs, GPUs and TPUs.
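For the shell usage, Trax provides a trainer module that reads a gin configuration. The invocation below is only a sketch; the config path is a placeholder, not a file used in this notebook:
!python -m trax.trainer --config_file=path/to/your_config.gin --output_dir=~/trainer_output_dir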
Here you can learn how Trax works, how to create new models, and how to train them on your own data.
The basic units flowing through Trax models are tensors, multi-dimensional arrays sometimes also called numpy arrays after numpy, the most widely used package for tensor operations. If you don't know how to operate on tensors, take a look at the numpy guide: Trax also uses the numpy API for them.
In Trax we want numpy operations to run very fast, making use of GPUs and TPUs to accelerate them. We also want to automatically compute gradients of functions on tensors. This is done in the trax.fastmath package, thanks to its backends: JAX and TensorFlow numpy.
from trax.fastmath import numpy as fastnp
trax.fastmath.use_backend('jax')  # Can be 'jax' or 'tensorflow-numpy'.

matrix = fastnp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f'matrix = \n{matrix}')
vector = fastnp.ones(3)
print(f'vector = {vector}')
product = fastnp.dot(vector, matrix)
print(f'product = {product}')
tanh = fastnp.tanh(product)
print(f'tanh(product) = {tanh}')
matrix =
[[1 2 3]
[4 5 6]
[7 8 9]]
vector = [1. 1. 1.]
product = [12. 15. 18.]
tanh(product) = [0.99999994 0.99999994 0.99999994]
Gradients can be calculated using trax.fastmath.grad.
def f(x):
  return 2.0 * x * x

grad_f = trax.fastmath.grad(f)
print(f'grad(2x^2) at 1 = {grad_f(1.0)}')
grad(2x^2) at 1 = 4.0
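As a small additional sketch (not part of the original intro), grad also works on functions that take a tensor argument and return a scalar, producing a gradient with the same shape as the input:
def loss(v):
  return fastnp.sum(v * v)  # Scalar-valued function of a vector.

grad_loss = trax.fastmath.grad(loss)
print(grad_loss(fastnp.array([1.0, 2.0, 3.0])))  # Expect [2. 4. 6.]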
Layers are the basic building blocks of Trax models. You will learn all about them in the layers intro, but for now, just take a look at the implementation of one core Trax layer, Embedding:
class Embedding(base.Layer):
  """Trainable layer that maps discrete tokens/IDs to vectors."""

  def __init__(self,
               vocab_size,
               d_feature,
               kernel_initializer=init.RandomNormalInitializer(1.0)):
    """Returns an embedding layer with given vocabulary size and vector size.

    Args:
      vocab_size: Size of the input vocabulary. The layer will assign a unique
          vector to each ID in `range(vocab_size)`.
      d_feature: Dimensionality/depth of the output vectors.
      kernel_initializer: Function that creates (random) initial vectors for
          the embedding.
    """
    super().__init__(name=f'Embedding_{vocab_size}_{d_feature}')
    self._d_feature = d_feature  # feature dimensionality
    self._vocab_size = vocab_size
    self._kernel_initializer = kernel_initializer

  def forward(self, x):
    """Returns embedding vectors corresponding to input token IDs.

    Args:
      x: Tensor of token IDs.

    Returns:
      Tensor of embedding vectors.
    """
    return jnp.take(self.weights, x, axis=0, mode='clip')

  def init_weights_and_state(self, input_signature):
    """Returns tensor of newly initialized embedding vectors."""
    del input_signature
    shape_w = (self._vocab_size, self._d_feature)
    w = self._kernel_initializer(shape_w, self.rng)
    self.weights = w
Layers with trainable weights like Embedding need to be initialized with the signature (shape and dtype) of the input, and then they can be run by calling them.
from trax import layers as tl

# Create an input tensor x.
x = np.arange(15)
print(f'x = {x}')

# Create the embedding layer.
embedding = tl.Embedding(vocab_size=20, d_feature=32)
embedding.init(trax.shapes.signature(x))

# Run the layer -- y = embedding(x).
y = embedding(x)
print(f'shape of y = {y.shape}')
x = [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
shape of y = (15, 32)
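As a quick extra check (not part of the original example), the weights created by init_weights_and_state are the embedding table itself, with one row per vocabulary entry:
print(f'shape of embedding.weights = {embedding.weights.shape}')  # Expect (20, 32).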
Models in Trax are built from layers, most often using the Serial and Branch combinators. You can read more about those combinators in the layers intro and see the code for many models in trax/models/; for example, this is how the Transformer language model is implemented. Below is an example of how to build a sentiment classification model (a small sketch of Branch follows after it).
model = tl.Serial(
    tl.Embedding(vocab_size=8192, d_feature=256),
    tl.Mean(axis=1),  # Average on axis 1 (length of sentence).
    tl.Dense(2),      # Classify 2 classes.
    tl.LogSoftmax()   # Produce log-probabilities.
)

# You can print model structure.
print(model)
# You can print model structure.
print ( model )
Serial[
Embedding_8192_256
Mean
Dense_2
LogSoftmax
]
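For the Branch combinator mentioned above, here is a minimal sketch (not from the original intro, reusing the imports from the cells above): Branch applies several layers in parallel to copies of its input and returns one output per layer.
branch = tl.Branch(tl.Dense(16), tl.Dense(16))
x_float = np.zeros((4, 8), dtype=np.float32)
branch.init(trax.shapes.signature(x_float))
out1, out2 = branch(x_float)
print(out1.shape, out2.shape)  # Expect (4, 16) (4, 16).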
To train your model, you need data. In Trax, data streams are represented as Python iterators, so you can call next(data_stream) and get a tuple, e.g., (inputs, targets). Trax allows you to use TensorFlow Datasets easily, and you can also get an iterator from your own text file using the standard open('my_file.txt').
train_stream = trax.data.TFDS('imdb_reviews', keys=('text', 'label'), train=True)()
eval_stream = trax.data.TFDS('imdb_reviews', keys=('text', 'label'), train=False)()
print(next(train_stream))  # See one example.
(b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.", 0)
Using the trax.data module you can create input processing pipelines, e.g., to tokenize and shuffle your data. You create data pipelines with trax.data.Serial; they are functions that you apply to streams to create processed streams.
data_pipeline = trax.data.Serial(
    trax.data.Tokenize(vocab_file='en_8k.subword', keys=[0]),
    trax.data.Shuffle(),
    trax.data.FilterByLength(max_length=2048, length_keys=[0]),
    trax.data.BucketByLength(boundaries=[32, 128, 512, 2048],
                             batch_sizes=[256, 64, 16, 4, 1],
                             length_keys=[0]),
    trax.data.AddLossWeights()
)
train_batches_stream = data_pipeline(train_stream)
eval_batches_stream = data_pipeline(eval_stream)
example_batch = next(train_batches_stream)
print(f'shapes = {[x.shape for x in example_batch]}')  # Check the shapes.
shapes = [(4, 1024), (4,), (4,)]
When you have the model and the data, use trax.supervised.training to define training and eval tasks and create a training loop. The Trax training loop optimizes training and will create TensorBoard logs and model checkpoints for you.
from trax.supervised import training

# Training task.
train_task = training.TrainTask(
    labeled_data=train_batches_stream,
    loss_layer=tl.WeightedCategoryCrossEntropy(),
    optimizer=trax.optimizers.Adam(0.01),
    n_steps_per_checkpoint=500,
)

# Evaluation task.
eval_task = training.EvalTask(
    labeled_data=eval_batches_stream,
    metrics=[tl.WeightedCategoryCrossEntropy(), tl.WeightedCategoryAccuracy()],
    n_eval_batches=20  # For less variance in eval numbers.
)

# Training loop saves checkpoints to output_dir.
output_dir = os.path.expanduser('~/output_dir/')
!rm -rf {output_dir}
training_loop = training.Loop(model,
                              train_task,
                              eval_tasks=[eval_task],
                              output_dir=output_dir)

# Run 2000 steps (batches).
training_loop.run(2000)
Step 1: Ran 1 train steps in 0.78 secs
Step 1: train WeightedCategoryCrossEntropy | 1.33800304
Step 1: eval WeightedCategoryCrossEntropy | 0.71843582
Step 1: eval WeightedCategoryAccuracy | 0.56562500
Step 500: Ran 499 train steps in 5.77 secs
Step 500: train WeightedCategoryCrossEntropy | 0.62914723
Step 500: eval WeightedCategoryCrossEntropy | 0.49253047
Step 500: eval WeightedCategoryAccuracy | 0.74062500
Step 1000: Ran 500 train steps in 5.03 secs
Step 1000: train WeightedCategoryCrossEntropy | 0.42949259
Step 1000: eval WeightedCategoryCrossEntropy | 0.35451687
Step 1000: eval WeightedCategoryAccuracy | 0.83750000
Step 1500: Ran 500 train steps in 4.80 secs
Step 1500: train WeightedCategoryCrossEntropy | 0.41843575
Step 1500: eval WeightedCategoryCrossEntropy | 0.35207348
Step 1500: eval WeightedCategoryAccuracy | 0.82109375
Step 2000: Ran 500 train steps in 5.35 secs
Step 2000: train WeightedCategoryCrossEntropy | 0.38129005
Step 2000: eval WeightedCategoryCrossEntropy | 0.33760912
Step 2000: eval WeightedCategoryAccuracy | 0.85312500
After training the model, run it like any layer to get results.
example_input = next(eval_batches_stream)[0][0]
example_input_str = trax.data.detokenize(example_input, vocab_file='en_8k.subword')
print(f'example input_str: {example_input_str}')
sentiment_log_probs = model(example_input[None, :])  # Add batch dimension.
print(f'Model returned sentiment probabilities: {np.exp(sentiment_log_probs)}')
example input_str: I first saw this when I was a teen in my last year of Junior High. I was riveted to it! I loved the special effects, the fantastic places and the trial-aspect and flashback method of telling the story.<br /><br />Several years later I read the book and while it was interesting and I could definitely see what Swift was trying to say, I think that while it's not as perfect as the book for social commentary, as a story the movie is better. It makes more sense to have it be one long adventure than having Gulliver return after each voyage and making a profit by selling the tiny Lilliput sheep or whatever.<br /><br />It's much more arresting when everyone thinks he's crazy and the sheep DO make a cameo anyway. As a side note, when I saw Laputa I was stunned. It looks very much like the Kingdom of Zeal from the Chrono Trigger video game (1995) that also made me like this mini-series even more.<br /><br />I saw it again about 4 years ago, and realized that I still enjoyed it just as much. Really high quality stuff and began an excellent run of Sweeps mini-series for NBC who followed it up with the solid Merlin and interesting Alice in Wonderland.<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
Model returned sentiment probabilities: [[3.984500e-04 9.996014e-01]]