Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively used and maintained in the Google Brain team. This notebook (run it in Colab) shows how to use Trax and where you can find more information.
We welcome contributions to Trax! We welcome PRs with code for new models and layers as well as improvements to our code and documentation. We especially love notebooks that explain how models work and show how to use them to solve problems!
Here are a few example notebooks:
- trax.data API explained: the major functions in the trax.data API

General Setup
Execute the following cell (once) before running any of the code samples.
import os
import numpy as np
!pip install -q -U trax
import trax
Here is how you create an English-German translator in a few lines of code:
# Create a Transformer model.
# Pre-trained model config in gs://trax-ml/models/translation/ende_wmt32k.gin
model = trax.models.Transformer(
    input_vocab_size=33300,
    d_model=512, d_ff=2048,
    n_heads=8, n_encoder_layers=6, n_decoder_layers=6,
    max_len=2048, mode='predict')

# Initialize using pre-trained weights.
model.init_from_file('gs://trax-ml/models/translation/ende_wmt32k.pkl.gz',
                     weights_only=True)

# Tokenize a sentence.
sentence = 'It is nice to learn new things today!'
tokenized = list(trax.data.tokenize(iter([sentence]),  # Operates on streams.
                                    vocab_dir='gs://trax-ml/vocabs/',
                                    vocab_file='ende_32k.subword'))[0]

# Decode from the Transformer.
tokenized = tokenized[None, :]  # Add batch dimension.
tokenized_translation = trax.supervised.decoding.autoregressive_sample(
    model, tokenized, temperature=0.0)  # Higher temperature: more diverse results.

# De-tokenize.
tokenized_translation = tokenized_translation[0][:-1]  # Remove batch and EOS.
translation = trax.data.detokenize(tokenized_translation,
                                   vocab_dir='gs://trax-ml/vocabs/',
                                   vocab_file='ende_32k.subword')
print(translation)
Es ist schön, heute neue Dinge zu lernen!
Trax includes basic models (like ResNet, LSTM, Transformer) and RL algorithms (like REINFORCE, A2C, PPO). It is also actively used for research and includes new models like the Reformer and new RL algorithms like AWR. Trax has bindings to a large number of deep learning datasets, including Tensor2Tensor and TensorFlow datasets.
You can use Trax either as a library from your own Python scripts and notebooks or as a binary from the shell, which can be more convenient for training large models; a rough sketch of such an invocation is shown below. It runs without any changes on CPUs, GPUs and TPUs.
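For the shell usage, Trax provides a trainer module that reads a gin configuration. The invocation below is only a sketch; the config path is a placeholder, not a file used in this notebook:
!python -m trax.trainer --config_file=path/to/your_config.gin --output_dir=~/trainer_output_dir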
Here you can learn how Trax works, how to create new models, and how to train them on your own data.
The basic units flowing through Trax models are tensors, multi-dimensional arrays sometimes also called numpy arrays after numpy, the most widely used package for tensor operations. If you don't know how to operate on tensors, take a look at the numpy guide: Trax also uses the numpy API for them.
In Trax we want numpy operations to run very fast, making use of GPUs and TPUs to accelerate them. We also want to automatically compute gradients of functions on tensors. This is done in the trax.fastmath package, thanks to its backends: JAX and TensorFlow numpy.
from trax.fastmath import numpy as fastnp
trax.fastmath.use_backend('jax')  # Can be 'jax' or 'tensorflow-numpy'.

matrix = fastnp.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f'matrix = \n{matrix}')
vector = fastnp.ones(3)
print(f'vector = {vector}')
product = fastnp.dot(vector, matrix)
print(f'product = {product}')
tanh = fastnp.tanh(product)
print(f'tanh(product) = {tanh}')
matrix =
[[1 2 3]
[4 5 6]
[7 8 9]]
vector = [1. 1. 1.]
product = [12. 15. 18.]
tanh(product) = [0.99999994 0.99999994 0.99999994]
Gradients can be calculated using trax.fastmath.grad.
def f(x):
  return 2.0 * x * x

grad_f = trax.fastmath.grad(f)
print(f'grad(2x^2) at 1 = {grad_f(1.0)}')
grad(2x^2) at 1 = 4.0
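As a small additional sketch (not part of the original intro), grad also works on functions that take a tensor argument and return a scalar, producing a gradient with the same shape as the input:
def loss(v):
  return fastnp.sum(v * v)  # Scalar-valued function of a vector.

grad_loss = trax.fastmath.grad(loss)
print(grad_loss(fastnp.array([1.0, 2.0, 3.0])))  # Expect [2. 4. 6.]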
Layers are the basic building blocks of Trax models. You will learn all about them in the layers intro, but for now, just take a look at the implementation of one core Trax layer, Embedding:
class Embedding(base.Layer):
  """Trainable layer that maps discrete tokens/IDs to vectors."""

  def __init__(self,
               vocab_size,
               d_feature,
               kernel_initializer=init.RandomNormalInitializer(1.0)):
    """Returns an embedding layer with given vocabulary size and vector size.

    Args:
      vocab_size: Size of the input vocabulary. The layer will assign a unique
          vector to each ID in `range(vocab_size)`.
      d_feature: Dimensionality/depth of the output vectors.
      kernel_initializer: Function that creates (random) initial vectors for
          the embedding.
    """
    super().__init__(name=f'Embedding_{vocab_size}_{d_feature}')
    self._d_feature = d_feature  # feature dimensionality
    self._vocab_size = vocab_size
    self._kernel_initializer = kernel_initializer

  def forward(self, x):
    """Returns embedding vectors corresponding to input token IDs.

    Args:
      x: Tensor of token IDs.

    Returns:
      Tensor of embedding vectors.
    """
    return jnp.take(self.weights, x, axis=0, mode='clip')

  def init_weights_and_state(self, input_signature):
    """Returns tensor of newly initialized embedding vectors."""
    del input_signature
    shape_w = (self._vocab_size, self._d_feature)
    w = self._kernel_initializer(shape_w, self.rng)
    self.weights = w
Layers with trainable weights like Embedding need to be initialized with the signature (shape and dtype) of the input, and then they can be run by calling them.
from trax import layers as tl

# Create an input tensor x.
x = np.arange(15)
print(f'x = {x}')

# Create the embedding layer.
embedding = tl.Embedding(vocab_size=20, d_feature=32)
embedding.init(trax.shapes.signature(x))

# Run the layer -- y = embedding(x).
y = embedding(x)
print(f'shape of y = {y.shape}')
x = [ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14]
shape of y = (15, 32)
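As a quick extra check (not part of the original example), the weights created by init_weights_and_state are the embedding table itself, with one row per vocabulary entry:
print(f'shape of embedding.weights = {embedding.weights.shape}')  # Expect (20, 32).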
Models in Trax are built from layers, most often using the Serial and Branch combinators. You can read more about those combinators in the layers intro and see the code for many models in trax/models/; for example, this is how the Transformer language model is implemented. Below is an example of how to build a sentiment classification model (a small sketch of Branch follows after it).
model = tl.Serial(
    tl.Embedding(vocab_size=8192, d_feature=256),
    tl.Mean(axis=1),  # Average on axis 1 (length of sentence).
    tl.Dense(2),      # Classify 2 classes.
    tl.LogSoftmax()   # Produce log-probabilities.
)

# You can print model structure.
print(model)
# You can print model structure.
print ( model )
Serial[
Embedding_8192_256
Mean
Dense_2
LogSoftmax
]
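For the Branch combinator mentioned above, here is a minimal sketch (not from the original intro, reusing the imports from the cells above): Branch applies several layers in parallel to copies of its input and returns one output per layer.
branch = tl.Branch(tl.Dense(16), tl.Dense(16))
x_float = np.zeros((4, 8), dtype=np.float32)
branch.init(trax.shapes.signature(x_float))
out1, out2 = branch(x_float)
print(out1.shape, out2.shape)  # Expect (4, 16) (4, 16).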
To train your model, you need data. In Trax, data streams are represented as Python iterators, so you can call next(data_stream) and get a tuple, e.g., (inputs, targets). Trax allows you to use TensorFlow Datasets easily, and you can also get an iterator from your own text file using the standard open('my_file.txt').
train_stream = trax.data.TFDS('imdb_reviews', keys=('text', 'label'), train=True)()
eval_stream = trax.data.TFDS('imdb_reviews', keys=('text', 'label'), train=False)()
print(next(train_stream))  # See one example.
(b"This was an absolutely terrible movie. Don't be lured in by Christopher Walken or Michael Ironside. Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Maria Conchita Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like Christopher Walken's good name. I could barely sit through it.", 0)
Using the trax.data module you can create input processing pipelines, e.g., to tokenize and shuffle your data. You create data pipelines with trax.data.Serial; they are functions that you apply to streams to create processed streams.
data_pipeline = trax.data.Serial(
    trax.data.Tokenize(vocab_file='en_8k.subword', keys=[0]),
    trax.data.Shuffle(),
    trax.data.FilterByLength(max_length=2048, length_keys=[0]),
    trax.data.BucketByLength(boundaries=[32, 128, 512, 2048],
                             batch_sizes=[256, 64, 16, 4, 1],
                             length_keys=[0]),
    trax.data.AddLossWeights()
)
train_batches_stream = data_pipeline(train_stream)
eval_batches_stream = data_pipeline(eval_stream)
example_batch = next(train_batches_stream)
print(f'shapes = {[x.shape for x in example_batch]}')  # Check the shapes.
shapes = [(4, 1024), (4,), (4,)]
When you have the model and the data, use trax.supervised.training to define training and eval tasks and create a training loop. The Trax training loop optimizes training and will create TensorBoard logs and model checkpoints for you.
from trax.supervised import training

# Training task.
train_task = training.TrainTask(
    labeled_data=train_batches_stream,
    loss_layer=tl.WeightedCategoryCrossEntropy(),
    optimizer=trax.optimizers.Adam(0.01),
    n_steps_per_checkpoint=500,
)

# Evaluation task.
eval_task = training.EvalTask(
    labeled_data=eval_batches_stream,
    metrics=[tl.WeightedCategoryCrossEntropy(), tl.WeightedCategoryAccuracy()],
    n_eval_batches=20  # For less variance in eval numbers.
)

# Training loop saves checkpoints to output_dir.
output_dir = os.path.expanduser('~/output_dir/')
!rm -rf {output_dir}
training_loop = training.Loop(model,
                              train_task,
                              eval_tasks=[eval_task],
                              output_dir=output_dir)

# Run 2000 steps (batches).
training_loop.run(2000)
Step 1: Ran 1 train steps in 0.78 secs
Step 1: train WeightedCategoryCrossEntropy | 1.33800304
Step 1: eval WeightedCategoryCrossEntropy | 0.71843582
Step 1: eval WeightedCategoryAccuracy | 0.56562500
Step 500: Ran 499 train steps in 5.77 secs
Step 500: train WeightedCategoryCrossEntropy | 0.62914723
Step 500: eval WeightedCategoryCrossEntropy | 0.49253047
Step 500: eval WeightedCategoryAccuracy | 0.74062500
Step 1000: Ran 500 train steps in 5.03 secs
Step 1000: train WeightedCategoryCrossEntropy | 0.42949259
Step 1000: eval WeightedCategoryCrossEntropy | 0.35451687
Step 1000: eval WeightedCategoryAccuracy | 0.83750000
Step 1500: Ran 500 train steps in 4.80 secs
Step 1500: train WeightedCategoryCrossEntropy | 0.41843575
Step 1500: eval WeightedCategoryCrossEntropy | 0.35207348
Step 1500: eval WeightedCategoryAccuracy | 0.82109375
Step 2000: Ran 500 train steps in 5.35 secs
Step 2000: train WeightedCategoryCrossEntropy | 0.38129005
Step 2000: eval WeightedCategoryCrossEntropy | 0.33760912
Step 2000: eval WeightedCategoryAccuracy | 0.85312500
After training the model, run it like any layer to get results.
example_input = next(eval_batches_stream)[0][0]
example_input_str = trax.data.detokenize(example_input, vocab_file='en_8k.subword')
print(f'example input_str: {example_input_str}')
sentiment_log_probs = model(example_input[None, :])  # Add batch dimension.
print(f'Model returned sentiment probabilities: {np.exp(sentiment_log_probs)}')
example input_str: I first saw this when I was a teen in my last year of Junior High. I was riveted to it! I loved the special effects, the fantastic places and the trial-aspect and flashback method of telling the story.<br /><br />Several years later I read the book and while it was interesting and I could definitely see what Swift was trying to say, I think that while it's not as perfect as the book for social commentary, as a story the movie is better. It makes more sense to have it be one long adventure than having Gulliver return after each voyage and making a profit by selling the tiny Lilliput sheep or whatever.<br /><br />It's much more arresting when everyone thinks he's crazy and the sheep DO make a cameo anyway. As a side note, when I saw Laputa I was stunned. It looks very much like the Kingdom of Zeal from the Chrono Trigger video game (1995) that also made me like this mini-series even more.<br /><br />I saw it again about 4 years ago, and realized that I still enjoyed it just as much. Really high quality stuff and began an excellent run of Sweeps mini-series for NBC who followed it up with the solid Merlin and interesting Alice in Wonderland.<pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad><pad>
Model returned sentiment probabilities: [[3.984500e-04 9.996014e-01]]