CNNGPT
1.0.0
$ pip3 install -U cnngpt
import torch
from cnngpt.main import CNNLanguageModel
# Hyperparameters
vocab_size = 5000
embedding_dim = 256
num_layers = 4
kernel_size = 3
hidden_dim = 256
max_seq_len = 100
# Initialize model
model = CNNLanguageModel(
vocab_size,
embedding_dim,
num_layers,
kernel_size,
hidden_dim,
max_seq_len,
)
# Dummy input (batch_size=32, sequence_length=50)
x = torch.randint(
0, vocab_size, (32, 50)
) # Random integers as token IDs
# Forward pass
logits = model(x) # [batch_size, seq_len, vocab_size]
# Output shape
print("Logits shape:", logits.shape)
The hyperparameters are:

vocab_size: The size of the vocabulary (the number of unique tokens).
embedding_dim: The dimension of the token embeddings.
num_layers: The number of convolutional layers.
kernel_size: The size of the convolutional kernels.
hidden_dim: The dimension of the hidden representations (should match embedding_dim for the residual connections).
max_seq_len: The maximum sequence length the model can handle.

The convolutions are causal: the output at time step t does not depend on any future time steps.
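Causality is typically enforced by left-padding each convolution so that position t only sees positions up to t. The snippet below is an illustrative sketch of that idea with a plain nn.Conv1d; the actual layers inside CNNLanguageModel may differ, for example in their use of dilation and gating.

import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalConv1d(nn.Module):
    """Conv1d that only looks at the current and past time steps."""

    def __init__(self, channels, kernel_size, dilation=1):
        super().__init__()
        # Pad only on the left so the receptive field never reaches the future
        self.left_pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                    # x: [batch, channels, seq_len]
        x = F.pad(x, (self.left_pad, 0))
        return self.conv(x)                  # [batch, channels, seq_len]

# Quick check: outputs at positions < 5 must not change when positions >= 5 change
layer = CausalConv1d(channels=8, kernel_size=3, dilation=2)
a = torch.randn(1, 8, 10)
b = a.clone()
b[:, :, 5:] = 0.0
assert torch.allclose(layer(a)[:, :, :5], layer(b)[:, :, :5])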
Throughout the network, the tensor shapes are managed carefully to stay consistent:

After the embedding layer: [batch_size, seq_len, embedding_dim]
Transposed for the convolutional layers: [batch_size, embedding_dim, seq_len]
After the convolutional layers: [batch_size, hidden_dim, seq_len]
Transposed back after the convolutions: [batch_size, seq_len, embedding_dim]
After the output projection: [batch_size, seq_len, vocab_size]
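The two transposes exist because nn.Conv1d expects the channel dimension before the time dimension. The following sketch shows how a forward pass could move between these layouts; the layer names (embedding, conv_blocks, output_proj) are placeholders, not necessarily the attributes used inside CNNLanguageModel, and the real model additionally applies gating, residual connections, LayerNorm, and positional encodings.

import torch.nn as nn

embedding = nn.Embedding(vocab_size, embedding_dim)
conv_blocks = nn.ModuleList(
    [nn.Conv1d(hidden_dim, hidden_dim, kernel_size, padding=kernel_size - 1)
     for _ in range(num_layers)]
)
output_proj = nn.Linear(hidden_dim, vocab_size)

h = embedding(x)                     # [batch_size, seq_len, embedding_dim]
h = h.transpose(1, 2)                # [batch_size, embedding_dim, seq_len]
for conv in conv_blocks:
    h = conv(h)[..., : x.size(1)]    # trim right padding -> [batch_size, hidden_dim, seq_len]
h = h.transpose(1, 2)                # [batch_size, seq_len, hidden_dim]
logits = output_proj(h)              # [batch_size, seq_len, vocab_size]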
embedding_dim and hidden_dim must be equal so the residual connections can be added correctly.
The tensor is transposed back to [batch_size, seq_len, hidden_dim] before applying LayerNorm.
Input sequences may not exceed max_seq_len; the positional encodings are sliced accordingly.

We have translated the detailed algorithm into a PyTorch implementation step by step, ensuring that the code aligns with the design principles outlined earlier. This CNN-based language model leverages causal and dilated convolutions, gated activations, residual connections, and layer normalization to effectively model text for generation tasks.
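For generation, the logits at the last position can be fed back autoregressively. The snippet below is a minimal greedy-decoding sketch built only on the forward pass shown above; greedy_generate is an illustrative helper, not a cnngpt function.

import torch

@torch.no_grad()
def greedy_generate(model, prompt_ids, num_new_tokens, max_seq_len=100):
    """Extend prompt_ids by repeatedly appending the most likely next token."""
    model.eval()
    ids = prompt_ids                                             # [1, prompt_len]
    for _ in range(num_new_tokens):
        context = ids[:, -max_seq_len:]                          # respect max_seq_len
        logits = model(context)                                  # [1, context_len, vocab_size]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # [1, 1]
        ids = torch.cat([ids, next_id], dim=1)
    return ids

prompt = torch.randint(0, vocab_size, (1, 10))  # placeholder prompt token IDs
generated = greedy_generate(model, prompt, num_new_tokens=20)
print("Generated shape:", generated.shape)      # torch.Size([1, 30])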
By understanding each component and its role in the model, we can appreciate how this architecture captures both local and global dependencies in language, offering a powerful alternative to traditional models in natural language processing.