nnl is an inference engine for running large models on low-memory GPU platforms.
Such models are too large to fit into GPU memory. nnl addresses this by trading PCIe bandwidth for memory: weights stay in host memory and are moved to the GPU only when they are needed. For example, the 1558M-parameter GPT2-XL needs roughly 6 GB just for fp32 weights, about three times the total memory of a 2 GB card.
A typical inference pipeline is as follows:
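nnl's actual pipeline is defined by the library itself; the snippet below is only a minimal sketch of the general weight-streaming pattern such an engine relies on, with a made-up `Layer` descriptor and a stubbed `run_layer` compute step (neither is nnl's API). Each layer's weights are copied from host memory into a small double-buffered device allocation while the previous layer computes, so only a fraction of the model lives on the GPU at any time.

```cpp
// Sketch only: stream per-layer weights over PCIe into two reused device
// buffers, overlapping the upload of layer i+1 with the compute of layer i.
#include <algorithm>
#include <cuda_runtime.h>
#include <vector>

struct Layer { const float* host_weights; size_t bytes; };   // hypothetical descriptor

// Placeholder for the real per-layer compute (kernel launches would go here).
void run_layer(const float*, float*, cudaStream_t) {}

void stream_forward(const std::vector<Layer>& layers, float* activations) {
    cudaStream_t copy_s, compute_s;
    cudaStreamCreate(&copy_s);
    cudaStreamCreate(&compute_s);

    // Two device buffers sized for the largest layer, reused for the whole model.
    size_t max_bytes = 0;
    for (const Layer& l : layers) max_bytes = std::max(max_bytes, l.bytes);
    float* buf[2];
    cudaEvent_t copied[2], done[2];
    for (int i = 0; i < 2; ++i) {
        cudaMalloc(reinterpret_cast<void**>(&buf[i]), max_bytes);
        cudaEventCreate(&copied[i]);
        cudaEventCreate(&done[i]);
    }

    for (size_t i = 0; i < layers.size(); ++i) {
        const int slot = i % 2;
        // Do not overwrite this buffer before the layer that last used it finishes.
        cudaStreamWaitEvent(copy_s, done[slot], 0);
        cudaMemcpyAsync(buf[slot], layers[i].host_weights, layers[i].bytes,
                        cudaMemcpyHostToDevice, copy_s);
        cudaEventRecord(copied[slot], copy_s);
        // Compute waits only for its own weights, so the next upload can overlap.
        cudaStreamWaitEvent(compute_s, copied[slot], 0);
        run_layer(buf[slot], activations, compute_s);
        cudaEventRecord(done[slot], compute_s);
    }
    cudaStreamSynchronize(compute_s);

    for (int i = 0; i < 2; ++i) {
        cudaFree(buf[i]);
        cudaEventDestroy(copied[i]);
        cudaEventDestroy(done[i]);
    }
    cudaStreamDestroy(copy_s);
    cudaStreamDestroy(compute_s);
}
```

With the weight blobs pinned (cudaHostAlloc or cudaHostRegister), the host-to-device copies can run truly asynchronously and a good part of the PCIe transfer time can hide behind compute.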
With a GPU memory pool and memory defragmentation, nnl makes it possible to run inference with a large model on a low-end GPU platform.
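The allocator itself is not described in this section, so the sketch below is only a rough illustration of the idea, not nnl's implementation: a device memory pool carves tensors out of one large cudaMalloc arena and, when frees leave holes that block a new allocation, compacts the live blocks back to the front. The class name and handle scheme are invented for the example.

```cpp
// Sketch only: a GPU memory pool with naive compaction ("defragmentation").
// Tensors are handed out as stable handles into one large device arena; when
// the arena fills up, live blocks are slid toward the front so the free
// space becomes contiguous again.
#include <cuda_runtime.h>
#include <cstddef>
#include <vector>

class DevicePool {
public:
    using Handle = std::size_t;
    static constexpr Handle invalid = static_cast<Handle>(-1);

    explicit DevicePool(std::size_t capacity) : capacity_(capacity) {
        cudaMalloc(reinterpret_cast<void**>(&base_), capacity_);
    }
    ~DevicePool() { cudaFree(base_); }

    Handle allocate(std::size_t size) {
        if (used_ + size > capacity_) defragment();
        if (used_ + size > capacity_) return invalid;   // genuinely out of memory
        blocks_.push_back({used_, size, true});
        used_ += size;
        return blocks_.size() - 1;
    }
    void release(Handle h) { blocks_[h].live = false; }             // leaves a hole
    std::byte* data(Handle h) const { return base_ + blocks_[h].offset; }

private:
    struct Block { std::size_t offset, size; bool live; };

    // Slide live blocks toward offset 0 so all free space ends up at the tail.
    void defragment() {
        std::size_t dst = 0;
        for (Block& b : blocks_) {
            if (!b.live) continue;
            move_down(dst, b.offset, b.size);
            b.offset = dst;
            dst += b.size;
        }
        used_ = dst;
    }

    // cudaMemcpy forbids overlapping ranges, so bounce through the host when
    // the slide distance is smaller than the block being moved.
    void move_down(std::size_t dst, std::size_t src, std::size_t size) {
        if (dst == src) return;
        if (src - dst >= size) {
            cudaMemcpy(base_ + dst, base_ + src, size, cudaMemcpyDeviceToDevice);
        } else {
            std::vector<std::byte> scratch(size);
            cudaMemcpy(scratch.data(), base_ + src, size, cudaMemcpyDeviceToHost);
            cudaMemcpy(base_ + dst, scratch.data(), size, cudaMemcpyHostToDevice);
        }
    }

    std::byte* base_ = nullptr;
    std::size_t capacity_ = 0, used_ = 0;
    std::vector<Block> blocks_;
};
```

Returning handles instead of raw device pointers is what lets compaction move blocks without invalidating callers; a production allocator would also batch and overlap the moves rather than doing them synchronously as here.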
This is just a hobby project written up in a few weeks; currently only the CUDA backend is supported.
```
make libnnl_cuda.a && make libnnl_cuda_kernels.a
```
This command builds two static libraries, lib/libnnl_cuda.a and lib/libnnl_cuda_kernels.a. The first is the core library with the CUDA backend, written in C++; the second contains the CUDA kernels.
A demo program for GPT2-XL (1558M parameters) is provided here. It can be compiled with this command:
```
make gpt2_1558m
```
After downloading all the weights from the release, we can run the following command on a low-end GPU platform such as a GTX 1050 (2 GB memory):
```
./bin/gpt2_1558m --max_len 20 "Hi. My name is Feng and I am a machine learning engineer"
```
The output looks like this:
Disclaimer: this is just an example completion generated by gpt2-xl; I do not work at Google, and I do not know Randi.
You can also inspect the GPU memory access pattern during inference.
Peace.