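CUB provides state-of-the-art, reusable software components for every layer of the CUDA programming model.
CUB is included in the NVIDIA HPC SDK and the CUDA Toolkit.
We recommend the CUB project website for further information and examples.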
```cpp
#include <cub/cub.cuh>

// Block-sorting CUDA kernel
__global__ void BlockSortKernel(int *d_in, int *d_out)
{
    using namespace cub;

    // Specialize BlockRadixSort, BlockLoad, and BlockStore for 128 threads
    // owning 16 integer items each
    typedef BlockRadixSort<int, 128, 16>                    BlockRadixSort;
    typedef BlockLoad<int, 128, 16, BLOCK_LOAD_TRANSPOSE>   BlockLoad;
    typedef BlockStore<int, 128, 16, BLOCK_STORE_TRANSPOSE> BlockStore;

    // Allocate shared memory
    __shared__ union {
        typename BlockRadixSort::TempStorage sort;
        typename BlockLoad::TempStorage      load;
        typename BlockStore::TempStorage     store;
    } temp_storage;

    int block_offset = blockIdx.x * (128 * 16); // Offset for this block's segment

    // Obtain a segment of 2048 consecutive keys that are blocked across threads
    int thread_keys[16];
    BlockLoad(temp_storage.load).Load(d_in + block_offset, thread_keys);
    __syncthreads();

    // Collectively sort the keys
    BlockRadixSort(temp_storage.sort).Sort(thread_keys);
    __syncthreads();

    // Store the sorted segment
    BlockStore(temp_storage.store).Store(d_out + block_offset, thread_keys);
}
```
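The kernel above sorts each 2048-key segment (128 threads × 16 keys) independently of the others. To check its output, a host-side reference can be useful. The following is a minimal sketch; `SortSegmentsReference` is a hypothetical helper and not part of CUB, and it assumes the input length is a multiple of the segment size, as the kernel does.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not part of CUB): sorts every full
// segment of `segment_size` keys independently, mirroring what one thread
// block does with its own segment in the kernel above.
inline std::vector<int> SortSegmentsReference(std::vector<int> keys,
                                              std::size_t segment_size = 128 * 16) {
  assert(segment_size > 0);
  for (std::size_t begin = 0; begin + segment_size <= keys.size();
       begin += segment_size) {
    const auto first = keys.begin() + static_cast<std::ptrdiff_t>(begin);
    const auto last = first + static_cast<std::ptrdiff_t>(segment_size);
    std::sort(first, last);
  }
  // Any trailing partial segment is left untouched.
  return keys;
}
```

Comparing the device output copied back to the host against this reference, element by element, is one simple way to validate a launch.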
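Each thread block uses `cub::BlockRadixSort` to collectively sort its own input segment. The class is specialized by the data type being sorted, by the number of threads per block, by the number of keys per thread, and implicitly by the targeted compilation architecture.
The `cub::BlockLoad` and `cub::BlockStore` classes are similarly specialized. Furthermore, to provide coalesced accesses to device memory, these primitives are configured to access memory using a striped access pattern (where consecutive threads simultaneously access consecutive items) and then transpose the keys into a blocked arrangement of elements across threads.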
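The difference between the striped and blocked arrangements comes down to index arithmetic. The sketch below emulates it on the host; `StripedIndex`, `BlockedIndex`, and `LoadBlocked` are illustrative names, not CUB API, and the staging buffer merely stands in for shared memory.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative host-side helpers; these names are assumptions, not CUB API.

// Striped arrangement: item `item` of thread `thread` sits at
// item * block_threads + thread, so in each step consecutive threads touch
// consecutive addresses (a coalesced access on the device).
constexpr int StripedIndex(int thread, int item, int block_threads) {
  return item * block_threads + thread;
}

// Blocked arrangement: each thread owns one contiguous run of
// items_per_thread elements.
constexpr int BlockedIndex(int thread, int item, int items_per_thread) {
  return thread * items_per_thread + item;
}

// Emulates a transposing load on the host. Returns keys[t][i], the i-th key
// held by thread t after the load.
inline std::vector<std::vector<int>> LoadBlocked(const std::vector<int>& segment,
                                                 int block_threads,
                                                 int items_per_thread) {
  assert(segment.size() ==
         static_cast<std::size_t>(block_threads) * items_per_thread);

  // Phase 1: read memory in striped order into a staging buffer.
  std::vector<int> staging(segment.size());
  for (int item = 0; item < items_per_thread; ++item) {
    for (int thread = 0; thread < block_threads; ++thread) {
      const int idx = StripedIndex(thread, item, block_threads);
      staging[idx] = segment[idx];
    }
  }

  // Phase 2: each thread collects its blocked run from the staging buffer.
  std::vector<std::vector<int>> keys(block_threads,
                                     std::vector<int>(items_per_thread));
  for (int thread = 0; thread < block_threads; ++thread) {
    for (int item = 0; item < items_per_thread; ++item) {
      keys[thread][item] = staging[BlockedIndex(thread, item, items_per_thread)];
    }
  }
  return keys;
}
```

With the configuration used in the kernel (128 threads, 16 items per thread), the third item of thread 1 is at index 257 in the striped arrangement and at index 18 in the blocked arrangement.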
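Once specialized, these classes expose opaque `TempStorage` member types. The thread block uses these storage types to statically allocate the union of shared memory needed by the thread block. (Alternatively, these storage types could be aliased to global memory allocations.)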
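Why a union rather than three separate members? A union is only as large as its largest member, so the load, sort, and store phases can reuse the same bytes. The sketch below shows the effect with stand-in types; the struct names and byte sizes are invented for illustration, since the real `TempStorage` sizes depend on the specialization and target architecture.

```cpp
#include <cstddef>

// Stand-ins for the opaque TempStorage types. The sizes are invented for
// illustration only.
struct SortTempStorage  { unsigned char bytes[4096]; };
struct LoadTempStorage  { unsigned char bytes[2048]; };
struct StoreTempStorage { unsigned char bytes[2048]; };

// Union: all three phases share the same bytes.
union SharedTempStorage {
  SortTempStorage  sort;
  LoadTempStorage  load;
  StoreTempStorage store;
};

// Separate members: every phase gets its own bytes.
struct SeparateTempStorage {
  SortTempStorage  sort;
  LoadTempStorage  load;
  StoreTempStorage store;
};

constexpr std::size_t kUnionBytes = sizeof(SharedTempStorage);
constexpr std::size_t kSeparateBytes = sizeof(SeparateTempStorage);

static_assert(kUnionBytes == 4096, "a union is as large as its largest member");
static_assert(kSeparateBytes == 8192, "separate members add up");
```

Because the phases overlap in memory, one phase must finish before the next reuses the storage, which is why the kernel calls `__syncthreads()` between the load, sort, and store steps.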
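CUB is regularly tested using the specified versions of the following compilers. Unsupported versions may emit deprecation warnings, which can be silenced by defining `CUB_IGNORE_DEPRECATED_COMPILER` during compilation.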
In addition to GitHub, CUB is distributed with the NVIDIA HPC SDK and the CUDA Toolkit.
See the changelog for details about specific releases.
CUB Release | Included In |
---|---|
2.0.1 | CUDA Toolkit 12.0 |
2.0.0 | TBD |
1.17.2 | TBD |
1.17.1 | TBD |
1.17.0 | TBD |
1.16.0 | TBD |
1.15.0 | NVIDIA HPC SDK 22.1 and CUDA Toolkit 11.6 |
1.14.0 | NVIDIA HPC SDK 21.9 |
1.13.1 | CUDA Toolkit 11.5 |
1.13.0 | NVIDIA HPC SDK 21.7 |
1.12.1 | CUDA Toolkit 11.4 |
1.12.0 | NVIDIA HPC SDK 21.3 |
1.11.0 | CUDA Toolkit 11.3 |
1.10.0 | NVIDIA HPC SDK 20.9 and CUDA Toolkit 11.2 |
1.9.10-1 | NVIDIA HPC SDK 20.7 and CUDA Toolkit 11.1 |
1.9.10 | NVIDIA HPC SDK 20.5 |
1.9.9 | CUDA Toolkit 11.0 |
1.9.8-1 | NVIDIA HPC SDK 20.3 |
1.9.8 | CUDA Toolkit 11.0 Early Access |
1.9.8 | CUDA 11.0 Early Access |
1.8.0 | |
1.7.5 | Thrust 1.9.2 |
1.7.4 | Thrust 1.9.1-2 |
1.7.3 | |
1.7.2 | |
1.7.1 | |
1.7.0 | Thrust 1.9.0-5 |
1.6.4 | |
1.6.3 | |
1.6.2 (previously 1.5.5) | |
1.6.1 (previously 1.5.4) | |
1.6.0 (previously 1.5.3) | |
1.5.2 | |
1.5.1 | |
1.5.0 | |
1.4.1 | |
1.4.0 | |
1.3.2 | |
1.3.1 | |
1.3.0 | |
1.2.3 | |
1.2.2 | |
1.2.0 | |
1.1.1 | |
1.0.2 | |
1.0.1 | |
0.9.4 | |
0.9.2 | |
0.9.1 | |
0.9.0 | |
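CUB and Thrust depend on each other. It is recommended to clone Thrust and build CUB as a component of Thrust.
CUB uses the CMake build system to build unit tests, examples, and header tests. To build CUB as a developer, follow this recipe: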
```bash
# Clone Thrust and CUB from GitHub. CUB is located in Thrust's
# `dependencies/cub` submodule.
git clone --recursive https://github.com/NVIDIA/thrust.git
cd thrust

# Create build directory:
mkdir build
cd build

# Configure -- use one of the following:
cmake -DTHRUST_INCLUDE_CUB_CMAKE=ON ..   # Command line interface.
ccmake -DTHRUST_INCLUDE_CUB_CMAKE=ON ..  # ncurses GUI (Linux only)
cmake-gui  # Graphical UI, set source/build directories and options in the app

# Build:
cmake --build . -j <num jobs>   # invokes make (or ninja, etc)

# Run tests and examples:
ctest
```
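By default, the C++14 standard is targeted, but this can be changed in CMake. More information on configuring your CUB build and creating a pull request can be found in CONTRIBUTING.md.
CUB is available under the "New BSD" open-source license: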
Copyright (c) 2010-2011, Duane Merrill. All rights reserved.
Copyright (c) 2011-2018, NVIDIA CORPORATION. All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
* Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
* Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
* Neither the name of the NVIDIA CORPORATION nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL NVIDIA CORPORATION BE LIABLE FOR ANY
DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.