Training and optimizing large language models (LLMs) are key challenges in artificial intelligence: training must be efficient, and model outputs must conform to human values. Reinforcement learning from human feedback (RLHF) is a mainstream LLM training method, but despite its wide use there is still room for improvement in efficiency and scalability. To address this, the ByteDance Doubao large model team has open-sourced an RLHF framework called HybridFlow, which aims to solve these problems and bring new possibilities to LLM training. Through its innovative design, the framework improves both the efficiency and the flexibility of LLM training.
Large language models (LLMs) such as GPT and Llama have set off a revolution in artificial intelligence, but efficiently training these enormous models and aligning them with human values remain difficult problems.
Reinforcement learning from human feedback (RLHF) has become an important LLM training method in recent years, yet traditional RLHF frameworks are limited in flexibility, efficiency, and scalability.
To address these problems, the ByteDance Doubao large model team has open-sourced an RLHF framework called HybridFlow, bringing new possibilities to LLM training.
RLHF usually consists of three stages:
First, the actor model generates text from the input prompts; then the critic model, reference model, and reward model evaluate the generated text, computing the corresponding value estimates, reference probabilities, and reward scores;
Finally, these evaluation results are used to train the actor model to generate text that better matches human preferences. Traditional RLHF frameworks typically use a single controller to manage the entire data flow, which is inefficient for LLMs that require distributed computation.
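This loop of generation, scoring, and updating is the data flow that an RLHF controller has to orchestrate. The sketch below writes one iteration as plain single-process Python; the stub classes only return dummy numbers so the control flow is runnable, and none of the names are HybridFlow's API.

```python
import random

# Stand-ins for the four RLHF roles; real systems use LLMs here, these stubs
# just return dummy numbers so the control flow below actually runs. The names
# are illustrative, not HybridFlow's API.
class Actor:
    def generate(self, prompts):               # stage 1: text generation
        return [p + " <response>" for p in prompts]
    def update(self, responses, advantages):   # stage 3: policy update
        pass

class Critic:
    def values(self, responses):               # value estimates
        return [random.random() for _ in responses]
    def update(self, responses, rewards):
        pass

class Reference:
    def log_probs(self, responses):            # reference log-probabilities (for a KL-style penalty)
        return [-1.0 for _ in responses]

class RewardModel:
    def rewards(self, responses):              # scalar preference rewards
        return [random.random() for _ in responses]

actor, critic, ref, rm = Actor(), Critic(), Reference(), RewardModel()

for prompts in [["Write a haiku.", "Explain RLHF."]]:      # stand-in dataloader
    responses = actor.generate(prompts)                    # stage 1: generation
    values = critic.values(responses)                      # stage 2: evaluation
    ref_lp = ref.log_probs(responses)
    rewards = rm.rewards(responses)
    # stage 3: a crude advantage signal (reward minus value, plus a small KL-style term)
    advantages = [r - v + 0.01 * lp for r, v, lp in zip(rewards, values, ref_lp)]
    critic.update(responses, rewards)
    actor.update(responses, advantages)
```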
The HybridFlow framework innovatively combines the single-controller and multi-controller paradigms and, through a hierarchical API design, decouples the complex computation from the data dependencies, enabling flexible representation and efficient execution of RLHF data flows.
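To convey the hybrid idea, the sketch below uses plain Ray actors: one driver script (the single controller) routes data references between worker groups, while each group would internally run its own distributed computation (the multi-controller part). The worker classes and methods here are made up for illustration and are not HybridFlow's interfaces.

```python
import ray

# One driver process (the single controller) coordinates the RLHF data flow by
# issuing remote calls; each role is a group of workers that would internally
# run its own distributed computation (the multi-controller part).

@ray.remote
class RolloutWorker:                    # one shard of the actor/generation role
    def generate(self, prompts):
        return [p + " <response>" for p in prompts]

@ray.remote
class RewardWorker:                     # one shard of the reward-model role
    def score(self, responses):
        return [float(len(r)) for r in responses]

ray.init()
rollout_group = [RolloutWorker.remote() for _ in range(2)]
reward_group = [RewardWorker.remote() for _ in range(2)]

prompts = ["Write a haiku.", "Explain RLHF.", "Summarise a paper.", "Tell a joke."]
shards = [prompts[0::2], prompts[1::2]]     # naive data-parallel split across the group

# The controller only routes (references to) data between roles; the heavy
# lifting happens inside the worker groups, which Ray schedules onto devices.
response_refs = [w.generate.remote(s) for w, s in zip(rollout_group, shards)]
score_refs = [w.score.remote(r) for w, r in zip(reward_group, response_refs)]
print(ray.get(score_refs))
```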
The advantages of HybridFlow are mainly reflected in the following three aspects:
Flexible support for diverse RLHF algorithms and models: HybridFlow provides a modular API through which users can easily implement and extend RLHF algorithms such as PPO, ReMax, and Safe-RLHF; the first sketch after this list illustrates the idea.
Efficient model weight resharding: the 3D-HybridEngine component lets the actor model efficiently reshard its weights between the training and generation stages, minimizing memory redundancy and communication overhead; the second sketch after this list shows the problem in miniature.
Automated model placement and parallel-strategy selection: the auto-mapping component automatically maps models to devices according to their workload and data dependencies and selects a suitable parallel strategy, simplifying deployment and improving training efficiency; the third sketch after this list gives a toy version of the placement step.
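The appeal of a modular algorithm API is that RLHF variants mostly differ in how they compute advantages, while the surrounding data flow stays the same. The sketch below contrasts a PPO-style GAE estimator with a ReMax-style critic-free baseline; the function names are ours, not HybridFlow's.

```python
from typing import List

def gae_advantages(rewards: List[float], values: List[float],
                   gamma: float = 1.0, lam: float = 0.95) -> List[float]:
    """PPO-style Generalized Advantage Estimation over one response's per-token rewards."""
    advantages, gae = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

def remax_advantages(reward: float, greedy_baseline_reward: float, length: int) -> List[float]:
    """ReMax-style advantage: sequence reward minus a greedy-decoding baseline, no critic needed."""
    return [reward - greedy_baseline_reward] * length

# Swapping the estimator (and dropping the critic role for ReMax) is the only change
# to the training loop; the generation and reward stages are untouched.
print(gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.3]))
print(remax_advantages(1.0, 0.6, 3))
```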
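To see what 3D-HybridEngine has to solve, consider a single weight matrix that is sharded one way for training and another way for generation. The toy NumPy code below reshards it on one machine; the real engine performs the equivalent regrouping across devices while minimizing communication and memory duplication.

```python
import numpy as np

# One weight matrix of the actor model, shown in full for clarity.
full_weight = np.arange(8 * 4, dtype=np.float32).reshape(8, 4)

# Training layout: the output dimension is split across 4 tensor-parallel workers.
train_tp = 4
train_shards = np.split(full_weight, train_tp, axis=0)

# Generation layout: the same weight must be regrouped for 2 tensor-parallel workers.
gen_tp = 2
gathered = np.concatenate(train_shards, axis=0)      # all-gather the training shards
gen_shards = np.split(gathered, gen_tp, axis=0)      # re-split for the generation engine

assert np.array_equal(np.concatenate(gen_shards, axis=0), full_weight)
print([s.shape for s in train_shards], "->", [s.shape for s in gen_shards])
```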
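The placement step can be conveyed with a much simpler stand-in: greedily assign each model to the GPU pool with the most remaining memory. HybridFlow's auto-mapping additionally accounts for data dependencies and parallel strategies; the function, numbers, and pool names below are made up purely for illustration.

```python
from typing import Dict

def map_models_to_pools(model_mem: Dict[str, int], pool_mem: Dict[str, int]) -> Dict[str, str]:
    """Toy greedy placement: biggest models first, each onto the roomiest pool."""
    placement, remaining = {}, dict(pool_mem)
    for name, mem in sorted(model_mem.items(), key=lambda kv: -kv[1]):
        pool = max(remaining, key=remaining.get)     # pool with the most free memory
        if remaining[pool] < mem:
            raise RuntimeError(f"No pool can host {name}")
        placement[name] = pool
        remaining[pool] -= mem
    return placement

print(map_models_to_pools(
    {"actor": 64, "critic": 32, "reference": 16, "reward": 16},   # GiB, illustrative only
    {"pool_a": 80, "pool_b": 80},
))
```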
Experimental results show that HybridFlow achieves significantly higher throughput when running various RLHF algorithms, with improvements of up to 20.57x. Open-sourcing HybridFlow provides a powerful tool for RLHF research and development and should help advance LLM technology.
Paper address: https://arxiv.org/pdf/2409.19256
The open-sourcing of the HybridFlow framework provides an effective way to improve the LLM training process. Its efficiency and flexibility will drive further development of RLHF technology and help produce stronger models that are better aligned with human values. We look forward to HybridFlow playing an even greater role in future LLM research.