Training and optimizing large language models (LLMs) is a central challenge in artificial intelligence. Efficient training methods must not only preserve model performance but also keep model behavior aligned with human values. Reinforcement learning from human feedback (RLHF) has become a widely used approach for training LLMs in recent years, yet its efficiency and scalability still leave room for improvement. To address this, ByteDance's Doubao large-model team has open sourced an RLHF framework called HybridFlow, which aims to overcome the limitations of traditional RLHF frameworks and bring new breakthroughs to LLM training.
RLHF typically consists of three stages: first, the actor model generates text from the input prompts; next, the critic model, reference model, and reward model evaluate the generated text and compute the corresponding values, reference probabilities, and reward scores; finally, these evaluation results are used to update the actor model so that it produces text more consistent with human preferences. Traditional RLHF frameworks usually rely on a single controller to manage the entire data flow, which becomes inefficient for LLMs that require distributed computation.
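The minimal Python sketch below illustrates this three-stage loop. The model objects and their method names (generate, compute_values, log_probs, score, update) are hypothetical stand-ins used for illustration, not HybridFlow's actual API.

```python
# Minimal sketch of one RLHF iteration (PPO-style), with hypothetical
# model objects standing in for the actor, critic, reference and reward models.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Batch:
    prompts: List[str]
    responses: Optional[List[str]] = None
    values: Optional[List[float]] = None
    ref_log_probs: Optional[List[float]] = None
    rewards: Optional[List[float]] = None

def rlhf_step(actor, critic, reference, reward_model, prompts: List[str]) -> Batch:
    batch = Batch(prompts=prompts)
    # Stage 1: the actor generates responses for the input prompts.
    batch.responses = actor.generate(batch.prompts)
    # Stage 2: the other models score the generated text.
    batch.values = critic.compute_values(batch.prompts, batch.responses)
    batch.ref_log_probs = reference.log_probs(batch.prompts, batch.responses)
    batch.rewards = reward_model.score(batch.prompts, batch.responses)
    # Stage 3: the evaluation results drive the actor (and critic) update.
    actor.update(batch)
    critic.update(batch)
    return batch
```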
The HybridFlow framework innovatively combines single-controller and multi-controller modes, and its layered API design decouples complex computation from data dependencies, enabling flexible representation and efficient execution of RLHF data flows.
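As a rough illustration of the hybrid-controller idea, the sketch below uses Ray remote actors: a single driver script (the single controller) expresses the whole RLHF data flow, while each model's computation runs across its own group of workers (multi-controller style). The worker classes and method names here are assumptions made for illustration, not HybridFlow's real API.

```python
# Hybrid-controller sketch: one driver orchestrates the data flow,
# each model's worker group executes its computation in parallel.
import ray

@ray.remote
class ActorWorker:
    # Each worker would hold one shard of the actor model; generation is stubbed.
    def generate(self, prompts):
        return [p + " -> <generated text>" for p in prompts]

@ray.remote
class RewardWorker:
    # Stub reward model: scores each response by its length.
    def score(self, responses):
        return [float(len(r)) for r in responses]

if __name__ == "__main__":
    ray.init()
    actor_group = [ActorWorker.remote() for _ in range(2)]
    reward_group = [RewardWorker.remote() for _ in range(2)]

    prompts = ["Explain RLHF.", "Summarize HybridFlow.", "What is PPO?", "Define a critic model."]
    shards = [prompts[0::2], prompts[1::2]]
    # Single controller: the driver sees and dispatches the whole data flow ...
    responses = ray.get([w.generate.remote(s) for w, s in zip(actor_group, shards)])
    # ... multi-controller: each worker group runs its own computation in parallel.
    rewards = ray.get([w.score.remote(r) for w, r in zip(reward_group, responses)])
    print(rewards)
```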
The advantages of HybridFlow are mainly reflected in the following three aspects:
Flexible support for multiple RLHF algorithms and models: HybridFlow provides modular APIs that let users easily implement and extend various RLHF algorithms, such as PPO, ReMax, and Safe-RLHF (see the sketch after this list).
Efficient model weight reorganization: the 3D-HybridEngine component efficiently reshards the actor model's weights between the training and generation stages, minimizing memory redundancy and communication overhead.
Automated model deployment and parallel-strategy selection: the Auto Mapping component automatically maps models onto devices according to model workload and data dependencies and selects the best parallel strategy, simplifying model deployment and improving training efficiency.
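To make the modular-API point in the first item above concrete, here is a hedged sketch of how an algorithm-specific piece, such as the advantage computation that separates PPO-style training from critic-free methods like ReMax, could be swapped without changing the surrounding data flow. All function names are hypothetical illustrations, not HybridFlow's API.

```python
# Hypothetical illustration of a pluggable advantage function in an RLHF pipeline.
from typing import Callable, List

# An advantage function maps per-sample rewards and value estimates to advantages.
AdvantageFn = Callable[[List[float], List[float]], List[float]]

def ppo_advantages(rewards: List[float], values: List[float]) -> List[float]:
    # Simplified one-step PPO-style advantage: reward minus the critic's value estimate.
    return [r - v for r, v in zip(rewards, values)]

def critic_free_advantages(rewards: List[float], values: List[float]) -> List[float]:
    # Critic-free baseline (a simplified stand-in for ReMax-style methods):
    # subtract a reward baseline, here the batch mean, so no value model is needed.
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

def training_step(rewards: List[float], values: List[float],
                  advantage_fn: AdvantageFn) -> List[float]:
    # The surrounding data flow stays the same; only the plugged-in function changes.
    return advantage_fn(rewards, values)

if __name__ == "__main__":
    rewards, values = [1.0, 0.5, 0.0], [0.8, 0.4, 0.2]
    print(training_step(rewards, values, ppo_advantages))
    print(training_step(rewards, values, critic_free_advantages))
```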
Experimental results show that HybridFlow significantly improves throughput across various RLHF algorithms, by up to 20.57 times. Open sourcing HybridFlow provides a powerful tool for RLHF research and development and will help drive future LLM technology forward.
Paper address: https://arxiv.org/pdf/2409.19256
Downcodes editor's note: the open sourcing of the HybridFlow framework offers new ideas and tools for training large language models. Its efficiency and flexibility are expected to drive further progress in LLM technology and merit attention and deeper study. We look forward to seeing more innovative applications built on HybridFlow in the future.