PaddleNLP 2.0 is the core text-domain library of the PaddlePaddle ecosystem. It has three major features: an easy-to-use text-domain API, application examples for multiple scenarios, and high-performance distributed training. It aims to improve developer productivity on NLP tasks and to provide best practices for NLP built on the PaddlePaddle 2.0 core framework.
Easy-to-use text-domain API
Provides domain APIs that cover the whole workflow, from data loading and text preprocessing through model building and evaluation to inference acceleration: a Dataset API that supports loading a rich set of Chinese datasets; a Data API for flexible and efficient data preprocessing; and a Transformer API that provides 60+ pre-trained models. Together these can greatly improve the efficiency of modeling and iterating on NLP tasks.
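As a rough illustration of the kind of preprocessing a data API of this sort handles, the sketch below pads variable-length token id sequences into a rectangular batch and collects the labels. It is plain Python and is not PaddleNLP's implementation; the names `pad_batch` and `batchify` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence, Tuple


def pad_batch(
    sequences: Sequence[Sequence[int]], pad_val: int = 0
) -> Tuple[List[List[int]], List[int]]:
    """Right-pad every sequence to the longest length in the batch.

    Returns the padded sequences and the original lengths.
    """
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths, default=0)
    padded = [list(seq) + [pad_val] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths


def batchify(
    samples: Sequence[Tuple[Sequence[int], int]], pad_val: int = 0
) -> Tuple[List[List[int]], List[int], List[int]]:
    """Collate (token_ids, label) samples into padded ids, lengths, and labels."""
    token_ids = [ids for ids, _ in samples]
    labels = [label for _, label in samples]
    padded, lengths = pad_batch(token_ids, pad_val)
    return padded, lengths, labels
```

Keeping the original lengths next to the padded ids lets a model mask out the padding positions later.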
Application examples for multiple scenarios
Covers NLP application examples ranging from academic to industrial settings, including basic NLP technology, core NLP technology, NLP system applications, and related extended applications. All examples are developed on the new API system of the PaddlePaddle 2.0 core framework, giving developers best practices for using the PaddlePaddle 2.0 framework in the text domain.
High-performance distributed training
Built on the leading automatic mixed precision optimization strategy of the PaddlePaddle core framework and combined with the distributed Fleet API, it supports a 4D hybrid parallel strategy and can efficiently train models with very large numbers of parameters.
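To make the idea of a 4D hybrid parallel strategy more concrete, the sketch below shows how a flat device rank could be mapped onto a four-dimensional grid of parallelism degrees. It is plain Python and does not call the Fleet API. The four axes (data, pipeline, sharding, and model parallelism), their short names, and their ordering are assumptions made for this illustration, and the functions `world_size` and `rank_to_coords` are hypothetical.

```python
from typing import Dict

# Assumed axis names and ordering for this sketch only:
# data, pipeline, sharding, and model (tensor) parallelism.
AXES = ("dp", "pp", "sharding", "mp")


def world_size(degrees: Dict[str, int]) -> int:
    """Total number of devices is the product of the four degrees."""
    size = 1
    for axis in AXES:
        size *= degrees[axis]
    return size


def rank_to_coords(rank: int, degrees: Dict[str, int]) -> Dict[str, int]:
    """Map a flat rank to its coordinate along each parallel axis.

    The last axis in AXES varies fastest, as in row-major indexing.
    """
    if not 0 <= rank < world_size(degrees):
        raise ValueError("rank is outside the device grid")
    coords = {}
    for axis in reversed(AXES):
        rank, coords[axis] = divmod(rank, degrees[axis])
    return coords
```

With every degree set to 2, the grid spans 16 devices, and devices that share three coordinates and differ only in the fourth would form one communication group along that axis.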