This article examines how pre-training dataset size affects downstream task performance in large model training, focusing on the scaling laws of transfer learning. The researchers analyzed the relationship between pre-training dataset size and downstream performance, measured by both BLEU score and downstream cross-entropy, and proposed two guidelines for assessing the value of a pre-training dataset. The study found that the BLEU score is well described by logarithmic scaling, whereas cross-entropy correlates with it only weakly. The effectiveness of a pre-training dataset depends on its alignment with the downstream task, and an overly large dataset may bring no additional improvement.
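For concreteness, scaling laws of the kinds described above are often written in the following functional forms. These specific expressions and the constants A, A', alpha, alpha', beta, and E are illustrative assumptions for this article, not equations quoted from the study:

```latex
% Illustrative forms only (assumed, not quoted from the study).
% D_p : pre-training dataset size; A, A', alpha, alpha', beta, E : fitted constants.

% A log-law of the kind often used for downstream BLEU score:
\mathrm{BLEU}(D_p) \approx \bigl(\log(A \cdot D_p^{\alpha})\bigr)^{\beta}

% A power law of the kind often used for downstream cross-entropy:
L(D_p) \approx E + \frac{A'}{D_p^{\alpha'}}
```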
The success of large models is largely attributable to the existence of scaling laws. The researchers explored the scaling laws of transfer learning, studying how pre-training dataset size relates to two downstream metrics after task-specific fine-tuning: BLEU score and downstream cross-entropy. A central question is whether cross-entropy loss is always a good metric. The experiments show that the BLEU score is better described by a logarithmic law, while cross-entropy and perplexity follow power-law scaling behavior; the two metrics therefore scale differently, and their correlation can be weak. In particular, when the pre-training data is not well aligned with the downstream task, pre-training brings little improvement in BLEU score even as cross-entropy continues to improve.

Based on these findings, the researchers give two guidelines for evaluating the value of pre-training datasets for a target downstream task. The impact of a pre-training dataset on task performance depends on its degree of alignment with that task, and a pre-training dataset that is too large may bring no additional improvement. When the data is well aligned, the scaling law can be used to predict downstream performance improvements; conversely, how well a scaling law fits the BLEU curve indicates how well the pre-training data is aligned with the specific translation task.

In summary, this study reveals the role of scaling laws in evaluating the effectiveness of pre-training data for large models, and it highlights the importance of selecting appropriate evaluation metrics and accounting for the alignment between pre-training data and downstream tasks, providing valuable experience and guidance for large model training. Future research can further explore more effective evaluation metrics and methods to better guide the training and optimization of large models.
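As a concrete illustration of the prediction use-case discussed above, the minimal sketch below fits a log-law to hypothetical (pre-training size, BLEU) measurements and extrapolates it to a larger pre-training set. The data points, the reparameterized helper `bleu_log_law`, and all fitted constants are assumptions for illustration, not results from the study:

```python
# Minimal sketch (not the authors' code): fit a log-law to BLEU scores
# observed at several pre-training dataset sizes, then extrapolate.
import numpy as np
from scipy.optimize import curve_fit

def bleu_log_law(d_p, c, alpha, beta):
    """Assumed log-law, reparameterized as BLEU(D_p) = (c + alpha * log(D_p))^beta."""
    # Clamp the base to keep the power well-defined during fitting.
    base = np.maximum(c + alpha * np.log(d_p), 1e-9)
    return base ** beta

# Hypothetical measurements: pre-training tokens vs. fine-tuned BLEU.
sizes = np.array([1e8, 3e8, 1e9, 3e9, 1e10])
bleus = np.array([18.2, 21.5, 24.1, 26.0, 27.4])

params, _ = curve_fit(bleu_log_law, sizes, bleus, p0=[1.0, 1.0, 1.0], maxfev=10000)

# Predict the BLEU score expected from a larger pre-training set.
predicted = bleu_log_law(3e10, *params)
print(f"Predicted BLEU at 3e10 pre-training tokens: {predicted:.1f}")

# A poor fit of such a law to the observed BLEU points would itself be a
# signal that the pre-training data is not well aligned with the task.
```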