WHEN SCALING MEETS LLM FINETUNING: THE EFFECT OF DATA, MODEL AND FINETUNING METHOD
This paper provides a systematic study of how different scaling factors affect the performance of large language model (LLM) finetuning. Specifically, it explores the impact of LLM model size, pretraining data size, finetuning data size, and the number of tunable parameters for parameter-efficient tuning (PET) methods such as prompt tuning and LoRA.
Downstream Tasks
To assess the impact of these factors on finetuning, the authors choose machine translation and multilingual summarization as downstream tasks. Specifically, they pretrain multiple English-German and English-Chinese bilingual LLMs, with sizes ranging from 1B to 16B parameters, on roughly 280B tokens.
These models are then finetuned on the downstream tasks using either full-model tuning (FMT), which updates all parameters, or parameter-efficient tuning (PET) methods such as prompt tuning and LoRA. The finetuning data regime varies from thousands to millions of examples. A novel multiplicative joint scaling law is proposed to characterize the interaction between finetuning data size and each of the other scaling factors, and extensive experiments are run to derive empirical scaling trends.
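For reference, the proposed joint law multiplies independent power-law terms for the finetuning data and whichever other factor is being scaled (notation lightly adapted from the paper):

$$\hat{\mathcal{L}}(X, D_f) = A \cdot \frac{1}{X^{\alpha}} \cdot \frac{1}{D_f^{\beta}} + E$$

Here X is the scaling factor under study (LLM size, pretraining data size, or PET parameter size), D_f is the finetuning data size, and A, E, α, β are coefficients fitted per task and method; α and β measure how strongly each axis drives the loss down.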
Results
Now let’s look at some key results:
- “LLM finetuning benefits more from LLM model scaling than pretraining data scaling across tasks and methods”
- “Scaling PET parameters is ineffective, delivering limited gains for both LoRA and Prompt”
- “Finetuning data have more pronounced influence on FMT than PET, where LoRA scales better than Prompt”
- “PET depends more on LLM model and pretraining data scaling than finetuning data scaling across settings”
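The trends quoted above come from fitting the joint scaling law to losses measured across many finetuning runs. As a rough illustration of that procedure, here is a minimal Python sketch (not the authors' code; the data points are invented for illustration) that fits the law's coefficients with SciPy:

```python
import numpy as np
from scipy.optimize import curve_fit

# Multiplicative joint scaling law (notation paraphrased from the paper):
#   L(X, D_f) = A * X**(-alpha) * D_f**(-beta) + E
# X   : the scaling factor under study (here, model size in parameters)
# D_f : finetuning data size (number of examples)
def joint_scaling_law(xd, log_A, alpha, beta, E):
    X, D_f = xd
    return np.exp(log_A) * X ** (-alpha) * D_f ** (-beta) + E

# Hypothetical measurements: (model size, finetuning examples) -> eval loss.
X    = np.array([1e9, 1e9, 4e9, 4e9, 16e9, 16e9])
D_f  = np.array([1e4, 1e6, 1e4, 1e6, 1e4,  1e6])
loss = np.array([2.90, 2.60, 2.70, 2.45, 2.55, 2.35])  # made-up numbers

# Fit A (in log space for stability), alpha, beta, and the irreducible loss E,
# constraining the exponents to be non-negative.
params, _ = curve_fit(
    joint_scaling_law, (X, D_f), loss,
    p0=[0.0, 0.1, 0.1, 2.0],
    bounds=([-10.0, 0.0, 0.0, 0.0], [10.0, 1.0, 1.0, 5.0]),
)
log_A, alpha, beta, E = params
print(f"alpha={alpha:.3f} (model scaling), beta={beta:.3f} (finetuning data), E={E:.3f}")
```

Comparing the fitted α across factors (and β across finetuning methods) is the kind of evidence behind the observations above: a larger exponent means scaling that axis buys more.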
Conclusion
In this paper, the authors study scaling laws for LLM finetuning, considering factors such as LLM model size, pretraining data size, finetuning data size, and PET parameter size. The results show that scaling up the LLM model has a larger impact on finetuning performance than scaling up the pretraining data, and that scaling PET parameters is largely ineffective. More details are in the full paper.
Congrats to the authors for their work!
Zhang, Biao, et al. “When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method.” ICLR 2024.