Cost of Fine-Tuning LLMs

Large Language Models (LLMs) are not easy to fine-tune, but some methods, like LoRA (Low-Rank Adaptation), can help reduce the cost. Let's look at some numbers on the cost and time needed to fine-tune such models.
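To make the LoRA idea concrete, here is a minimal NumPy sketch of its core trick: the pretrained weight W stays frozen, and only a low-rank update B @ A is trained. The layer size and rank below are illustrative assumptions, not values from any specific model.

```python
import numpy as np

# LoRA sketch: h = W @ x + (alpha / r) * B @ A @ x, with W frozen.
# d and r are illustrative assumptions (typical attention-projection size, common rank).
d = 4096          # hidden size of one projection layer (assumption)
r = 8             # LoRA rank
alpha = 16        # LoRA scaling factor

rng = np.random.default_rng(0)
W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                     # trainable, zero-init so the update starts at 0

x = rng.standard_normal(d)
h = W @ x + (alpha / r) * (B @ A @ x)    # LoRA forward pass

# Only A and B are trained: r*(d + d) parameters instead of d*d.
print(A.size + B.size)   # 65_536 trainable parameters
print(W.size)            # 16_777_216 frozen parameters
```

With these sizes, LoRA trains roughly 0.4% of the layer's original parameters, which is why fine-tuning a 30B model on one GPU becomes plausible.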

How many hours?


Considerations

Alpaca 30B (LLaMA + LoRA) can be trained on a single 80GB A100 (here).
LLaMA-13B outperforms GPT-3 (175B), and LLaMA-65B is competitive with PaLM-540B (paper) -> sometimes smaller models trained on more or better data outperform bigger models.

From the LLaMA paper: when training a 65B-parameter model, their code processes around 380 tokens/sec/GPU on 2048 A100 GPUs with 80GB of RAM. This means that training over their dataset containing 1.4T tokens takes approximately ==21 days==.
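The 21-day figure can be sanity-checked directly from the numbers in the paper:

```python
# Total tokens divided by aggregate throughput gives the training time.
tokens = 1.4e12                 # 1.4T training tokens
throughput = 380 * 2048         # tokens/sec/GPU * number of GPUs

seconds = tokens / throughput
days = seconds / 86400          # 86400 seconds per day
print(round(days, 1))           # -> 20.8, i.e. approximately 21 days
```

The result (~20.8 days) matches the paper's "approximately 21 days" claim.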

BLOOM (176B) is really expensive to fine-tune (see here).
BLOOM (560m) can be trained on a 130k-example dataset (SQuAD v2) with a single V100 on Vertex AI (more info and here).

To run BLOOM (176B) we need ~180GB of GPU memory (here).
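One way to see where a number like ~180GB comes from is simple bytes-per-parameter arithmetic. The assumption that the figure corresponds to int8 weights is mine; activations and overhead add a bit more on top:

```python
# Memory needed just to store BLOOM-176B's weights, by precision.
params = 176e9                  # 176B parameters

fp16_gb = params * 2 / 1e9      # 2 bytes per parameter in fp16
int8_gb = params * 1 / 1e9      # 1 byte per parameter in int8

print(fp16_gb)                  # -> 352.0 GB
print(int8_gb)                  # -> 176.0 GB, close to the ~180GB figure above
```

So the ~180GB figure is roughly consistent with int8-quantized weights, while fp16 would need about twice that.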

I have not found many attempts to use LoRA on BLOOM; this is the only one -> they fine-tuned it on the Stanford Alpaca dataset.