Fine-tuning

1. What is Fine-tuning?

Fine-tuning is the process of taking a pre-trained foundation model and continuing to train it on a smaller, task-specific dataset to specialize its behavior for a particular domain, style, or task.

Think of it like this:

  • Pre-training = a person completing a university degree (broad, general knowledge)
  • Fine-tuning = that same person doing a 3-month internship in a specific company/role (specialized, practical skills)

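Done well, fine-tuning keeps most of the model's general knowledge and builds domain-specific expertise on top of it. Heavy training on a very narrow dataset can erode general abilities (often called catastrophic forgetting), which is one reason to compare the tuned model against the base model afterwards.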


2. Why Fine-tune? Prompting vs Fine-tuning

Before choosing fine-tuning, it’s important to know when it’s actually needed — because it’s expensive and time-consuming compared to prompting.

Prompting (default approach)

You guide the model’s behavior entirely through the input text. No training required.

Prompt: "You are a legal assistant. Respond formally and cite relevant laws."

Pros: Fast, flexible, no cost beyond API calls, easy to change.

Cons: Prompt takes up context window space; behavior can be inconsistent; may not capture highly specialized knowledge.
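As a rough illustration of the context-window cost mentioned above, the sketch below prepends the same instruction to every request and counts the characters spent repeating it. The helper names and the sample questions are hypothetical, and no model is actually called.

```python
# Instruction text taken from the example prompt above.
SYSTEM_INSTRUCTION = "You are a legal assistant. Respond formally and cite relevant laws."


def build_prompt(user_question: str) -> str:
    """Prepend the fixed instruction to a single user question."""
    return f"{SYSTEM_INSTRUCTION}\n\nQuestion: {user_question}"


def instruction_overhead(questions: list[str]) -> int:
    """Count the characters spent re-sending the instruction across all requests."""
    return len(SYSTEM_INSTRUCTION) * len(questions)


# Hypothetical user questions, for illustration only.
questions = ["Can a landlord keep my deposit?", "What is a statute of limitations?"]
prompts = [build_prompt(q) for q in questions]

print(prompts[0])
print("Characters spent on repeated instructions:", instruction_overhead(questions))
```

Changing behavior here means editing one string, which is why prompting is the cheaper default. The overhead grows linearly with request volume.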


Fine-tuning (when prompting isn’t enough)

You actually modify the model’s weights using new training data.

Pros: More consistent behavior, better performance on specialized tasks, no need to repeat instructions in every prompt, can compress domain knowledge into the model.

Cons: Requires labeled training data, time, compute cost, and maintenance.


Decision guide: Prompt or Fine-tune?

| Situation | Use |
| --- | --- |
| General task the model already does well | Prompting |
| Need consistent tone/style across all responses | Fine-tuning |
| Domain-specific vocabulary the model doesn’t know | Fine-tuning |
| Task requires proprietary knowledge | Fine-tuning + Grounding |
| Prototype or early-stage product | Prompting |
| High-volume production system needing efficiency | Fine-tuning |
| Model needs to follow a very specific output format | Fine-tuning |

3. How Fine-tuning Works

Step 1 — Start with a foundation model

You begin with a pre-trained model (e.g., Gemini) that already understands language, reasoning, and general knowledge.

Step 2 — Prepare your dataset

Collect and clean a dataset of input-output pairs relevant to your task.

Input:  "What is the coverage limit for flood damage?"
Output: "According to Policy Section 4.2, flood damage is covered up to $250,000..."

Input:  "Is earthquake damage included?"
Output: "Earthquake damage is not covered under standard policies. Refer to Section 7..."

The quality and diversity of this dataset directly determines how good the fine-tuned model will be.

Rule of thumb: You typically need hundreds to thousands of high-quality examples for effective fine-tuning.
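The collect-and-clean step might look like the minimal sketch below: it trims whitespace, drops rows with an empty side, removes exact duplicates, and serializes the rest as one JSON object per line. The field names `input` and `output` and the messy sample rows are assumptions for illustration, not a required schema.

```python
import json


def clean_pairs(pairs):
    """Strip whitespace, drop rows with an empty side, remove exact duplicates."""
    seen = set()
    cleaned = []
    for inp, out in pairs:
        inp, out = inp.strip(), out.strip()
        if not inp or not out or (inp, out) in seen:
            continue
        seen.add((inp, out))
        # "input" / "output" are placeholder field names, not a platform schema.
        cleaned.append({"input": inp, "output": out})
    return cleaned


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Illustrative raw rows: a near-duplicate (trailing space) and a row with no answer.
raw_pairs = [
    ("What is the coverage limit for flood damage? ",
     "According to Policy Section 4.2, flood damage is covered up to $250,000..."),
    ("What is the coverage limit for flood damage?",
     "According to Policy Section 4.2, flood damage is covered up to $250,000..."),
    ("Is earthquake damage included?", ""),
]

records = clean_pairs(raw_pairs)
print(to_jsonl(records))
```

Real pipelines usually add more checks (length limits, PII scrubbing, near-duplicate detection), but even this much removes the noise that most directly hurts a small dataset.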

Step 3 — Train

The model is trained on your dataset. Rather than training from scratch, the model makes small adjustments to its weights to better handle your specific examples. This is much faster and cheaper than pre-training.
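To make "small adjustments to its weights" concrete, here is a toy sketch: a two-parameter linear model stands in for the foundation model, and gradient descent starts from its already-trained weights instead of from random values. The numbers, learning rate, and step count are arbitrary assumptions. Real fine-tuning works on billions of parameters with a different loss, but the start-from-pretrained idea is the same.

```python
def mse(w, b, data):
    """Mean squared error of the model y = w * x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fine_tune(w, b, data, lr=0.01, steps=200):
    """Run plain gradient descent on MSE, starting from the given weights."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# "Pre-trained" weights: close to the new task, but not tuned for it.
pretrained_w, pretrained_b = 2.0, 0.0

# Small task-specific dataset drawn from a slightly different relationship.
task_data = [(x, 2.2 * x + 0.5) for x in [0.0, 1.0, 2.0, 3.0]]

tuned_w, tuned_b = fine_tune(pretrained_w, pretrained_b, task_data)

print("loss before:", mse(pretrained_w, pretrained_b, task_data))
print("loss after: ", mse(tuned_w, tuned_b, task_data))
```

Because the starting point is already near a good solution, a few hundred small steps on four examples are enough to reduce the loss. That is the intuition behind fine-tuning being cheaper than pre-training.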

Step 4 — Evaluate

Measure the fine-tuned model’s performance on a held-out test set. Compare against the base model and against prompt-only approaches.
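A held-out comparison can be sketched as below. `base_model` and `tuned_model` are hypothetical stand-ins for real model calls, the review examples are made up for illustration, and exact-match accuracy is only one possible metric (it suits classification far better than free-form text).

```python
import random


def split_holdout(examples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and split it into (train, test)."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def exact_match_accuracy(predict, test_set):
    """Fraction of test examples where predict(input) equals the reference."""
    hits = sum(1 for inp, ref in test_set if predict(inp).strip() == ref.strip())
    return hits / len(test_set)


# Hypothetical stand-ins for real model calls.
def base_model(text):
    return "positive"


def tuned_model(text):
    lowered = text.lower()
    return "negative" if ("terrible" in lowered or "broke" in lowered) else "positive"


# Made-up labeled examples, for illustration only.
examples = [
    ("Great product!", "positive"), ("Works well", "positive"),
    ("Love it", "positive"), ("Fast delivery", "positive"),
    ("Excellent value", "positive"), ("Terrible, broke on day 1", "negative"),
    ("Terrible quality", "negative"), ("Broke after a week", "negative"),
    ("Terrible support", "negative"), ("It broke quickly", "negative"),
]

train_set, test_set = split_holdout(examples)
base_acc = exact_match_accuracy(base_model, test_set)
tuned_acc = exact_match_accuracy(tuned_model, test_set)
print(f"base: {base_acc:.2f}  tuned: {tuned_acc:.2f}")
```

The test examples must never appear in the training split. Otherwise the score measures memorization, not generalization.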

Step 5 — Deploy

Deploy the fine-tuned model via the API. On Vertex AI, fine-tuned models are hosted as dedicated endpoints.


4. Types of Fine-tuning

4.1 Supervised Fine-tuning (SFT)

The most common type. You provide labeled input-output pairs and the model learns to produce the desired output given an input.

Use cases: Customer support bots, domain-specific Q&A, classification, structured output generation.


4.2 RLHF (Reinforcement Learning from Human Feedback)

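Human raters compare or score model outputs, and those judgments are used to train a reward model. The language model is then optimized with reinforcement learning to produce outputs that the reward model scores highly. Google uses techniques of this kind to align Gemini to be helpful, safe, and accurate.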
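One common way to turn human judgments into a training signal is a reward model fitted to pairwise preferences with a Bradley-Terry style loss. That choice is an assumption here, not something these notes specify. The sketch shows only the loss for a single preferred/rejected pair, and the reward values are made up.

```python
import math


def preference_probability(reward_chosen, reward_rejected):
    """Bradley-Terry probability that the chosen response beats the rejected one."""
    return 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))


def preference_loss(reward_chosen, reward_rejected):
    """Negative log-likelihood of the human preference; lower is better."""
    return -math.log(preference_probability(reward_chosen, reward_rejected))


# Reward model agrees with the human rater: low loss.
agrees = preference_loss(reward_chosen=2.0, reward_rejected=0.0)

# Reward model disagrees with the human rater: high loss.
disagrees = preference_loss(reward_chosen=0.0, reward_rejected=2.0)

print("loss when reward model agrees:   ", agrees)
print("loss when reward model disagrees:", disagrees)
```

Minimizing this loss over many rated pairs pushes the reward model to score preferred responses higher. The language model is then trained to maximize that learned reward.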

Use cases: Improving general response quality, reducing harmful outputs, aligning to brand voice.

RLHF is more commonly used by model providers (like Google) than by enterprises deploying models.


4.3 PEFT — Parameter-Efficient Fine-tuning

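Instead of updating all of the model’s billions of parameters (which is expensive), PEFT techniques train only a small number of parameters, either a subset of the existing weights or a small set of newly added ones, while the rest stay frozen.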

The most popular PEFT method is LoRA (Low-Rank Adaptation):

  • Adds small, trainable matrices alongside frozen original weights
  • Dramatically reduces memory and compute requirements
  • Quality is close to full fine-tuning at a fraction of the cost
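The LoRA idea can be sketched in a few lines of plain Python. The frozen weight matrix W is left untouched, and a low-rank product B·A is added to its output, scaled by alpha / rank. The matrix sizes, the rank, and the 4096 x 4096 layer used for the parameter count are illustrative assumptions.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute W x + (alpha / rank) * B (A x); W is frozen, only A and B train."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, rank):
    """Return (full fine-tuning count, LoRA count) for one d_out x d_in matrix."""
    return d_out * d_in, rank * (d_in + d_out)


# Tiny example: 2x3 frozen weight with a rank-1 adapter.
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]   # frozen, d_out x d_in
A = [[0.5, 0.5, 0.5]]                    # trainable, rank x d_in
B = [[0.0], [0.0]]                       # trainable, d_out x rank, starts at zero
x = [1.0, 2.0, 3.0]

# With B initialized to zero, the adapted layer matches the base layer exactly.
print(lora_forward(W, A, B, x, alpha=1.0, rank=1))

full, lora = trainable_params(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.4%}")
```

Starting B at zero means training begins from the base model's exact behavior. The parameter count shows where the memory savings come from: the adapter grows with rank x (d_in + d_out), not with d_in x d_out.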

Why it matters for GAIL: Vertex AI supports PEFT/LoRA-based fine-tuning, making it accessible to enterprises without massive GPU budgets.


4.4 Distillation

A large, expensive model (the “teacher”) is used to train a smaller, faster model (the “student”) to mimic its behavior.

Use case: You want a cost-efficient model for high-volume inference that behaves like a larger model.
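One classic form of distillation, assumed here for illustration, trains the student to match the teacher's softened output distribution instead of hard labels. The logits and temperature below are made-up values. For large language models, distillation is also often done by fine-tuning the student on text the teacher generated.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer result."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's distribution against the teacher's soft targets."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))


teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]   # nearly mimics the teacher
far_student = [0.2, 1.0, 3.0]     # ranks the classes in the opposite order

print("close student loss:", distillation_loss(teacher, close_student))
print("far student loss:  ", distillation_loss(teacher, far_student))
```

The loss is lowest when the student reproduces the teacher's full distribution, so the student learns how the teacher spreads probability across alternatives, not just its top answer.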


5. Fine-tuning vs Other Adaptation Techniques

| Technique | Changes model weights? | Requires training data? | Cost | Best for |
| --- | --- | --- | --- | --- |
| Prompting | ❌ No | ❌ No | Low | General tasks |
| Few-shot prompting | ❌ No | ✅ Examples in prompt | Low | Quick task adaptation |
| RAG / Grounding | ❌ No | ✅ Documents | Low-Medium | Dynamic, factual knowledge |
| Fine-tuning (SFT) | ✅ Yes | ✅ Labeled pairs | Medium-High | Consistent specialized behavior |
| PEFT / LoRA | ✅ Partially | ✅ Labeled pairs | Medium | Efficient specialization |
| Pre-training from scratch | ✅ Yes (all) | ✅ Massive corpus | Very High | Building a new foundation model |

6. Fine-tuning on Google Cloud (Vertex AI)

Google Cloud exposes fine-tuning capabilities through Vertex AI:

Supervised Tuning (Vertex AI)

  • Available for Gemini models
  • Provide a JSONL dataset of prompt-response pairs
  • Google manages the training infrastructure
  • The result is a tuned model version you can deploy as an endpoint

Reinforcement Tuning

  • Available for Gemini models via Vertex AI
  • Uses reward signals to align the model to specific goals

Model Garden

  • Vertex AI Model Garden offers access to open-source models (Llama, Mistral, etc.) that can be fine-tuned on custom data with more control

Key GAIL exam points:

  • Fine-tuning in Vertex AI does not require you to manage infrastructure
  • Your training data stays within your Google Cloud project (data privacy)
  • Fine-tuned models are billed differently from base models (dedicated endpoints)

7. Data Requirements for Fine-tuning

The dataset is the most critical factor in fine-tuning success.

Quality over quantity

A few hundred high-quality, diverse examples often outperform thousands of noisy ones.

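Dataset format (Vertex AI)

The dataset is a JSONL file: one JSON object per line, each holding one training example. The flat input_text / output_text layout below is the one used for Vertex AI's older text models. Supervised tuning for Gemini expects a chat-style layout instead (a contents list of user and model turns), so check the current Vertex AI documentation for the model you are tuning.

{"input_text": "Classify this review: 'Great product!'", "output_text": "positive"}
{"input_text": "Classify this review: 'Terrible, broke on day 1'", "output_text": "negative"}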

Common data pitfalls

  • Biased data → biased model (see Responsible AI notes)
  • Too narrow → model overfits, fails on edge cases
  • Inconsistent labels → model learns contradictions
  • Too little data → minimal improvement over base model
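Some of these pitfalls can be caught with simple checks before any training run. The sketch below flags a small dataset, the same input mapped to different outputs, and one label dominating the data. It reuses the input_text / output_text keys from the example above. The thresholds (100 examples, 90% share) are arbitrary assumptions, and the skew check only makes sense for classification-style outputs.

```python
from collections import Counter, defaultdict


def audit_dataset(records, min_examples=100, max_label_share=0.9):
    """Return a list of warnings for common fine-tuning data problems."""
    if not records:
        return ["dataset is empty"]

    warnings = []
    if len(records) < min_examples:
        warnings.append(f"only {len(records)} examples (threshold {min_examples})")

    # Inconsistent labels: the same input mapped to more than one output.
    outputs_by_input = defaultdict(set)
    for r in records:
        outputs_by_input[r["input_text"].strip().lower()].add(r["output_text"])
    conflicts = [i for i, outs in outputs_by_input.items() if len(outs) > 1]
    if conflicts:
        warnings.append(f"{len(conflicts)} input(s) with conflicting outputs")

    # Skewed labels: one output dominating the dataset.
    counts = Counter(r["output_text"] for r in records)
    top_label, top_count = counts.most_common(1)[0]
    if top_count / len(records) > max_label_share:
        warnings.append(f"label '{top_label}' exceeds {max_label_share:.0%} of examples")
    return warnings


# Illustrative records: the first two give one input two different labels.
sample = [
    {"input_text": "Classify this review: 'Great product!'", "output_text": "positive"},
    {"input_text": "Classify this review: 'Great product!'", "output_text": "negative"},
    {"input_text": "Classify this review: 'Terrible, broke on day 1'", "output_text": "negative"},
]

for warning in audit_dataset(sample):
    print("WARNING:", warning)
```

Checks like these do not detect subtle bias or narrow coverage, which still need human review of what the data represents.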

8. When Fine-tuning is NOT the Right Answer

Fine-tuning is often over-used. Know when to avoid it:

| Situation | Better alternative |
| --- | --- |
| You just need a different tone | System prompt / role prompting |
| You need up-to-date or dynamic knowledge | RAG / Grounding |
| You’re still in prototyping | Few-shot prompting |
| Your task changes frequently | Prompting (fine-tuning is static) |
| You don’t have labeled training data | Prompting or RAG |

9. Key Vocabulary Cheat Sheet

| Term | Definition |
| --- | --- |
| Fine-tuning | Further training a pre-trained model on a task-specific dataset |
| SFT | Supervised Fine-Tuning: training on labeled input-output pairs |
| RLHF | Reinforcement Learning from Human Feedback: trains on human preference scores |
| PEFT | Parameter-Efficient Fine-Tuning: updates only a small subset of weights |
| LoRA | Low-Rank Adaptation: popular PEFT method, cheap and effective |
| Distillation | Training a small model to mimic a larger one |
| Overfitting | Model memorizes training data, fails on new inputs |
| Tuned model | A fine-tuned version of a base model deployed as its own endpoint |
| Base model | The original, unmodified foundation model |
| JSONL | JSON Lines format: standard for Vertex AI fine-tuning datasets |