Fine-tuning¶

1. What is Fine-tuning?¶

Fine-tuning is the process of taking a pre-trained foundation model and continuing to train it on a smaller, task-specific dataset to specialize its behavior for a particular domain, style, or task.

Think of it like this: - Pre-training = a person completing a university degree (broad, general knowledge) - Fine-tuning = that same person doing a 3-month internship in a specific company/role (specialized, practical skills)

The model doesn’t forget its general knowledge — it builds on top of it with domain-specific expertise.

2. Why Fine-tune? Prompting vs Fine-tuning¶

Before choosing fine-tuning, it’s important to know when it’s actually needed — because it’s expensive and time-consuming compared to prompting.

Prompting (default approach)¶

You guide the model’s behavior entirely through the input text. No training required.

Prompt: "You are a legal assistant. Respond formally and cite relevant laws."

Pros: Fast, flexible, no cost beyond API calls, easy to change. Cons: Prompt takes up context window space; behavior can be inconsistent; may not capture highly specialized knowledge.

Fine-tuning (when prompting isn’t enough)¶

You actually modify the model’s weights using new training data.

Pros: More consistent behavior, better performance on specialized tasks, no need to repeat instructions in every prompt, can compress domain knowledge into the model. Cons: Requires labeled training data, time, compute cost, and maintenance.

Decision guide: Prompt or Fine-tune?¶

Situation	Use
General task the model already does well	Prompting
Need consistent tone/style across all responses	Fine-tuning
Domain-specific vocabulary the model doesn’t know	Fine-tuning
Task requires proprietary knowledge	Fine-tuning + Grounding
Prototype or early-stage product	Prompting
High-volume production system needing efficiency	Fine-tuning
Model needs to follow a very specific output format	Fine-tuning

3. How Fine-tuning Works¶

Step 1 — Start with a foundation model¶

You begin with a pre-trained model (e.g., Gemini) that already understands language, reasoning, and general knowledge.

Step 2 — Prepare your dataset¶

Collect and clean a dataset of input-output pairs relevant to your task.

Input:  "What is the coverage limit for flood damage?"
Output: "According to Policy Section 4.2, flood damage is covered up to $250,000..."

Input:  "Is earthquake damage included?"
Output: "Earthquake damage is not covered under standard policies. Refer to Section 7..."

The quality and diversity of this dataset directly determines how good the fine-tuned model will be.

Rule of thumb: You typically need hundreds to thousands of high-quality examples for effective fine-tuning.

Step 3 — Train¶

The model is trained on your dataset. Rather than training from scratch, the model makes small adjustments to its weights to better handle your specific examples. This is much faster and cheaper than pre-training.

Step 4 — Evaluate¶

Measure the fine-tuned model’s performance on a held-out test set. Compare against the base model and against prompt-only approaches.

Step 5 — Deploy¶

Deploy the fine-tuned model via the API. On Vertex AI, fine-tuned models are hosted as dedicated endpoints.

4. Types of Fine-tuning¶

4.1 Supervised Fine-tuning (SFT)¶

The most common type. You provide labeled input-output pairs and the model learns to produce the desired output given an input.

Use cases: Customer support bots, domain-specific Q&A, classification, structured output generation.

4.2 RLHF (Reinforcement Learning from Human Feedback)¶

Human raters score model outputs. The model is trained to maximize scores. This is how Google aligns Gemini to be helpful, safe, and accurate.

Use cases: Improving general response quality, reducing harmful outputs, aligning to brand voice.

RLHF is more commonly used by model providers (like Google) than by enterprises deploying models.

4.3 PEFT — Parameter-Efficient Fine-tuning¶

Instead of updating all of the model’s billions of parameters (which is expensive), PEFT techniques update only a small subset of parameters.

The most popular PEFT method is LoRA (Low-Rank Adaptation): - Adds small, trainable matrices alongside frozen original weights - Dramatically reduces memory and compute requirements - Quality is close to full fine-tuning at a fraction of the cost

Why it matters for GAIL: Vertex AI supports PEFT/LoRA-based fine-tuning, making it accessible to enterprises without massive GPU budgets.

4.4 Distillation¶

A large, expensive model (the “teacher”) is used to train a smaller, faster model (the “student”) to mimic its behavior.

Use case: You want a cost-efficient model for high-volume inference that behaves like a larger model.

5. Fine-tuning vs Other Adaptation Techniques¶

Technique	Changes model weights?	Requires training data?	Cost	Best for
Prompting	❌ No	❌ No	Low	General tasks
Few-shot prompting	❌ No	✅ Examples in prompt	Low	Quick task adaptation
RAG / Grounding	❌ No	✅ Documents	Low-Medium	Dynamic, factual knowledge
Fine-tuning (SFT)	✅ Yes	✅ Labeled pairs	Medium-High	Consistent specialized behavior
PEFT / LoRA	✅ Partially	✅ Labeled pairs	Medium	Efficient specialization
Pre-training from scratch	✅ Yes (all)	✅ Massive corpus	Very High	Building a new foundation model

6. Fine-tuning on Google Cloud (Vertex AI)¶

Google Cloud exposes fine-tuning capabilities through Vertex AI:

Supervised Tuning (Vertex AI)¶

Available for Gemini models
Provide a JSONL dataset of prompt-response pairs
Google manages the training infrastructure
The result is a tuned model version you can deploy as an endpoint

Reinforcement Tuning¶

Available for Gemini models via Vertex AI
Uses reward signals to align the model to specific goals

Model Garden¶

Vertex AI Model Garden offers access to open-source models (Llama, Mistral, etc.) that can be fine-tuned on custom data with more control

Key GAIL exam points:¶

Fine-tuning in Vertex AI does not require you to manage infrastructure
Your training data stays within your Google Cloud project (data privacy)
Fine-tuned models are billed differently from base models (dedicated endpoints)

7. Data Requirements for Fine-tuning¶

The dataset is the most critical factor in fine-tuning success.

Quality over quantity¶

A few hundred high-quality, diverse examples often outperform thousands of noisy ones.

Dataset format (Vertex AI)¶

{"input_text": "Classify this review: 'Great product!'", "output_text": "positive"}
{"input_text": "Classify this review: 'Terrible, broke on day 1'", "output_text": "negative"}

Common data pitfalls¶

Biased data → biased model (see Responsible AI notes)
Too narrow → model overfits, fails on edge cases
Inconsistent labels → model learns contradictions
Too little data → minimal improvement over base model

8. When Fine-tuning is NOT the Right Answer¶

Fine-tuning is often over-used. Know when to avoid it:

Situation	Better alternative
You just need a different tone	System prompt / role prompting
You need up-to-date or dynamic knowledge	RAG / Grounding
You’re still in prototyping	Few-shot prompting
Your task changes frequently	Prompting (fine-tuning is static)
You don’t have labeled training data	Prompting or RAG

9. Key Vocabulary Cheat Sheet¶

Term	Definition
Fine-tuning	Further training a pre-trained model on a task-specific dataset
SFT	Supervised Fine-Tuning — training on labeled input-output pairs
RLHF	Reinforcement Learning from Human Feedback — trains on human preference scores
PEFT	Parameter-Efficient Fine-Tuning — updates only a small subset of weights
LoRA	Low-Rank Adaptation — popular PEFT method, cheap and effective
Distillation	Training a small model to mimic a larger one
Overfitting	Model memorizes training data, fails on new inputs
Tuned model	A fine-tuned version of a base model deployed as its own endpoint
Base model	The original, unmodified foundation model
JSONL	JSON Lines format — standard for Vertex AI fine-tuning datasets