Prompt Engineering¶

1. What is Prompt Engineering?¶

A prompt is the input you give to an LLM — a question, instruction, or piece of text that tells the model what to do.

Prompt Engineering is the practice of designing and refining prompts to get the most accurate, relevant, and useful outputs from a model.

Why does it matter?¶

The same model can produce drastically different results depending on how you phrase your input:

Prompt	Output Quality
`"Summarize this"`	Vague, possibly unhelpful
`"Summarize the following customer complaint in 2 sentences, focusing on the core issue and the customer's emotional tone."`	Precise, useful, actionable

Prompt engineering is a skill — and for GAIL, it’s tested both as a concept and as a practical technique for improving GenAI applications.

2. Anatomy of a Good Prompt¶

A well-structured prompt typically contains some or all of these elements:

┌─────────────────────────────────────────────┐
│  SYSTEM CONTEXT (optional)                  │
│  "You are a helpful customer support agent" │
├─────────────────────────────────────────────┤
│  INSTRUCTION                                │
│  "Classify the sentiment of this review"   │
├─────────────────────────────────────────────┤
│  EXAMPLES (optional)                        │
│  "Positive: 'Great product!' → positive"   │
├─────────────────────────────────────────────┤
│  INPUT DATA                                 │
│  "Review: 'Arrived late and broken.'"      │
├─────────────────────────────────────────────┤
│  OUTPUT FORMAT (optional)                   │
│  "Respond with: positive / negative /      │
│   neutral only."                           │
└─────────────────────────────────────────────┘

Not every prompt needs all sections — but including more structure generally improves results.

3. Prompting Techniques¶

3.1 Zero-Shot Prompting¶

You give the model a task with no examples. You rely entirely on the model’s pre-trained knowledge.

Prompt:
"Translate the following sentence to Italian:
'The meeting starts at 9am.'"

Output:
"La riunione inizia alle 9."

When to use: Simple, well-defined tasks where the model is already capable.

Limitation: For complex or domain-specific tasks, the model may misinterpret what you want without examples.

3.2 Few-Shot Prompting¶

You provide a small number of examples (typically 2–5) inside the prompt, showing the model the pattern you expect.

Prompt:
"Classify the sentiment of each review.

Review: 'Absolutely love this product!' → positive
Review: 'Waste of money, broke after a week.' → negative
Review: 'It works fine, nothing special.' → neutral

Review: 'Shipping was fast but the packaging was damaged.' → "

Output:
"negative"

When to use: When the task has a specific format, style, or domain the model needs to match. Very effective for classification, extraction, and transformation tasks.

Key insight: The examples in few-shot prompting act as in-context training — the model adapts its behavior without any actual retraining.

3.3 Chain-of-Thought (CoT) Prompting¶

You instruct the model to think step by step before giving a final answer. This dramatically improves performance on reasoning, math, and multi-step logic tasks.

Without CoT:

Prompt: "Roger has 5 tennis balls. He buys 2 more cans of 3 balls each. How many does he have?"
Output: "11"  ← correct but arrived at by chance

With CoT:

Prompt: "Roger has 5 tennis balls. He buys 2 more cans of 3 balls each.
How many does he have? Think step by step."

Output:
"Roger starts with 5 balls.
He buys 2 cans × 3 balls = 6 balls.
5 + 6 = 11 balls total."

Why it works: Forcing the model to externalize reasoning reduces errors by making each step explicit and checkable.

Variants: - Zero-shot CoT: Just add “Think step by step” or “Let’s reason through this” - Few-shot CoT: Provide full worked examples with step-by-step reasoning shown

When to use: Math problems, logic puzzles, multi-step planning, complex decision-making questions.

3.4 System Prompting¶

A system prompt is a set of instructions given to the model before the user interaction begins, typically by the developer. It sets the persona, tone, rules, and constraints.

System: "You are a concise technical assistant for platform engineers.
         Always respond in bullet points. Never discuss pricing.
         If asked something outside your scope, say 'I can't help with that'."

User: "What's the best way to monitor a Kubernetes cluster?"

Output: Bullet-pointed, technical response, on-topic.

When to use: In applications and products where you need consistent model behavior across all user interactions. Critical for enterprise GenAI deployments.

3.5 Role Prompting¶

Assign the model a specific persona or role to influence its style and depth of response.

"You are an experienced cardiologist explaining heart disease to a patient
with no medical background. Use simple language and avoid jargon."

Why it works: The model has seen vast amounts of text associated with different roles and adjusts its style accordingly.

3.6 Retrieval-Augmented Generation (RAG) Prompting¶

Inject external, retrieved information directly into the prompt so the model grounds its answer in specific documents rather than relying on training knowledge alone.

Context: [Retrieved document: Company Q3 earnings report]
Question: "What was the revenue growth in Q3?"
Instruction: "Answer based only on the provided context."

RAG is covered in depth in the Grounding study notes.

4. Output Controlling Parameters¶

These are settings you can adjust when calling a model via API (e.g., in Vertex AI or Google AI Studio) to control the style and randomness of the output.

4.1 Temperature¶

Controls the randomness / creativity of the output.

Range: typically 0.0 to 2.0
Low temperature (0.0–0.3): Deterministic, predictable, always picks the most likely next token. Good for factual tasks.
High temperature (0.8–2.0): More random, creative, diverse. Good for brainstorming or creative writing.

Temperature = 0.1 → "The capital of France is Paris."
Temperature = 1.5 → "Ah, France! A land of croissants, where Paris dreams in cobblestones..."

Analogy: Temperature is like a dial that goes from “always safe choice” to “roll the dice.”

4.2 Top-P (Nucleus Sampling)¶

Controls output diversity by limiting the model to choosing from the smallest set of tokens whose cumulative probability reaches P.

Range: 0.0 to 1.0
Top-P = 0.1: Only considers the top 10% most likely tokens → very focused
Top-P = 0.9: Considers a broader set → more varied output

How it differs from temperature: - Temperature scales probabilities up/down before sampling - Top-P cuts off the long tail of unlikely tokens entirely

In practice: Temperature and Top-P are often used together. A common default is temperature=0.7, top_p=0.9.

4.3 Top-K¶

Limits sampling to the K most likely next tokens.

Top-K = 1 → always picks the single most likely token (greedy, deterministic)
Top-K = 40 → randomly selects from the 40 most probable next tokens

Less commonly tuned than temperature or top-p, but available in Vertex AI.

4.4 Max Output Tokens¶

Sets a hard limit on the length of the response.

Prevents unexpectedly long (and costly) outputs
Too low and the model may cut off mid-sentence
Set based on your use case: summaries need fewer tokens than full reports

4.5 Stop Sequences¶

A string or list of strings that tell the model to stop generating when encountered.

stop_sequences = ["\n\n", "END"]

Useful when you want structured outputs and need to prevent the model from generating extra content beyond a delimiter.

Parameter Quick Reference¶

Parameter	Controls	Low value	High value
Temperature	Randomness	Deterministic, focused	Creative, varied
Top-P	Token pool breadth	Very focused	More diverse
Top-K	Number of token candidates	Narrow, predictable	Broader options
Max tokens	Response length	Short	Long
Stop sequences	Where to stop	N/A	N/A

5. Prompt Engineering Best Practices¶

Be specific and detailed¶

Vague prompts produce vague answers.

❌ "Write about dogs"
✅ "Write a 150-word paragraph for a pet adoption website about the benefits
    of adopting adult dogs, targeting first-time pet owners."

Specify the output format¶

✅ "Respond as a JSON object with keys: 'sentiment', 'confidence', 'reason'."
✅ "Use bullet points. Maximum 5 items."
✅ "Answer in one sentence."

Use positive instructions over negative ones¶

❌ "Don't be verbose"
✅ "Be concise. Maximum 3 sentences."

Iterate and test¶

Prompt engineering is empirical — try, measure, refine. There’s no perfect formula.

Separate instructions from data clearly¶

✅ "Summarize the following article. Article: [article text here]"

Using labels like Article:, Question:, Context: reduces ambiguity.

6. Common Prompting Pitfalls¶

Pitfall	Problem	Fix
Ambiguous task	Model guesses what you want	Be explicit about task and format
No context	Generic, surface-level response	Provide relevant background
Prompt injection	Malicious input overrides your instructions	Use system prompts and input validation
Overloading one prompt	Too many tasks at once	Break complex tasks into steps
Ignoring temperature	Factual task with high temp → hallucinations	Lower temperature for factual outputs

7. Prompt Engineering in Google Cloud¶

In the Google ecosystem, you can experiment with prompts using:

Tool	Purpose
Google AI Studio	Free, browser-based prompt playground for Gemini models
Vertex AI Prompt Management	Store, version, and deploy prompts in production
Vertex AI Studio	Enterprise-grade prompt testing and model tuning
NotebookLM	Document-grounded AI — prompting against your own documents

Prompt types in Vertex AI / AI Studio¶

Freeform prompt: Open-ended, conversational
Structured prompt: Includes examples (few-shot), context, and explicit instructions
Chat prompt: Multi-turn conversation with system instructions

8. Key Vocabulary Cheat Sheet¶

Term	Definition
Prompt	The input you provide to an LLM
Zero-shot	Task given with no examples
Few-shot	Task given with a small number of examples
Chain-of-thought	Prompting the model to reason step by step
System prompt	Pre-conversation instructions that set model behavior
Temperature	Controls randomness of output (0 = deterministic, high = creative)
Top-P	Limits token selection to a cumulative probability threshold
Top-K	Limits token selection to the K most probable tokens
Max tokens	Maximum length of the model’s response
Stop sequence	String that signals the model to stop generating
In-context learning	Model adapts behavior based on examples in the prompt
Prompt injection	Malicious input designed to override system instructions
Hallucination	Confident but incorrect model output
Grounding	Anchoring outputs to verified external data