
Section 5 – Optimizing Performance and Cost

5.1 – Application Performance Monitoring

Cloud Profiler

  • Continuous CPU and memory profiling in production (low overhead ~0.5%)
  • Languages: Go, Java, Node.js, Python
  • Identifies: hot functions, memory allocation hotspots
  • No sampling configuration needed — automatic continuous profiling
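# Python: add the profiler to the app
# Install first:  pip install google-cloud-profiler
import googlecloudprofiler

# Start the agent once, as early as possible in the main module
try:
    googlecloudprofiler.start(
        service='my-service',
        service_version='1.0.0',
        verbose=3,  # 0=error, 1=warning, 2=info, 3=debug
    )
except (ValueError, NotImplementedError) as exc:
    print(exc)  # a profiler failure should not stop the app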

Active Assist

  • AI-powered recommendations across GCP services
  • Available for: cost, security, performance, reliability, manageability
  • Access via: Cloud Console → Active Assist, or gcloud recommender
# View cost (rightsizing) recommendations; VM recommenders are zonal, so pass a zone
gcloud recommender recommendations list \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --location=us-central1-a \
  --project=PROJECT

# View idle VM recommendations (also zonal)
gcloud recommender recommendations list \
  --recommender=google.compute.instance.IdleResourceRecommender \
  --location=us-central1-a \
  --project=PROJECT

Cloud Run Performance Monitoring

# Monitor cold starts (assumes the app itself logs a "Cold start" message)
gcloud logging read \
  'resource.type="cloud_run_revision" AND jsonPayload.message:"Cold start"' \
  --limit=100

# Key metrics
run.googleapis.com/request_latencies  # Request latency histogram
run.googleapis.com/container/startup_latencies  # Container startup latency (cold start duration)
run.googleapis.com/request_count  # Request count by response code

5.2 – FinOps Practices

Observability Cost Management

Observability costs often grow unexpectedly. Key cost drivers:

| Component | Cost driver | Optimization |
| --- | --- | --- |
| Cloud Logging | Log ingestion volume | Exclusion filters, reduce sampling |
| Cloud Monitoring | Custom metric samples | Reduce cardinality, drop unused metrics |
| Cloud Trace | Trace samples | Adjust sampling rate (0.1 = 10%) |
| BigQuery | Log export + query | Partition tables, use log-based metrics instead |
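# Per-metric ingestion volume is shown on the Metrics Management page in Cloud Monitoring.
# From the CLI, count custom metric descriptors via the Monitoring API (first page of results only):
curl -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  --data-urlencode 'filter=metric.type = starts_with("custom.googleapis.com/")' \
  "https://monitoring.googleapis.com/v3/projects/PROJECT/metricDescriptors" | grep -c '"type":'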

# Exclude high-volume health check logs from the _Default sink
# (":" matches a substring, since requestUrl usually holds the full URL)
gcloud logging sinks update _Default \
  --add-exclusion='name=healthchecks,filter=httpRequest.requestUrl:"/health" AND httpRequest.status=200'

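# Reduce trace sampling in the OpenTelemetry SDK (Python)
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

# Sample 10% of new traces; child spans follow the parent's decision
sampler = ParentBased(TraceIdRatioBased(0.1))
trace.set_tracer_provider(TracerProvider(sampler=sampler))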
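
To make the effect of a 0.1 ratio concrete, here is a minimal, stdlib-only sketch of how ratio-based sampling on the trace ID can work. It illustrates the idea and is not the OpenTelemetry SDK's own code; `should_sample`, `sampled_fraction`, and the simulated trace IDs are hypothetical names introduced for this example.

```python
import random

TRACE_ID_MASK = (1 << 64) - 1  # look only at the low 64 bits of the 128-bit trace ID


def should_sample(trace_id: int, ratio: float) -> bool:
    """Keep a trace when its low 64 bits fall below ratio * 2**64.

    The decision depends only on the trace ID, so every service that sees
    the same ID reaches the same decision without coordination.
    """
    bound = round(ratio * (1 << 64))
    return (trace_id & TRACE_ID_MASK) < bound


def sampled_fraction(ratio: float, n: int = 100_000, seed: int = 42) -> float:
    """Simulate n random trace IDs and return the fraction that is kept."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), ratio) for _ in range(n))
    return kept / n


if __name__ == "__main__":
    print(f"kept fraction at ratio 0.1: {sampled_fraction(0.1):.3f}")
```

Because the kept fraction tracks the ratio, trace ingestion (and its cost) scales roughly linearly with the sampling rate.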

Compute Pricing Models

Sustained Use Discounts (SUD)

  • Automatic — no purchase required
  • Applied to Compute Engine VMs running > 25% of a billing month
  • Effective discount grows with usage: on N1, running 25% / 50% / 75% / 100% of the month yields a net 0% / 10% / 20% / 30% discount (N2, N2D and C2 top out at 20%)
  • Does NOT apply to: App Engine flexible, Dataflow, GPU accelerators (some), E2 machine series
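
The net discount figures follow from tiered billing: each successive quarter of the month is charged at a lower fraction of the base rate. The sketch below reproduces the arithmetic, assuming the N1-style tier rates of 100%, 80%, 60% and 40%. The rates and the function name `effective_sud_discount` are assumptions for illustration, so check the current pricing page before relying on them.

```python
# Assumed N1-style tiers: each quarter of the month is billed at a lower
# fraction of the base rate (100%, 80%, 60%, 40%).
N1_TIER_RATES = (1.0, 0.8, 0.6, 0.4)


def effective_sud_discount(usage_fraction: float, tier_rates=N1_TIER_RATES) -> float:
    """Return the net discount (0..1) for a VM that runs usage_fraction of the month."""
    if not 0.0 <= usage_fraction <= 1.0:
        raise ValueError("usage_fraction must be between 0 and 1")
    if usage_fraction == 0.0:
        return 0.0
    tier_width = 1.0 / len(tier_rates)
    billed = 0.0
    for i, rate in enumerate(tier_rates):
        tier_start = i * tier_width
        # Portion of the month's usage that falls inside this tier.
        used_in_tier = min(max(usage_fraction - tier_start, 0.0), tier_width)
        billed += used_in_tier * rate
    return 1.0 - billed / usage_fraction


if __name__ == "__main__":
    for usage in (0.25, 0.5, 0.75, 1.0):
        print(f"{usage:.0%} of month -> {effective_sud_discount(usage):.0%} net discount")
```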

Committed Use Discounts (CUD)

  • Commit to 1 or 3 years of resource usage
  • Two types:
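| Type | Description | Discount |
| --- | --- | --- |
| Resource-based | Commit to specific vCPU/RAM in a region/machine family | Up to 57% for most machine types, up to 70% for memory-optimized (largest on 3-year terms) |
| Spend-based (Flex CUD) | Commit to minimum hourly spend | 28% (1yr) / 46% (3yr) |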
  • Flex CUD covers: Compute Engine VMs, GKE Standard, GKE Autopilot, Cloud Run
  • CUDs cannot be cancelled — commit only to your stable baseline
# Purchase a CUD
gcloud compute commitments create my-commitment \
  --plan=12-month \
  --region=us-central1 \
  --resources=vcpu=100,memory=400GB
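
Since a commitment is billed for every hour of the term whether or not the resources run, it only pays off above a break-even utilization of roughly 1 minus the discount. The sketch below works through that reasoning. The function names and the 730-hour month are assumptions made for illustration.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def cud_break_even_utilization(discount: float) -> float:
    """Utilization (0..1) above which a commitment is cheaper than on-demand."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def commitment_saves_money(hours_used: float, discount: float,
                           hours_in_month: float = HOURS_PER_MONTH) -> bool:
    """Compare paying on-demand for hours_used against a full-month commitment."""
    on_demand_cost = hours_used              # in units of the hourly on-demand rate
    committed_cost = (1.0 - discount) * hours_in_month
    return committed_cost < on_demand_cost


if __name__ == "__main__":
    # With a 28% spend-based discount, the workload must run about 72% of the time.
    print(f"break-even utilization: {cud_break_even_utilization(0.28):.0%}")
```

This is why the advice is to commit only to the stable baseline: capacity that runs less than the break-even share of the month is cheaper on-demand or on Spot.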

Spot VMs

  • Up to 91% cheaper than on-demand
  • Can be preempted with 30-second notice when GCP needs capacity
  • No maximum runtime (unlike old preemptible VMs which had 24h max)
  • Good for: batch jobs, CI/CD workers, ML training, data processing
# Create Spot VM
gcloud compute instances create my-spot \
  --machine-type=n2-standard-4 \
  --provisioning-model=SPOT \
  --instance-termination-action=STOP

# Use Spot in GKE node pool
gcloud container node-pools create spot-pool \
  --cluster=my-cluster \
  --spot \
  --num-nodes=3
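
Workloads on Spot capacity should treat the short termination notice as a signal to checkpoint and exit. The following is a minimal sketch of that pattern, assuming the process receives SIGTERM when the VM or node shuts down (typical for a systemd service or a GKE pod, but it depends on how the job is launched). `PreemptionGuard`, `run_batch`, and the callbacks are hypothetical names.

```python
import signal
import threading


class PreemptionGuard:
    """Records that the process has been asked to stop."""

    def __init__(self):
        self._stop = threading.Event()

    def install(self, signum=signal.SIGTERM):
        # Must be called from the main thread of the process.
        signal.signal(signum, self._handle)

    def _handle(self, signum, frame):
        self._stop.set()

    @property
    def stop_requested(self) -> bool:
        return self._stop.is_set()


def run_batch(items, guard, process, checkpoint):
    """Process items in order; on a stop request, checkpoint and return early.

    Returns the index of the first item that was NOT processed.
    """
    for index, item in enumerate(items):
        if guard.stop_requested:
            checkpoint(index)  # persist progress so a new VM can resume here
            return index
        process(item)
    return len(items)
```

In a real job, `guard.install()` would be called once at startup and `checkpoint` would write to durable storage such as a Cloud Storage object, keeping each unit of work well under the notice period.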

When to use which

| Workload | Recommendation |
| --- | --- |
| Stable, long-running (prod) | Resource CUD (1yr) |
| Variable, mixed workloads | Flex CUD |
| Batch, CI/CD, ML training | Spot VMs |
| Uncommitted, but running > 25% of the month | SUD (automatic, eligible machine series only) |
| Dev/test | On-demand (no commitment) |

Network Cost Optimization

Network Tiers

| Tier | Description | Egress cost |
| --- | --- | --- |
| Premium | Google’s global backbone, low latency | Higher |
| Standard | ISP routing, acceptable latency | ~30% cheaper |
# Set network tier for instance
gcloud compute instances create my-vm \
  --network-tier=STANDARD  # or PREMIUM (default)

Minimize Egress Costs

  • Keep compute and storage in the same region: Cloud Storage ↔ Compute Engine traffic within a region is free (VM-to-VM traffic across zones is still billed)
  • Use CDN (Cloud CDN) for static content — reduce origin egress
  • Cloud Interconnect for high-volume on-prem ↔ GCP: cheaper than internet egress
  • Set VPC Flow Logs sampling below 100% to cut the volume of flow log entries ingested into Cloud Logging
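
Two of the levers above reduce to simple proportions, sketched here under simplifying assumptions: a CDN removes the cached share of traffic from origin egress (bytes served from cache are still billed at CDN rates, which this does not model), and flow log volume scales with the sampling rate. The function names are hypothetical.

```python
def origin_egress_gib(total_gib: float, cache_hit_ratio: float) -> float:
    """Egress still served by the origin after a CDN absorbs cache hits."""
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0 and 1")
    return total_gib * (1.0 - cache_hit_ratio)


def flow_log_volume_gib(full_volume_gib: float, sampling_rate: float) -> float:
    """Approximate flow log volume when only a fraction of flows is sampled."""
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    return full_volume_gib * sampling_rate


if __name__ == "__main__":
    print(origin_egress_gib(1000, 0.75))    # origin traffic at a 75% cache hit ratio
    print(flow_log_volume_gib(400, 0.25))   # log volume at 25% sampling
```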

GKE Cost Optimization

Node Efficiency

# Right-size nodes with VPA
kubectl apply -f vertical-pod-autoscaler.yaml

# Use Autopilot for automatic right-sizing
gcloud container clusters create-auto my-cluster \
  --region=us-central1

# Enable cluster autoscaler with scale-down
gcloud container node-pools update pool \
  --cluster=my-cluster \
  --enable-autoscaling \
  --min-nodes=1 --max-nodes=20

# Spot node pool for non-critical workloads
gcloud container node-pools create spot-pool \
  --cluster=CLUSTER --spot

Namespace-Level Cost Visibility

  • Use GKE Cost Allocation (built-in): allocates costs to namespaces/labels
  • Enable in cluster settings → available in BigQuery export
  • Use Kubecost or custom Prometheus + BigQuery pipeline for detailed breakdowns
# Enable GKE cost allocation
gcloud container clusters update CLUSTER \
  --enable-cost-allocation
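
Once cost allocation data reaches the billing export, a per-namespace breakdown is a group-by over the row labels. The sketch below shows the aggregation on in-memory rows shaped like billing export records (a `cost` value plus a list of `key`/`value` labels). The label key `k8s-namespace` and the sample rows are assumptions, so verify them against your own export.

```python
from collections import defaultdict


def cost_by_namespace(rows, label_key="k8s-namespace"):
    """Sum cost per namespace; rows without the label go to 'unallocated'."""
    totals = defaultdict(float)
    for row in rows:
        labels = {label["key"]: label["value"] for label in row.get("labels", [])}
        namespace = labels.get(label_key, "unallocated")
        totals[namespace] += row["cost"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical rows, not real billing data.
    sample_rows = [
        {"cost": 10.0, "labels": [{"key": "k8s-namespace", "value": "payments"}]},
        {"cost": 5.0, "labels": [{"key": "k8s-namespace", "value": "payments"}]},
        {"cost": 4.0, "labels": [{"key": "k8s-namespace", "value": "search"}]},
        {"cost": 1.0, "labels": []},
    ]
    print(cost_by_namespace(sample_rows))
```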

Cloud Run Cost Optimization

# Set minimum instances (avoid cold start penalty at cost of always-on)
gcloud run services update myapp --min-instances=1

# Set concurrency high to maximize instance utilization
gcloud run services update myapp --concurrency=1000

# Use CPU always-on only if needed (default: CPU throttled when not serving)
gcloud run services update myapp --no-cpu-throttling  # Always-on CPU (more expensive)
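
To see why concurrency matters for cost, a rough estimate of the instance count can be derived from Little's law: requests in flight are approximately the request rate times the average latency. This is a simplified model that ignores autoscaler headroom and uneven load; `estimated_instances` is a hypothetical helper.

```python
import math


def estimated_instances(requests_per_second: float, avg_latency_seconds: float,
                        concurrency: int, min_instances: int = 0) -> int:
    """Rough number of instances needed to serve a steady request rate."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    in_flight = requests_per_second * avg_latency_seconds  # Little's law
    needed = math.ceil(in_flight / concurrency)
    return max(needed, min_instances)


if __name__ == "__main__":
    # 400 req/s at 250 ms average latency is about 100 requests in flight.
    print(estimated_instances(400, 0.25, concurrency=80))
    print(estimated_instances(400, 0.25, concurrency=1))
```

Higher concurrency only helps if a single instance can actually handle that many simultaneous requests within its CPU and memory limits.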

GCP Recommenders (Active Assist)

| Recommender | What it finds |
| --- | --- |
| google.compute.instance.MachineTypeRecommender | Over-provisioned VMs → suggest smaller |
| google.compute.instance.IdleResourceRecommender | VMs with <1% CPU usage (idle) |
| google.compute.disk.IdleResourceRecommender | Unused persistent disks |
| google.compute.address.IdleResourceRecommender | Unused static IPs ($7.20/month each) |
| SnapshotRecommender | Old snapshots costing money |
| GKENodeRecommender | Over/under-provisioned GKE node pools |
| google.iam.policy.Recommender | Overly broad IAM bindings (security + cost) |
| FirewallInsightRecommender | Unused firewall rules |
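# Mark a recommendation as claimed before applying it
# (this only records state; you still make the change yourself)
# ETAG comes from the recommendation's list/describe output
gcloud recommender recommendations mark-claimed RECOMMENDATION_ID \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --location=us-central1-a \
  --project=PROJECT \
  --etag=ETAG

# Mark it as succeeded once the change has been applied
gcloud recommender recommendations mark-succeeded RECOMMENDATION_ID \
  --recommender=google.compute.instance.MachineTypeRecommender \
  --location=us-central1-a \
  --project=PROJECT \
  --etag=ETAG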
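
When triaging many recommendations, it helps to total the projected savings from the JSON output (for example from `gcloud recommender recommendations list --format=json`). The sketch below assumes the cost impact sits under `primaryImpact.costProjection.cost` with `units` and `nanos` fields and that savings are negative amounts. Treat those field names as assumptions to verify against real output; the sample data is invented.

```python
import json


def projected_savings(recommendations) -> float:
    """Sum the savings (as a positive number) across cost recommendations."""
    total = 0.0
    for rec in recommendations:
        cost = rec.get("primaryImpact", {}).get("costProjection", {}).get("cost", {})
        # units may arrive as a string in JSON; nanos is the fractional part.
        amount = int(cost.get("units", 0)) + int(cost.get("nanos", 0)) / 1e9
        if amount < 0:  # negative cost means money saved
            total += -amount
    return total


# Hypothetical sample, not real recommender output.
SAMPLE = """
[
  {"name": "rec-1", "primaryImpact": {"category": "COST",
    "costProjection": {"cost": {"currencyCode": "USD", "units": "-25", "nanos": -500000000}}}},
  {"name": "rec-2", "primaryImpact": {"category": "COST",
    "costProjection": {"cost": {"currencyCode": "USD", "units": "-10"}}}},
  {"name": "rec-3", "primaryImpact": {"category": "SECURITY"}}
]
"""

if __name__ == "__main__":
    print(projected_savings(json.loads(SAMPLE)))
```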

Cost Monitoring and Budgets

# Create budget alert (percent is a fraction of the budget: 0.5 = 50%)
gcloud billing budgets create \
  --billing-account=BILLING_ACCOUNT_ID \
  --display-name="Prod Monthly Budget" \
  --budget-amount=5000USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.8 \
  --threshold-rule=percent=1.0,basis=forecasted-spend \
  --filter-projects=projects/prod-project-id
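
The threshold rules combine two bases: actual spend to date and forecasted spend for the period. The sketch below shows which rules would fire for a given month. It models the rule logic only and does not call the Billing API; the function name and the `"current"` / `"forecasted"` basis strings are hypothetical.

```python
def triggered_alerts(budget: float, actual_spend: float,
                     forecasted_spend: float, rules):
    """Return the (fraction, basis) rules whose threshold has been reached.

    rules is a list of (fraction_of_budget, basis) pairs, where basis is
    "current" (actual spend so far) or "forecasted" (projected period total).
    """
    fired = []
    for fraction, basis in rules:
        spend = forecasted_spend if basis == "forecasted" else actual_spend
        if spend >= fraction * budget:
            fired.append((fraction, basis))
    return fired


if __name__ == "__main__":
    rules = [(0.5, "current"), (0.8, "current"), (1.0, "forecasted")]
    # Mid-month: 2600 spent so far, 5200 projected against a 5000 budget.
    print(triggered_alerts(5000, 2600, 5200, rules))
```

The forecasted rule fires before the money is actually spent, which is what makes it useful as an early warning.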

Cost Allocation Labels

# Label resources for cost attribution
gcloud compute instances add-labels my-instance \
  --labels=team=payments,env=prod,cost-center=engineering

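# Require labels via Org Policy
# Org Policy enforces rules rather than applying default labels; define a custom
# constraint (custom.*) for this. constraints/compute.requireShieldedVm is unrelated to labels.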

Summary: Key Cost Levers

| Priority | Action | Typical Savings |
| --- | --- | --- |
| 🔴 High | Apply CUDs to stable compute baseline | 30-70% |
| 🔴 High | Use Spot VMs for batch/CI workloads | 60-91% |
| 🟡 Medium | Right-size instances (use Recommender) | 20-40% |
| 🟡 Medium | Delete idle resources (IPs, disks, snapshots) | Variable |
| 🟡 Medium | Reduce log ingestion (exclusions) | 20-50% of logging bill |
| 🟢 Low | Enable Cloud CDN for static assets | 30-50% of egress |
| 🟢 Low | Use Standard network tier where latency allows | ~30% of egress |

Exam Tips

  • SUD = automatic, no commitment; CUD = manual purchase, 1- or 3-year commitment
  • Spot VMs = no max runtime (unlike old preemptible 24h limit) + 30s termination notice
  • Flex CUD = spend-based commitment; covers GKE Autopilot + Cloud Run (not just VMs)
  • Recommenders are per-resource-type and per-location — check correct recommender ID
  • GKE Cost Allocation = built-in namespace/label cost breakdown → exports to BigQuery
  • Cloud Profiler = production profiling with <0.5% overhead (safe in prod)
  • Active Assist = umbrella term for all GCP recommenders + insights
  • Idle static IPs cost ~$7.20/month each, which is small but adds up; find them with google.compute.address.IdleResourceRecommender