2.1 & 2.2 – Designing & Implementing CI/CD Pipelines
Full GCP CI/CD Pipeline Overview
Code Push → Cloud Build Trigger → Build + Test → Push to Artifact Registry
→ Cloud Deploy Release → Rollout to Dev → Promote to Staging
→ Verify/Canary → Manual Approval → Rollout to Prod
→ Cloud Audit Logs / Cloud Monitoring (success metrics)
2.1 – Designing Pipelines
CI Pipeline Design Principles
- Fail fast — lint and unit tests first, expensive tests last
- Build once, deploy many — one image, promoted through envs via tags
- Immutable artifacts — never rebuild; tag by
$COMMIT_SHA
- Parallel steps — use
waitFor in Cloud Build for parallelism
- Cache layers — use Cloud Build caching to speed up builds
CD Pipeline Design with Cloud Deploy
# delivery-pipeline.yaml — serial progression
spec:
serialPipeline:
stages:
- targetId: dev # Auto-deploy on every release
- targetId: staging # Promote after dev passes
profiles: [staging]
- targetId: prod # Requires approval + canary
profiles: [prod]
strategy:
canary:
canaryDeployment:
percentages: [10, 50]
verify: true # Run verify job before advancing
Approval Flows
# Target with required approval
apiVersion: deploy.cloud.google.com/v1
kind: Target
metadata:
name: prod
spec:
requireApproval: true # Reviewer must approve in UI or gcloud
gke:
cluster: projects/PROJECT/locations/us-central1/clusters/prod
# Approve pending rollout
gcloud deploy rollouts approve projects/P/locations/us-central1/deliveryPipelines/app/releases/r1/rollouts/r1-to-prod-0001
Multi-Cloud / Hybrid Deployments
- Cloud Deploy supports Anthos targets (GKE Attached clusters — EKS, AKS, on-prem)
- Use GitHub Actions + Workload Identity Federation for cross-cloud deployments
- Artifact Registry serves images to any environment — no GCR dependency
Deployment Strategies
Rolling Update (default K8s)
- Replace pods one-by-one; controlled by
maxSurge + maxUnavailable
- Zero downtime if
maxUnavailable: 0
- Rollback:
kubectl rollout undo deployment/app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
Blue/Green
- Two identical environments (blue = current, green = new)
- Switch traffic instantly via Service selector update or Load Balancer backend swap
- Instant rollback — switch selector back to blue
- Cost: 2x resources while green is live
# Green deployment (new version)
apiVersion: apps/v1
kind: Deployment
metadata:
name: app-green
spec:
selector:
matchLabels:
app: myapp
version: green
---
# Switch traffic: update service selector
kubectl patch service myapp -p '{"spec":{"selector":{"version":"green"}}}'
Canary
- Route X% of traffic to new version; gradually increase
- Requires: service mesh (Istio/Cloud Service Mesh) OR Gateway API OR Ingress traffic splitting
- Cloud Run: built-in traffic splitting by revision (simplest canary implementation)
- Cloud Deploy: native canary with configured percentages
# Cloud Run canary via traffic splitting
gcloud run services update-traffic myapp \
--to-revisions=v2=10,v1=90 # 10% to new, 90% to old
Feature Flags
- Decouple deployment from release — code is deployed but feature is off
- Flip flag to enable for subset of users
- GCP tool: no native feature flag service — use LaunchDarkly, Unleash, or custom config via Firestore/App Engine remote config
Comparison Table
| Strategy |
Rollback Speed |
Resources |
Risk |
Best For |
| Rolling |
Medium (pod by pod) |
~1.1x |
Medium |
Standard K8s workloads |
| Blue/Green |
Instant (flip selector) |
2x |
Low |
DB migrations, high-risk releases |
| Canary |
Gradual |
~1.1x |
Very Low |
New features, traffic-tested rollouts |
| Feature Flag |
Instant |
1x |
Very Low |
Gradual user-facing releases |
Success Metrics for Deployments
Cloud Deploy Verification
- Run verify jobs after each canary phase — e.g., run test suite against deployed service
- If verify fails → rollback automatically
# In skaffold.yaml — custom verify job
verify:
- name: verify-integration
container:
name: verify-container
image: test-runner:latest
command: ["/bin/sh"]
args: ["-c", "curl -f http://app/healthz && ./run-smoke-tests.sh"]
SLI-Based Success Metrics
- Monitor error rate, latency p99, availability during rollout
- Set alerting policies on these metrics during canary → alert = rollback signal
- Use burn rate alerts on SLOs to trigger rollback if error budget burns too fast
Auditing and Tracking Deployments
Cloud Audit Logs
- Admin Activity logs: who deployed what (always on, free)
- Data Access logs: who accessed what (must enable, billable)
- System Event logs: GCP system actions
# Query deployment events in Logs Explorer
resource.type="clouddeploy.googleapis.com/DeliveryPipeline"
protoPayload.methodName="CreateRollout"
Artifact Registry Audit Trail
- Every push/pull logged in Cloud Audit Logs
- Image digest (
sha256:...) provides immutable reference — link deployment to exact build
Cloud Build History
- All builds logged with: trigger, commit SHA, steps, duration, success/fail
- Accessible in Cloud Console, gcloud, or via API
# List recent builds
gcloud builds list --limit=20 --filter="status=FAILURE"
# Describe a specific build
gcloud builds describe BUILD_ID
Troubleshooting Deployment Issues
Cloud Build Failures
- Check build logs:
gcloud builds log BUILD_ID
- Common issues:
- Permission denied → SA missing IAM role
- Timeout → increase
timeout in cloudbuild.yaml
- Network issues → enable Private Google Access or use Private Pool
- Docker layer cache miss → check
--cache-from configuration
Cloud Deploy Rollout Failures
- Check rollout status:
gcloud deploy rollouts describe ROLLOUT_NAME
- Check target cluster:
kubectl get events -n default
- Common issues:
- Image not found → check Artifact Registry path and SA permissions
- Verify job failed → review verify job logs in Cloud Logging
- Pending approval → not a failure — needs human action
Rollback
# Roll back to previous release in Cloud Deploy
gcloud deploy rollouts rollback ROLLOUT_NAME \
--delivery-pipeline=my-pipeline \
--region=us-central1
# K8s rollback
kubectl rollout undo deployment/app
kubectl rollout undo deployment/app --to-revision=2
Pipeline for ML Workloads
- Vertex AI Pipelines: managed ML pipeline execution (Kubeflow Pipelines SDK)
- CI/CD for ML = same principles + model-specific steps:
- Data validation → training → model evaluation → registry → deployment
- Vertex AI Model Registry: store/version trained models
- Cloud Deploy can deploy ML models to: Cloud Run, GKE, Vertex AI endpoints
- Success metrics: model accuracy, inference latency, prediction error rate (telemetry-driven)
Exam Tips
waitFor: ['-'] in Cloud Build = run step in parallel with all previous steps
- Cloud Deploy releases = immutable artifact + manifest snapshot; rollouts = deployment events
- Blue/green rollback = near-instant (switch LB/selector); canary rollback = gradual
requireApproval: true blocks automatic promotion — human must approve
- Artifact Registry image digest (sha256) = immutable — always use digest in prod, not
:latest
- Cloud Deploy canary works with: GKE Gateway API, GKE Service Mesh, Cloud Run traffic splitting