Cloud Run

Core Concepts

Cloud Run is a fully managed serverless platform for running stateless containers. Deploy any containerized application that responds to HTTP requests without managing infrastructure.

Key Principle: Containers that scale to zero, pay per request, any language/library.

Cloud Run vs Alternatives

| Feature | Cloud Run | App Engine Standard | Cloud Functions | GKE Autopilot |
|---|---|---|---|---|
| Unit | Container | Runtime sandbox | Function | Pod |
| Languages | Any (container) | Specific versions | Specific versions | Any |
| Scale to zero | Yes | Yes | Yes | No (min 1) |
| Cold start | ~1s | ~100ms | ~1s | N/A |
| Multi-region | Yes (managed) | No | No | Manual |
| Control | Container-level | Runtime-level | Function-level | Full K8s |

When to Use Cloud Run

✅ Use When

  • Stateless HTTP workloads (APIs, web apps)
  • Any language/framework (containerized)
  • Multi-region deployment needed
  • Scale to zero desired
  • Pay-per-request model preferred
  • Existing containers to deploy

❌ Don’t Use When

  • Long-running background jobs → Cloud Tasks + Cloud Run or Compute Engine
  • Long-lived WebSocket/streaming connections that must outlast the request timeout → GKE
  • Stateful applications → GKE + StatefulSets
  • Sub-100ms latency critical → Keep warm instances

Architecture Patterns

Multi-Region Deployment

Global Load Balancer:

Global LB → Cloud Run (us-central1)
         → Cloud Run (europe-west1)
         → Cloud Run (asia-east1)

Benefits: Low latency globally, high availability, automatic failover

Service-to-Service Communication

Pattern: Cloud Run services calling each other

Authentication: Service accounts + IAM, no public access

Event-Driven Architecture

Triggers:

  • Pub/Sub push subscriptions
  • Cloud Storage events (via Eventarc)
  • Cloud Scheduler (cron)
  • Direct HTTP invocation
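For the Pub/Sub push trigger above, the service receives a JSON envelope whose payload is base64-encoded under `message.data`. A minimal decoding sketch (the function name is mine; the envelope fields follow Pub/Sub's documented push format):

```python
import base64
import json

def decode_push_envelope(body: bytes) -> str:
    """Decode the JSON envelope Pub/Sub POSTs to a push endpoint.

    The payload is base64-encoded inside envelope["message"]["data"].
    """
    envelope = json.loads(body)
    data = envelope["message"].get("data", "")
    return base64.b64decode(data).decode("utf-8")

# Example envelope, shaped like a Pub/Sub push delivery:
body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "messageId": "1234",
    },
    "subscription": "projects/demo/subscriptions/push-sub",
}).encode("utf-8")
print(decode_push_envelope(body))  # hello
```

Remember to return a 2xx status quickly, or Pub/Sub will redeliver the message.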

Key Features

Autoscaling

Configuration:

  • Min instances: 0-1000 (0 for scale to zero)
  • Max instances: 1-1000
  • Concurrency: 1-1000 requests per instance

Strategy: Min 0 for cost, min >0 for latency

Request Timeout

  • Default: 5 minutes
  • Max: 60 minutes
  • Configurable per service

CPU Allocation

Allocated during request (default): CPU only while a request is being handled, cheaper
Always allocated: CPU available between requests too, faster response and supports background work

Decision: Always allocated for low latency, request-only for cost
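A rough way to see the cost trade-off: request-only allocation bills vCPU-seconds of request handling, while always-allocated bills for the whole time an instance is up. The sketch below applies the CPU rate from the Cost Model section to both modes as a simplification (in practice, always-allocated CPU uses a different, lower per-second rate):

```python
# Simplified comparison of billed vCPU-seconds under the two modes.
# Uses one CPU rate for both as a simplification (assumption: real
# always-allocated pricing has its own, lower rate).
CPU_RATE = 0.000024  # $ per vCPU-second

def cpu_cost_request_only(requests: int, avg_request_s: float, vcpu: float = 1.0) -> float:
    """CPU billed only while requests are being handled."""
    return requests * avg_request_s * vcpu * CPU_RATE

def cpu_cost_always_on(instance_uptime_s: float, vcpu: float = 1.0) -> float:
    """CPU billed for the whole time the instance is up."""
    return instance_uptime_s * vcpu * CPU_RATE

# 100k requests at 100 ms each = 10,000 vCPU-seconds of request time,
# vs one 1-vCPU instance up for a full day (86,400 s):
print(round(cpu_cost_request_only(100_000, 0.1), 2))  # 0.24
print(round(cpu_cost_always_on(86_400), 2))           # 2.07
```

The gap narrows as traffic becomes steadier, which is when always-allocated CPU starts to pay for itself in latency.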

Networking

Ingress:

  • All: Public internet
  • Internal: VPC only
  • Internal + Cloud Load Balancing: VPC + global LB

Egress: Via VPC connector for private access

Concurrency Model

Default: 80 concurrent requests per instance

High concurrency (80-1000): fewer instances, lower cost, shared memory
Low concurrency (1-10): more instances, isolation, predictable performance

Decision: High for stateless, low for resource-intensive
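The concurrency setting drives instance count. A back-of-envelope estimate (my own helper, applying Little's law: in-flight requests ≈ request rate × latency):

```python
import math

def instances_needed(rps: float, avg_latency_s: float, concurrency: int) -> int:
    """Estimate instances: in-flight requests (Little's law: rps x latency)
    divided by the concurrent requests one instance handles."""
    in_flight = rps * avg_latency_s
    return max(1, math.ceil(in_flight / concurrency))

# 1000 req/s at 200 ms average latency = 200 requests in flight:
print(instances_needed(1000, 0.2, 80))  # default concurrency 80 -> 3 instances
print(instances_needed(1000, 0.2, 10))  # low concurrency 10 -> 20 instances
```

The same load needs roughly 7x the instances at concurrency 10, which is the cost side of the isolation trade-off above.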

Security

Default:

  • HTTPS only, managed certificates
  • Require authentication by default
  • Per-service IAM permissions

Allow unauthenticated: grant the roles/run.invoker role to allUsers (public APIs)

VPC Service Controls: Perimeter-based security

Cost Model

Pricing:

  • Request: $0.40 per million
  • CPU time: $0.00002400 per vCPU-second
  • Memory: $0.00000250 per GiB-second
  • Free tier: 2M requests, 360k GiB-seconds/month
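Putting the listed prices together, a monthly bill can be estimated as below (a sketch: free tier is applied only to the two allowances named above, and workload numbers are hypothetical):

```python
# Monthly cost estimate from the listed prices.
REQ_PRICE = 0.40 / 1_000_000      # $ per request
CPU_PRICE = 0.000024              # $ per vCPU-second
MEM_PRICE = 0.0000025             # $ per GiB-second
FREE_REQUESTS = 2_000_000         # free tier: requests/month
FREE_GIB_S = 360_000              # free tier: GiB-seconds/month

def monthly_cost(requests: int, avg_request_s: float, vcpu: float = 1.0, gib: float = 0.5) -> float:
    billable_req = max(0, requests - FREE_REQUESTS)
    cpu_s = requests * avg_request_s * vcpu
    mem_gib_s = max(0.0, requests * avg_request_s * gib - FREE_GIB_S)
    return billable_req * REQ_PRICE + cpu_s * CPU_PRICE + mem_gib_s * MEM_PRICE

# 5M requests/month, 100 ms each, 1 vCPU, 512 MiB:
# $1.20 requests + $12.00 CPU + $0 memory (under free tier)
print(round(monthly_cost(5_000_000, 0.1), 2))  # 13.2
```

Note how CPU time, not request count, dominates the bill, which is why right-sizing and request-only CPU allocation are the main optimization levers.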

Optimization:

  • Scale to zero when idle
  • Right-size CPU/memory
  • CPU allocation during request only
  • High concurrency

Cloud Run Jobs

Purpose: Run containers to completion (batch, data processing)

Differences from Services:

  • No HTTP endpoint
  • Run to completion, then stop
  • Parallel execution support
  • Scheduled via Cloud Scheduler

Use cases: ETL, batch processing, scheduled tasks
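For the parallel-execution support above, each task can pick its share of the work from the `CLOUD_RUN_TASK_INDEX` / `CLOUD_RUN_TASK_COUNT` environment variables that Cloud Run Jobs sets on every task. A sharding sketch (the helper is mine):

```python
import os

def my_shard(items: list) -> list:
    """Select this task's slice of a batch using the index/count
    environment variables Cloud Run Jobs sets on each task."""
    index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))
    return [item for i, item in enumerate(items) if i % count == index]

# Simulate task 1 of 3 processing a 10-item batch:
os.environ["CLOUD_RUN_TASK_INDEX"] = "1"
os.environ["CLOUD_RUN_TASK_COUNT"] = "3"
print(my_shard(list(range(10))))  # [1, 4, 7]
```

Round-robin assignment keeps the shards balanced without any coordination between tasks.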

Integration Patterns

With Cloud Services:

  • Cloud SQL: Direct connection via Unix socket
  • Firestore: Native client libraries
  • Pub/Sub: Push subscriptions trigger Cloud Run
  • Cloud Storage: Eventarc for bucket events
  • Secret Manager: Environment variables from secrets
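For the Cloud SQL integration above, Cloud Run mounts the instance's Unix socket under `/cloudsql/<PROJECT>:<REGION>:<INSTANCE>`. A PostgreSQL connection-string sketch (all names below are placeholders; libpq treats a host beginning with "/" as a socket directory):

```python
def pg_dsn(project: str, region: str, instance: str, db: str, user: str) -> str:
    """Build a libpq DSN pointing at the Cloud SQL unix-socket mount."""
    socket_dir = f"/cloudsql/{project}:{region}:{instance}"
    return f"host={socket_dir} dbname={db} user={user}"

print(pg_dsn("my-project", "us-central1", "my-instance", "appdb", "appuser"))
# host=/cloudsql/my-project:us-central1:my-instance dbname=appdb user=appuser
```

The Cloud SQL connection must also be attached to the service for the socket to appear at that path.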

Service Mesh: Cloud Run integrates with Anthos Service Mesh (advanced)

Common Patterns

API Gateway Pattern

API Gateway (Cloud Endpoints) → Cloud Run (backend)

Fan-Out Pattern

Cloud Run (coordinator) → Pub/Sub → Multiple Cloud Run services

Async Processing

Cloud Run (web) → Pub/Sub → Cloud Run (worker)

Limitations

  • 60m max request timeout
  • 32 GiB max memory per instance
  • 4 vCPU max per instance
  • Stateless (local disk ephemeral)
  • Cold starts (~1s typical)

Migration Paths

From App Engine: Containerize app, deploy to Cloud Run
From GKE: Extract stateless services to Cloud Run
From Compute Engine: Containerize, remove state

Exam Focus

Design Decisions

  • Cloud Run vs App Engine vs Cloud Functions
  • When to use multi-region
  • Concurrency configuration
  • CPU allocation strategy

Architecture

  • Service-to-service auth
  • Event-driven patterns
  • Global deployment
  • VPC integration

Cost Optimization

  • Scale to zero
  • CPU allocation modes
  • Right-sizing
  • Concurrency tuning

Security

  • IAM authentication
  • Public vs private services
  • VPC Service Controls
  • Secret management

Limitations

  • Request timeout
  • Stateless requirement
  • Cold start consideration
  • Resource limits