Database Essentials¶
Heavy Hitters (High Frequency)¶
1. Database Selection (Most Important!)¶
The exam LOVES asking “which database should you use?” - memorize this:
| Database | Use Case | Key Features |
|---|---|---|
| Cloud SQL | Traditional RDBMS, < 64 TB | PostgreSQL, MySQL, SQL Server; ACID; Vertical scaling |
| AlloyDB | High-performance PostgreSQL | 4x faster than Cloud SQL; 100% PostgreSQL compatible; AI/ML features |
| Cloud Spanner | Global RDBMS, > 1 TB | Horizontally scalable; 99.999% SLA; Global consistency |
| Firestore | Document database, mobile/web | Real-time sync; Offline support; NoSQL; Serverless |
| Bigtable | Wide-column NoSQL, IoT/analytics | Petabyte-scale; Low latency (<10ms); Time-series data |
| BigQuery | Data warehouse | Serverless; SQL analytics; Petabyte-scale; Separate storage/compute |
2. Cloud SQL vs AlloyDB vs Cloud Spanner¶
- Cloud SQL: Traditional workloads, < 64 TB, single region (or cross-region read replicas)
  - Exam Clue: “MySQL/PostgreSQL, moderate scale” → Cloud SQL
- AlloyDB: PostgreSQL only, need high performance, < 100 TB
  - Exam Clue: “Migrate PostgreSQL, need better performance” → AlloyDB
  - 4x faster than Cloud SQL for transactional workloads
  - 100x faster for analytical queries (columnar engine)
- Cloud Spanner: Need global distribution, > 1 TB, 99.999% SLA
  - Exam Clue: “Global application, strong consistency, horizontal scaling” → Spanner
3. Firestore vs Bigtable¶
- Firestore: Document-based, mobile/web apps, real-time sync
  - Exam Clue: “Mobile app, real-time updates, offline support” → Firestore
  - Max 1 MB per document
  - 1 write/sec per document limit
- Bigtable: Wide-column, IoT/time-series, high throughput
  - Exam Clue: “IoT sensors, time-series, petabyte-scale, low latency” → Bigtable
  - Not for: < 1 TB data, transactional workloads
  - Linear scaling: Add nodes for more throughput
Relational Databases (SQL)¶
Cloud SQL¶
- Engines: PostgreSQL, MySQL, SQL Server
- Max Size: 64 TB
- Availability:
  - Regional: Single zone or HA (automatic failover)
  - Read Replicas: Cross-region for read scaling
- Backup: Automated + on-demand, point-in-time recovery
- Migration: Database Migration Service (DMS)
- Connections (see the connector sketch below):
  - Public IP + Cloud SQL Auth Proxy
  - Private IP (VPC peering)
  - Private Service Connect (PSC)
  - Exam Tip: Use Private IP or PSC for production workloads
Key Limits:
- Max storage: 64 TB (PostgreSQL/MySQL), 10 TB (SQL Server)
- Max connections: Depends on instance size (~4000 for large instances)
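For context, here is a minimal connection sketch using the Cloud SQL Python Connector (the in-process equivalent of the Auth Proxy) over private IP. The instance connection name, user, password, and database below are placeholders; swapping `IPTypes.PUBLIC` connects over public IP instead.

```python
# Minimal sketch, assuming a Cloud SQL PostgreSQL instance with private IP and a
# built-in database user. Requires: pip install "cloud-sql-python-connector[pg8000]"
from google.cloud.sql.connector import Connector, IPTypes

connector = Connector()

def getconn():
    # The connector opens a TLS-encrypted, IAM-authorized tunnel to the instance,
    # so no authorized-network (IP allowlist) entries are required.
    return connector.connect(
        "my-project:us-central1:my-instance",  # placeholder instance connection name
        "pg8000",                              # PostgreSQL DB-API driver
        user="app-user",
        password="app-password",
        db="appdb",
        ip_type=IPTypes.PRIVATE,               # connect over the VPC (private IP)
    )

conn = getconn()
cur = conn.cursor()
cur.execute("SELECT version()")
print(cur.fetchone())
conn.close()
connector.close()
```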
AlloyDB for PostgreSQL¶
- What: Fully managed PostgreSQL-compatible database
- Performance:
- 4x faster transactional workloads vs Cloud SQL
- 100x faster analytical queries (columnar engine)
- Max Size: 64 TB (can request more)
- Availability: 99.99% SLA, regional HA
- Key Features:
- Columnar engine: Run analytics on transactional data (HTAP)
- AI/ML integration: pgvector extension for vector embeddings
- Query insights: Real-time performance monitoring
- Migration: Use DMS from Cloud SQL or on-prem PostgreSQL
- Exam Clue: “PostgreSQL + better performance + analytics” → AlloyDB
When AlloyDB over Cloud SQL:
- Need high performance (OLTP + OLAP)
- Running analytics on transactional data
- Using vector embeddings (pgvector); see the sketch below
- Large PostgreSQL databases
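To make the pgvector point concrete, here is a minimal sketch of storing and searching embeddings over a standard PostgreSQL connection; it works against AlloyDB (or Cloud SQL for PostgreSQL) wherever the `vector` extension is available. The host, credentials, table name, and tiny 3-dimensional vectors are illustrative placeholders.

```python
# Minimal sketch, assuming a reachable AlloyDB/PostgreSQL instance with the
# pgvector extension available. Requires: pip install psycopg2-binary
import psycopg2

conn = psycopg2.connect(host="10.0.0.5", dbname="appdb",          # placeholders
                        user="app-user", password="app-password")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id BIGSERIAL PRIMARY KEY,
        body TEXT,
        embedding vector(3)  -- real embeddings are typically hundreds of dimensions
    );
""")
cur.execute("INSERT INTO documents (body, embedding) VALUES (%s, %s::vector);",
            ("hello world", "[0.1, 0.2, 0.3]"))

# Nearest-neighbor search: '<->' is pgvector's L2 distance operator.
cur.execute("SELECT id, body FROM documents ORDER BY embedding <-> %s::vector LIMIT 5;",
            ("[0.1, 0.2, 0.25]",))
print(cur.fetchall())

conn.commit()
cur.close()
conn.close()
```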
Cloud Spanner¶
- What: Globally distributed, horizontally scalable RDBMS
- CAP Theorem: CP (Consistency + Partition tolerance)
- Consistency: Strong consistency (external consistency)
- Scaling: Horizontal (add nodes/processing units)
- Max Size: Unlimited (petabyte-scale)
- Availability:
- Multi-region: 99.999% SLA (5 nines!)
- Regional: 99.99% SLA
- Use Cases:
- Global applications (gaming, financial)
- Need > 1 TB with strong consistency
- Require horizontal scaling
- Cost: Most expensive SQL option
- Exam Clue: “Global consistency, 99.999% SLA, > 1 TB” → Spanner
Key Concepts:
- Nodes: 2 TB storage + 10,000 QPS per node (approximate)
- Processing Units: 1000 PU = 1 node (granular scaling)
- Interleaved tables: Parent-child rows co-located for performance
- Primary key design: Avoid hotspots (don’t use sequential IDs); see the write sketch below
When Spanner over Cloud SQL:
- Need > 1 TB AND horizontal scaling
- Global application requiring consistency
- Need 99.999% availability
- Can’t tolerate replication lag
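A minimal write sketch illustrating the hotspot advice: the primary key is a random UUID rather than a sequential ID, so inserts spread evenly across splits. The project, instance, database, and an `Orders(OrderId, CustomerId, Amount)` table keyed on `OrderId` are assumed placeholders.

```python
# Minimal sketch, assuming an existing Spanner instance/database and an Orders
# table keyed on OrderId. Requires: pip install google-cloud-spanner
import uuid
from google.cloud import spanner

client = spanner.Client(project="my-project")                     # placeholder
database = client.instance("my-instance").database("orders-db")   # placeholders

with database.batch() as batch:
    batch.insert(
        table="Orders",
        columns=("OrderId", "CustomerId", "Amount"),
        values=[
            # A UUIDv4 key spreads writes across the key space; a sequential or
            # timestamp-prefixed key would pile writes onto one split (hotspot).
            (str(uuid.uuid4()), "customer-123", 42.5),
        ],
    )
```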
NoSQL Databases¶
Firestore (Native Mode)¶
- Type: Document database (NoSQL)
- Use Cases:
- Mobile/web applications
- Real-time synchronization
- Offline support
- Serverless backends
- Structure: Collections → Documents → Fields
- Queries: Rich querying on documents (indexed filters and ordering; no joins)
- Limits:
  - Document size: 1 MB max
  - Write rate: 1 write/sec per document
  - Subcollection depth: 100 levels
- Pricing: Pay per read/write/delete operation
- Exam Clue: “Mobile app, real-time, offline” → Firestore
vs Datastore (Legacy):
- Firestore is the successor (don’t choose Datastore on exam)
- Firestore has better querying, real-time updates
- Can migrate Datastore → Firestore
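A minimal sketch of the Firestore programming model with the Python client: one document write plus a real-time snapshot listener. Project, collection, and document names are placeholders.

```python
# Minimal sketch, assuming Firestore in Native mode in project "my-project".
# Requires: pip install google-cloud-firestore
from google.cloud import firestore

db = firestore.Client(project="my-project")

# Write a document (each document must stay under the 1 MB size limit).
doc_ref = db.collection("chats").document("room-42")
doc_ref.set({"last_message": "hello", "unread": 3})

# Real-time listener: the callback fires whenever the document changes.
def on_change(doc_snapshot, changes, read_time):
    for doc in doc_snapshot:
        print(f"{doc.id} -> {doc.to_dict()}")

watch = doc_ref.on_snapshot(on_change)
# ... later, stop listening:
watch.unsubscribe()
```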
Bigtable¶
- Type: Wide-column NoSQL (like HBase/Cassandra)
- Use Cases:
- Time-series data (IoT sensors, stock prices)
- High-throughput analytics
- Large-scale data (> 1 TB)
- Low-latency reads/writes (< 10ms)
- Not For:
- < 1 TB data (not cost-effective)
- Transactional workloads (no ACID)
- Complex queries (no joins, limited filtering)
- Structure: Tables → Row keys → Column families → Columns
- Scaling: Linear (add nodes for more throughput)
- Replication: Up to 8 clusters (sync or async)
- Storage: HDD (cheaper, higher latency) or SSD (faster, more expensive)
- Exam Clue: “IoT, time-series, petabyte-scale, low latency” → Bigtable
Key Design Principles:
- Row key design: Critical for performance (avoid hotspots)
- Tall and narrow: Many rows, few columns
- Column families: Group related columns
- Time-series: Reverse timestamp in row key for latest data first
Performance:
- Throughput: ~10,000 QPS per node (depends on workload)
- Latency: < 10ms for reads/writes (SSD)
- Scaling: Add nodes (linear scaling)
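A minimal sketch of the row-key guidance with the Python client: the key starts with a sensor ID and ends with a reverse timestamp, so writes spread across sensors and each sensor’s newest reading sorts first. The project, instance, table, and `metrics` column family are placeholders that must already exist.

```python
# Minimal sketch, assuming an existing Bigtable instance, table, and the
# "metrics" column family. Requires: pip install google-cloud-bigtable
import time
from google.cloud import bigtable

client = bigtable.Client(project="my-project")                     # placeholder
table = client.instance("iot-instance").table("sensor-readings")   # placeholders

sensor_id = "sensor-0042"
# Reverse timestamp: newest readings sort first within this sensor's key range.
reverse_ts = (2**63 - 1) - int(time.time() * 1000)
# Prefixing with the sensor ID distributes writes across the key space instead
# of piling every write onto "now" (the classic sequential-key hotspot).
row_key = f"{sensor_id}#{reverse_ts}".encode("utf-8")

row = table.direct_row(row_key)
row.set_cell("metrics", b"temperature", b"21.7")
row.commit()
```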
Analytics & Data Warehouse¶
BigQuery¶
- Type: Serverless data warehouse (analytics)
- Use Cases:
- Data analytics and BI
- Large-scale data analysis (petabyte-scale)
- Machine learning (BQML)
- Log analysis
- Key Features:
- Serverless: No infrastructure management
- Separation: Storage and compute decoupled
- SQL: Standard SQL (ANSI SQL 2011)
- Partitioning: Date/timestamp/integer partitioning
- Clustering: Sort data within partitions
- Not For:
- Transactional workloads (OLTP)
- Low-latency point queries
- Frequent small updates
- Pricing:
- On-demand: $5/TB queried
- Flat-rate: Reserved slots (predictable cost)
- Storage: $0.02/GB/month (active), $0.01/GB/month (long-term)
Performance Optimization:
- Partitioning: Reduce data scanned
- Clustering: Improve query performance (up to 4 columns)
- Materialized views: Pre-compute aggregations
- BI Engine: In-memory analysis (sub-second queries)
Exam Tips:
- Cost optimization: Partition tables, limit query scope, use clustering
- BigQuery ML: Train ML models with SQL
- BigQuery Omni: Query data in AWS S3, Azure Blob Storage
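A minimal sketch of the partitioning/clustering advice with the Python client: create a date-partitioned, clustered table, then query it with a partition filter so BigQuery prunes partitions and scans (and bills for) less data. Project, dataset, and column names are placeholders.

```python
# Minimal sketch, assuming dataset "analytics" already exists in "my-project".
# Requires: pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="my-project")

table = bigquery.Table(
    "my-project.analytics.events",
    schema=[
        bigquery.SchemaField("event_date", "DATE"),
        bigquery.SchemaField("customer_id", "STRING"),
        bigquery.SchemaField("event_type", "STRING"),
    ],
)
table.time_partitioning = bigquery.TimePartitioning(
    type_=bigquery.TimePartitioningType.DAY, field="event_date"
)
table.clustering_fields = ["customer_id"]  # up to 4 clustering columns
client.create_table(table, exists_ok=True)

# The filter on the partitioning column prunes partitions, reducing bytes scanned.
query = """
    SELECT customer_id, COUNT(*) AS events
    FROM `my-project.analytics.events`
    WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY)
    GROUP BY customer_id
"""
for row in client.query(query).result():
    print(row.customer_id, row.events)
```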
Migration Strategies¶
Database Migration Service (DMS)¶
- What: Managed service to migrate databases to Cloud SQL, AlloyDB
- Supported Sources:
- PostgreSQL → Cloud SQL PostgreSQL, AlloyDB
- MySQL → Cloud SQL MySQL
- SQL Server → Cloud SQL SQL Server
- Oracle → Cloud SQL PostgreSQL (via ora2pg)
- Migration Types:
- One-time: Single migration with downtime
- Continuous (CDC): Minimal downtime, replication
- Process:
- Create migration job
- Continuous replication (CDC)
- Cutover when ready
- Exam Clue: “Migrate database with minimal downtime” → DMS
Migration Patterns¶
- MySQL/PostgreSQL on-prem → Cloud SQL
  - Small DB (< 100 GB): Export/import or DMS
  - Large DB (> 100 GB): DMS (continuous replication)
  - Need better performance → AlloyDB (via DMS)
- SQL Server on-prem → Cloud SQL SQL Server
  - Use DMS or native backup/restore
- Oracle → Cloud SQL PostgreSQL
  - ora2pg or partner tools (complex)
- Need horizontal scaling → Cloud Spanner
  - Schema redesign + application changes required
- NoSQL migrations:
  - MongoDB → Firestore (document model)
  - Cassandra/HBase → Bigtable (wide-column)
  - Redis → Memorystore (managed Redis)
High Availability & Disaster Recovery¶
Cloud SQL HA¶
- Regional HA: Primary + standby in different zones
- Failover: < 60 seconds (automatic)
- Synchronous replication: No data loss
- SLA: 99.95% (HA configuration)
- Read Replicas: Cross-region for read scaling
- Async replication: Potential lag
- Use for: Read scaling, DR, geo-distribution
- Backup:
- Automated: Daily, 7-365 day retention
- On-demand: Manual backups
- PITR: Point-in-time recovery (up to 7 days)
AlloyDB HA¶
- Regional HA: Primary + standby (automatic)
- SLA: 99.99%
- Backup: Automated + on-demand, continuous PITR
- Read pools: Scale reads independently
Cloud Spanner HA¶
- Multi-region: 99.999% SLA (3+ regions)
- Regional: 99.99% SLA
- Replication: Synchronous (zero RPO)
- Failover: Automatic (no data loss)
- Backup: Scheduled backups, PITR (up to 7 days)
Firestore HA¶
- Multi-region: Automatic replication
- SLA: 99.999% (multi-region)
- No manual HA setup needed
Bigtable HA¶
- Replication: Up to 8 clusters
- SLA: 99.9% (single cluster), 99.99% (multi-cluster)
- Failover: Automatic (with app-profile routing)
- Backup: On-demand backups, scheduled backups
Database Security¶
Authentication & Authorization¶
- Cloud SQL:
- Database users (MySQL/PostgreSQL users)
- Cloud SQL IAM authentication (PostgreSQL, MySQL 8.0+)
- Cloud SQL Auth Proxy (secure connection)
- AlloyDB: IAM authentication, database users
- Cloud Spanner: IAM roles (Fine-grained IAM)
- Firestore: Firebase Auth or Cloud Identity
- Bigtable: IAM roles only
- BigQuery: IAM roles, column-level security, row-level security
Network Security¶
- Private IP: VPC peering (Cloud SQL) or Private Service Connect
- Private Service Connect: Recommended for new deployments
- Authorized networks: IP allowlisting (public IP)
- Cloud SQL Auth Proxy: Encrypted connection, IAM auth
Encryption¶
- At rest: Encrypted by default (Google-managed keys)
- CMEK: Customer-managed encryption keys (Cloud KMS)
- In transit: TLS/SSL (enforced or optional)
Audit Logging¶
- Admin Activity: Who did what (enabled by default)
- Data Access: Who accessed what data (enable explicitly)
- Query logs: BigQuery, Cloud SQL (performance)
Exam Decision Trees¶
Which Database?¶
- Need SQL + < 64 TB + single region → Cloud SQL
- Need SQL + PostgreSQL + high performance → AlloyDB
- Need SQL + > 1 TB + global + horizontal scale → Cloud Spanner
- Need document database + real-time + mobile → Firestore
- Need time-series + IoT + > 1 TB + low latency → Bigtable
- Need analytics + petabyte-scale + serverless → BigQuery
Cloud SQL vs AlloyDB vs Spanner¶
- PostgreSQL + need better performance → AlloyDB
- PostgreSQL + analytics on transactional data → AlloyDB (columnar engine)
- Need > 64 TB + horizontal scaling → Cloud Spanner
- Need 99.999% SLA + global consistency → Cloud Spanner
- Traditional MySQL/PostgreSQL + < 64 TB → Cloud SQL
- Budget-conscious + moderate performance → Cloud SQL
Firestore vs Bigtable¶
- Document model + real-time sync → Firestore
- Mobile/web app + offline support → Firestore
- < 1 TB data + serverless → Firestore
- > 1 TB + time-series + IoT → Bigtable
- Need lowest latency (< 10ms) → Bigtable
- High-throughput writes (millions/sec) → Bigtable
High Availability¶
- Need 99.999% SLA → Cloud Spanner (multi-region)
- Need automatic failover + SQL → Cloud SQL HA or AlloyDB
- Need cross-region read scaling → Cloud SQL read replicas
- Need zero RPO (no data loss) → Cloud Spanner or Cloud SQL HA
- Need multi-region replication + NoSQL → Firestore or Bigtable replication
Migration¶
- MySQL/PostgreSQL + minimal downtime → Database Migration Service
- PostgreSQL + need better performance → Migrate to AlloyDB via DMS
- Oracle → PostgreSQL → ora2pg + Cloud SQL/AlloyDB
- Need horizontal scaling after migration → Cloud Spanner (redesign required)
- SQL Server → GCP → Cloud SQL SQL Server (DMS or backup/restore)
Quick Reminders¶
Size Limits¶
- Cloud SQL: 64 TB (PostgreSQL/MySQL), 10 TB (SQL Server)
- AlloyDB: 64 TB (can request more)
- Cloud Spanner: Unlimited (petabyte-scale)
- Firestore: 1 MB per document
- Bigtable: 10 MB per cell recommended maximum (keep rows under 100 MB)
- BigQuery: No storage limit
Performance¶
- Cloud SQL: Vertical scaling (up to 624 GB RAM)
- AlloyDB: 4x faster than Cloud SQL (transactional), 100x faster (analytical)
- Cloud Spanner: Linear horizontal scaling
- Firestore: 1 write/sec per document limit
- Bigtable: ~10,000 QPS per node, < 10ms latency
- BigQuery: Serverless, auto-scales
Cost Considerations¶
- Cheapest: Cloud SQL (for small databases)
- Most expensive: Cloud Spanner (but necessary for global scale)
- BigQuery: Pay per TB scanned (optimize with partitioning!)
- Bigtable: Not cost-effective for < 1 TB
- Firestore: Pay per operation (can get expensive with many writes)
Common Gotchas¶
- Cloud SQL: Limited to 64 TB, can’t scale horizontally
- AlloyDB: PostgreSQL ONLY (no MySQL, SQL Server)
- Cloud Spanner: Expensive, requires schema redesign (avoid hotspots)
- Firestore: Document size limit (1 MB), write rate limit (1/sec)
- Bigtable: No ACID transactions, no complex queries
- BigQuery: NOT for OLTP, costs scale with data scanned
Troubleshooting Quick Checks¶
Cloud SQL connection issues:
- Check authorized networks (if using public IP)
- Verify Cloud SQL Auth Proxy configured correctly
- Check private IP connectivity (VPC peering or PSC)
- Verify IAM permissions (cloudsql.client role)
- Check firewall rules allow connection
Cloud SQL high replication lag:
- Check read replica is sized appropriately
- Verify network connectivity between regions
- Check for long-running transactions on primary
- Consider upgrading instance size
Bigtable hotspotting:
- Review row key design (avoid sequential keys)
- Use reverse timestamp for time-series
- Distribute writes across key space
- Add nodes if underprovisioned
BigQuery slow queries:
- Check if table is partitioned
- Add clustering on commonly filtered columns
- Use WHERE clauses to limit data scanned
- Check query execution plan (EXPLAIN)
- Consider materialized views for repeated queries
Firestore high costs:
- Check for unnecessary reads (optimize client-side caching)
- Review document listeners (don’t use on large collections)
- Batch writes when possible (see the sketch below)
- Use proper indexes (avoid composite index explosion)
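As a small illustration of the batching tip above, here is a sketch using the Python client’s write batch (a batch holds up to 500 operations); project and collection names are placeholders. Batching cuts round trips, though billing is still per document written.

```python
# Minimal sketch, assuming Firestore in Native mode in project "my-project".
# Requires: pip install google-cloud-firestore
from google.cloud import firestore

db = firestore.Client(project="my-project")
batch = db.batch()

for i in range(100):
    ref = db.collection("events").document(f"event-{i}")
    batch.set(ref, {"seq": i, "processed": False})

batch.commit()  # one round trip instead of 100 separate writes
```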
Pro Tips for Exam¶
- Read carefully: Look for keywords like “global”, “real-time”, “time-series”, “analytics”
- Scale requirements: < 64 TB → Cloud SQL/AlloyDB; > 1 TB + horizontal → Spanner
- Performance: “PostgreSQL + faster” → AlloyDB; “globally consistent” → Spanner
- Real-time: “Mobile + real-time sync” → Firestore
- Analytics: “Petabyte-scale analytics” → BigQuery; “OLTP + analytics” → AlloyDB
- IoT/Time-series: “Low latency + high throughput + time-series” → Bigtable
- Migration: “Minimal downtime” → Database Migration Service (DMS)
- HA/DR: “99.999% SLA” → Spanner multi-region; “automatic failover” → Cloud SQL HA
- Cost: Spanner is expensive, Cloud SQL is cheap, BigQuery costs scale with data scanned
- Don’t overthink: If scenario matches common pattern, choose obvious answer
Memorization Shortcuts¶
- Traditional SQL workload < 64 TB: Cloud SQL (MySQL/PostgreSQL/SQL Server)
- PostgreSQL + need better performance: AlloyDB (4x faster)
- Global app + strong consistency + > 1 TB: Cloud Spanner (99.999% SLA)
- Mobile app + real-time + offline: Firestore (document database)
- IoT sensors + time-series + low latency: Bigtable (< 10ms)
- Analytics + petabyte-scale: BigQuery (serverless)
- Migrate database with minimal downtime: Database Migration Service (DMS)
- PostgreSQL + analytics on same data: AlloyDB (columnar engine)
- Need 99.999% SLA: Cloud Spanner multi-region
- Vector embeddings + semantic search: AlloyDB (pgvector) or Cloud SQL PostgreSQL
Scenario-Based Examples¶
Scenario 1: “Global gaming company needs database for player profiles, must handle millions of concurrent users across regions with strong consistency.”
- Answer: Cloud Spanner (global, horizontally scalable, strong consistency)

Scenario 2: “IoT company collects sensor data from millions of devices, needs to store time-series data and query recent readings with < 10ms latency.”
- Answer: Bigtable (time-series, low latency, high throughput)

Scenario 3: “E-commerce company wants to migrate PostgreSQL database from on-prem to GCP, needs better performance for analytics queries on transactional data.”
- Answer: AlloyDB (PostgreSQL compatible, columnar engine for analytics)

Scenario 4: “Startup building mobile app needs real-time chat, offline support, and wants serverless database.”
- Answer: Firestore (real-time sync, offline, serverless)

Scenario 5: “Finance company needs to analyze petabytes of transaction logs for fraud detection, budget is a concern.”
- Answer: BigQuery (petabyte-scale analytics, pay per query, use partitioning to optimize costs)

Scenario 6: “Company migrating MySQL database with 20 TB of data, needs minimal downtime during migration.”
- Answer: Database Migration Service with continuous replication (CDC)

Scenario 7: “Healthcare app needs HIPAA-compliant PostgreSQL database with automatic failover and CMEK encryption.”
- Answer: Cloud SQL PostgreSQL HA + CMEK + Private IP

Scenario 8: “Retail company needs to run both OLTP and OLAP queries on same dataset without ETL pipeline.”
- Answer: AlloyDB (HTAP with columnar engine)