Cloud Storage Overview¶
Description¶
Cloud Storage is Google Cloud’s object storage service for storing and retrieving unstructured data at any scale. Understanding when to use object storage versus block storage (Persistent Disks) or file storage (Filestore) is fundamental to cloud architecture.
Key Principle: Object storage is for massive-scale, infrequently changing data accessed via HTTP/S; not for databases or file systems requiring POSIX operations.
Core Concepts¶
Object Storage Model¶
Architecture:
- Buckets: Top-level containers for objects
- Objects: Individual files with metadata
- Flat namespace: No directory structure (though prefixes simulate folders)
- HTTP/S access: RESTful API, not filesystem mount
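To make the flat-namespace point concrete, here is a minimal sketch of how a prefix and delimiter can simulate folders over plain object names. It is a local emulation in pure Python, not a call to the Cloud Storage API; the function name and the sample object names are hypothetical.

```python
from typing import Iterable, List, Tuple


def list_with_delimiter(
    names: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Emulate a prefix + delimiter listing over a flat set of object names.

    Returns (objects, prefixes): names directly under the prefix, and the
    simulated sub-folders one level below it.
    """
    objects, prefixes = [], set()
    for name in sorted(names):
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The remainder still contains the delimiter: collapse to a "folder".
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(name)
    return objects, sorted(prefixes)


# Hypothetical object names; the bucket stores them as flat strings.
names = [
    "logs/2024/01/app.log",
    "logs/2024/02/app.log",
    "logs/readme.txt",
    "index.html",
]
print(list_with_delimiter(names, prefix="logs/"))
```

The "folders" exist only in the listing result; nothing is created or stored for them.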
Characteristics:
- Unlimited capacity
- Strongly consistent (read-after-write)
- Eventually consistent for IAM/ACL changes
- Atomic operations
- No minimum object size
- Maximum object size: 5 TB
Buckets¶
Bucket Properties:
- Globally unique name
- Geographic location (region, dual-region, multi-region)
- Storage class (default for objects)
- Access control method (IAM, ACL, or both)
- Lifecycle policies
- Versioning settings
Immutable Properties (cannot change after creation):
- Bucket name
- Location type (region/dual-region/multi-region)
- Specific location
Architecture Impact: Location and name decisions are permanent; plan carefully
Objects¶
Object Components:
- Data (the file content)
- Metadata (key-value pairs)
- Access control (IAM, ACLs)
- Generation number (versioning)
Object Metadata:
- System metadata (size, content-type, timestamps)
- Custom metadata (user-defined key-value pairs)
- Object generation (version identifier)
- Metageneration (metadata version)
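The relationship between generation and metageneration can be sketched with a small in-memory model. This is an illustration only: the class, the helper functions, and the counter that stands in for service-assigned generation numbers are all hypothetical, and it assumes that new content yields a new generation while a metadata-only change increments metageneration.

```python
import itertools
from dataclasses import dataclass, field

# Stand-in for the generation numbers the service would assign.
_generation_counter = itertools.count(1000)


@dataclass
class ObjectVersion:
    data: bytes
    metadata: dict = field(default_factory=dict)
    generation: int = field(default_factory=lambda: next(_generation_counter))
    metageneration: int = 1


def overwrite(old: ObjectVersion, data: bytes) -> ObjectVersion:
    """New content produces a new generation; metageneration starts again at 1."""
    return ObjectVersion(data=data)


def patch_metadata(obj: ObjectVersion, **changes: str) -> None:
    """A metadata-only update keeps the generation and bumps metageneration."""
    obj.metadata.update(changes)
    obj.metageneration += 1


v1 = ObjectVersion(b"first draft")
patch_metadata(v1, owner="team-a")
v2 = overwrite(v1, b"second draft")
print(v1.generation, v1.metageneration, v2.generation, v2.metageneration)
```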
Durability and Availability¶
Durability: 99.999999999% (11 9’s)¶
What This Means:
- Store 10 million objects
- Expect to lose 1 object every 10,000 years
- Designed for no data loss
- Automatic replication and erasure coding
Architecture Implication: This durability target is very hard to match with a self-managed solution
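The "1 object every 10,000 years" figure follows from simple arithmetic. The sketch below assumes the eleven nines are read as an annual, per-object survival probability and that losses are independent; both are simplifying assumptions, and the function name is illustrative.

```python
from fractions import Fraction

# Assumption: "11 nines" is an annual, per-object survival probability,
# and object losses are treated as independent.
ANNUAL_LOSS_PROBABILITY = Fraction(1, 10**11)  # 1 - 0.99999999999


def expected_years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    expected_losses_per_year = object_count * ANNUAL_LOSS_PROBABILITY
    return 1 / expected_losses_per_year


print(expected_years_per_lost_object(10_000_000))  # prints 10000
```

Exact fractions are used instead of floats so the result is not blurred by rounding error.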
Availability Varies by Location¶
Multi-Region: 99.95% SLA
- Data in at least two geographic locations
- More than 160 km apart
- Automatic failover
- Use for: Global applications, highest availability
Dual-Region: 99.95% SLA
- Data in two specific regions
- Choose regions for latency/compliance
- Balance between cost and availability
- Use for: Regional applications with HA requirements
Region: 99.9% SLA
- Data in single region (across 3 zones)
- Lower cost than multi/dual-region
- Lower availability than multi/dual-region
- Use for: Regional data, cost optimization
Architecture Decision: Balance availability requirements with cost
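To compare the SLA figures above in practical terms, the following sketch converts each percentage into the downtime it permits over a 30-day month. The dictionary keys and function name are illustrative, and the 30-day month is an assumption.

```python
# SLA percentages taken from the location types above.
SLA = {"multi-region": 0.9995, "dual-region": 0.9995, "region": 0.999}


def allowed_downtime_minutes(sla: float, days: int = 30) -> float:
    """Minutes of unavailability that still satisfy the SLA over `days` days."""
    return round((1 - sla) * days * 24 * 60, 1)


for location_type, sla in SLA.items():
    print(location_type, allowed_downtime_minutes(sla))
```

Under these assumptions, 99.95% allows about 21.6 minutes per month and 99.9% about 43.2 minutes.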
Strong Consistency¶
Read-After-Write Consistency¶
Guarantees:
- Object immediately available after write
- Listing immediately shows new objects
- Deletes immediately reflected
- No eventual consistency delays
Architectural Benefits:
- Simplifies application logic
- No need to handle stale reads
- Reliable for build artifacts and CI/CD
- Safe for concurrent access patterns
Exceptions:
- IAM changes: Typically take effect within about a minute, occasionally longer
- Bucket configuration: Changes can take a similar time to propagate
When to Use Cloud Storage¶
✅ Use Cloud Storage When:¶
Unstructured Data Storage:
- Media files (images, videos, audio)
- Document storage
- Backups and archives
- Log files and analytics data
- Build artifacts and binaries
Static Website Hosting:
- HTML, CSS, JavaScript files
- Static assets
- Public documentation
- Download repositories
Data Lake / Analytics:
- Raw data ingestion
- Data warehouse staging
- BigQuery external tables
- Dataflow input/output
Backup and Archive:
- Database backups
- VM images and snapshots (backend storage)
- Long-term archival
- Compliance data retention
Content Distribution:
- Cloud CDN origin
- Software distribution
- Global asset delivery
- User-generated content
❌ Don’t Use Cloud Storage When:¶
Database Storage:
- Use Cloud SQL, Spanner, Firestore
- Object storage not optimized for transactional data
- No query language support
- Not ACID compliant
File System Requirements:
- Applications expecting POSIX filesystem
- Random writes within files
- File locking mechanisms
- Use Filestore (NFS) instead
Block Storage for VMs:
- VM boot disks
- Database data files
- High-IOPS applications
- Use Persistent Disks instead
Real-Time Streaming:
- Message queues
- Real-time event streaming
- Use Pub/Sub instead
Frequent Small Updates:
- Collaborative editing
- Append operations
- Partial object updates
- Use database or Firestore
Cloud Storage vs Alternatives¶
Cloud Storage vs Persistent Disks¶
| Feature | Cloud Storage | Persistent Disk |
|---|---|---|
| Access Method | HTTP/S API | Block device (mounted) |
| Use Case | Unstructured objects | VM storage, databases |
| Performance | High throughput | High IOPS |
| Capacity | Unlimited | Up to 64 TB per disk |
| Cost | Based on storage class | Based on disk type |
| Attachment | Any number of clients | Limited per VM |
| Snapshot | Object versioning | Incremental snapshots |
Decision: Use Cloud Storage for objects, Persistent Disks for databases and VMs
Cloud Storage vs Filestore¶
| Feature | Cloud Storage | Filestore |
|---|---|---|
| Protocol | HTTP/S | NFS |
| Consistency | Object-level | POSIX filesystem |
| Use Case | Object storage | Shared file storage |
| Scale | Unlimited | Up to 100 TB |
| Performance | Throughput optimized | IOPS optimized |
| Mounting | API/SDK | NFS mount |
Decision: Use Cloud Storage for objects, Filestore for POSIX filesystem requirements
Cloud Storage vs Cloud SQL¶
| Feature | Cloud Storage | Cloud SQL |
|---|---|---|
| Data Model | Objects | Relational tables |
| Query | List/Get by name | SQL queries |
| Transactions | None | ACID compliant |
| Use Case | Unstructured data | Structured data |
| Consistency | Strong per object | Transactional |
Decision: Use Cloud Storage for files, Cloud SQL for relational data
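The comparison tables above can be condensed into a small decision helper. This is a sketch of the reasoning, not an official selection tool: the `Workload` fields and the order in which requirements are checked are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    needs_message_streaming: bool = False
    needs_sql_queries: bool = False
    needs_block_device: bool = False
    needs_posix_filesystem: bool = False


def choose_storage(workload: Workload) -> str:
    """Map workload requirements to the service recommended in this section."""
    if workload.needs_message_streaming:
        return "Pub/Sub"
    if workload.needs_sql_queries:
        return "Cloud SQL"
    if workload.needs_block_device:
        return "Persistent Disk"
    if workload.needs_posix_filesystem:
        return "Filestore"
    # No special requirement: unstructured objects accessed over HTTP/S.
    return "Cloud Storage"


print(choose_storage(Workload()))
print(choose_storage(Workload(needs_posix_filesystem=True)))
```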
Location Types¶
Multi-Region¶
Characteristics:
- At least two geographic areas (>160 km apart)
- Highest availability (99.95%)
- Geo-redundant
- Higher cost
Available Multi-Regions:
- US (multiple US locations)
- EU (multiple Europe locations)
- ASIA (multiple Asia locations)
Use Cases:
- Global applications
- Highest availability requirements
- Content distribution
- Disaster recovery
Cost: Highest storage cost; no egress charge for access from within the same multi-region
Dual-Region¶
Characteristics:
- Two specific regions
- 99.95% availability
- Geo-redundant
- Choose regions for compliance/latency
- Optional turbo replication (asynchronous, targeting replication within 15 minutes)
Examples:
- NAM4 (Iowa + South Carolina)
- EUR4 (Netherlands + Finland)
Use Cases:
- Regional applications with HA
- Data residency requirements
- Balance cost and availability
- Specific latency requirements
Cost: Between multi-region and region
Region¶
Characteristics:
- Single region (3 zones)
- 99.9% availability
- Zone-redundant within region
- Lowest cost
Use Cases:
- Regional applications
- Cost optimization
- Data locality requirements
- Compute in same region (lower latency, no egress)
Cost: Lowest storage cost
Access Control¶
IAM (Recommended)¶
Characteristics:
- Bucket-level and project-level permissions
- Fine-grained roles
- Condition-based access
- Integration with organization policies
- Audit logging
Roles:
- roles/storage.objectViewer: Read objects
- roles/storage.objectCreator: Create objects
- roles/storage.objectAdmin: Full object control
- roles/storage.admin: Full bucket control
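Roles are granted through bindings in an IAM policy, where each binding pairs one role with a list of members. The sketch below manipulates that structure as plain Python dictionaries; it does not call any API, it ignores conditional bindings, and the member identities are hypothetical.

```python
import json


def add_binding(policy: dict, role: str, member: str) -> dict:
    """Return a copy of `policy` in which `member` holds `role` (no duplicates)."""
    bindings = [
        dict(b, members=list(b["members"])) for b in policy.get("bindings", [])
    ]
    for binding in bindings:
        if binding["role"] == role:
            if member not in binding["members"]:
                binding["members"].append(member)
            break
    else:
        # No binding for this role yet: create one.
        bindings.append({"role": role, "members": [member]})
    return {**policy, "bindings": bindings}


# Hypothetical identities, for illustration only.
policy = {
    "bindings": [
        {"role": "roles/storage.objectViewer", "members": ["user:alice@example.com"]}
    ]
}
updated = add_binding(
    policy,
    "roles/storage.objectViewer",
    "serviceAccount:app@my-project.iam.gserviceaccount.com",
)
print(json.dumps(updated, indent=2))
```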
Use Cases:
- Modern applications
- Service-to-service access
- Centralized access control
- Conditional access policies
ACLs (Legacy)¶
Characteristics:
- Object-level and bucket-level
- Simpler but less flexible
- Compatible with S3 ACLs
- Being phased out
Use Cases:
- Legacy applications
- S3 compatibility requirements
- Specific object-level permissions
- Not recommended for new applications
Uniform Bucket-Level Access¶
Recommendation: Enable uniform bucket-level access (IAM only, no ACLs)
Benefits:
- Simplified permission model
- Better security
- Easier auditing
- Organization policy enforcement
Architecture Decision: Use IAM with uniform bucket-level access for all new buckets
Signed URLs¶
Use Cases¶
Temporary Access:
- Time-limited access to private objects
- No authentication required
- Share with external users
- Download links with expiration
Architecture Pattern:
Application (with credentials) → Generate signed URL → Share URL → User downloads directly
Benefits:
- No proxy through application server
- Direct download from Cloud Storage
- Reduced infrastructure costs
- Better performance
Time Limits:
- Maximum 7 days (service account key)
- Maximum 12 hours (user credentials)
- Set appropriate expiration for use case
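The idea behind a signed URL, a link whose path and expiry are covered by a signature so that no further authentication is needed, can be illustrated with a toy example. This is deliberately not Cloud Storage's actual signing scheme: it uses a shared HMAC secret, a made-up host name, and hypothetical query parameters, and only demonstrates the expiry and tamper checks.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

MAX_EXPIRY_SECONDS = 7 * 24 * 3600  # the 7-day ceiling listed above
SECRET = b"demo-signing-key"  # hypothetical; real signing uses account credentials


def _signature(path: str, expires_at: int) -> str:
    payload = f"GET\n{path}\n{expires_at}".encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_in: int, now: Optional[float] = None) -> str:
    """Build a time-limited URL for `path` (toy scheme, see lead-in)."""
    if not 0 < expires_in <= MAX_EXPIRY_SECONDS:
        raise ValueError("expiration must be between 1 second and 7 days")
    issued = time.time() if now is None else now
    expires_at = int(issued + expires_in)
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at)})
    return f"https://storage.example.com{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept the URL only if the signature matches and it has not expired."""
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    try:
        expires_at = int(query["expires"][0])
        provided = query["sig"][0]
    except (KeyError, ValueError):
        return False
    current = time.time() if now is None else now
    expected = _signature(parsed.path, expires_at)
    return hmac.compare_digest(provided, expected) and current < expires_at


url = sign_url("/my-bucket/report.pdf", expires_in=3600, now=1_000_000)
print(verify_url(url, now=1_000_010))
```

In practice the client library generates the URL from service account credentials; the sketch only shows why the link stops working after its expiry and cannot be edited to point at another object.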
Performance Characteristics¶
Throughput¶
Capabilities:
- Multi-gigabit per second throughput
- Scales with parallel requests
- No bottleneck for large files
- Optimized for bandwidth
Optimization:
- Parallel uploads for large files
- Multiple threads/workers
- Parallel composite uploads for large files (a compose request combines up to 32 source objects)
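A parallel upload starts by splitting the file into byte ranges that independent workers can send. The sketch below plans those ranges and dispatches them to a thread pool; `upload_part` is a placeholder that only returns a hypothetical part name, and the cap of 32 parts reflects the assumption that the parts will later be combined in a single compose request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

MAX_COMPOSE_SOURCES = 32  # assumed limit on parts combined by one compose request


def plan_chunks(total_size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split [0, total_size) into (start, end) byte ranges, end exclusive."""
    if total_size <= 0 or chunk_size <= 0:
        raise ValueError("sizes must be positive")
    ranges = [
        (start, min(start + chunk_size, total_size))
        for start in range(0, total_size, chunk_size)
    ]
    if len(ranges) > MAX_COMPOSE_SOURCES:
        raise ValueError("too many parts for one compose; use a larger chunk_size")
    return ranges


def upload_part(indexed_range: Tuple[int, Tuple[int, int]]) -> str:
    """Placeholder for a real ranged upload; returns a hypothetical part name."""
    index, (start, end) = indexed_range
    # A real implementation would send bytes [start, end) here.
    return f"upload-tmp/part-{index:02d}"


def parallel_upload(total_size: int, chunk_size: int, workers: int = 4) -> List[str]:
    ranges = plan_chunks(total_size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so part names come back in sequence.
        return list(pool.map(upload_part, enumerate(ranges)))


print(plan_chunks(100, 30))
print(parallel_upload(100, 30))
```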
Request Rate Limits¶
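Bucket Limits (initial capacity, before autoscaling):
- About 1,000 object writes per second
- About 5,000 object reads per second
- Capacity scales up automatically when load ramps up gradually
Object Limits:
- About one write per second to the same object name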
Architecture Impact:
- Design to avoid hotspots (single object)
- Use object prefixes for distribution
- Consider request rate in architecture
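One common way to spread load across the key space is to put a short hash in front of names that would otherwise be sequential, such as timestamps. The sketch below shows the idea; the function name, the choice of SHA-256, and the six-character prefix length are illustrative assumptions.

```python
import hashlib


def distributed_name(original: str, prefix_len: int = 6) -> str:
    """Prefix the object name with a short hash so sequential names scatter."""
    digest = hashlib.sha256(original.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}-{original}"


# Hypothetical sequential log names.
for name in ["2024-01-01-000001.log", "2024-01-01-000002.log"]:
    print(distributed_name(name))
```

The trade-off is that listings no longer come back in chronological order, so the original name is kept after the prefix to allow recovery.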
Latency¶
Typical Latency:
- Single-digit milliseconds (same region)
- Tens of milliseconds (cross-region)
- Sub-second for first byte
Optimizations:
- Collocate compute and storage
- Use Cloud CDN for global access
- Minimize request overhead
Cost Model¶
Storage Costs¶
Pricing by Class (per GB/month):
- Standard: ~$0.020
- Nearline: ~$0.010
- Coldline: ~$0.004
- Archive: ~$0.0012
Variations:
- Multi-region: Higher than region
- Region: Lowest
- Dual-region: Between multi and region
Operation Costs¶
Class A Operations (writes, lists): ~$0.05 per 10,000
- Insert, update, list
- Lifecycle transitions
- Composition operations
Class B Operations (reads): ~$0.004 per 10,000
- Get object
- Get metadata
Free Operations:
- Delete
- Get bucket metadata (non-object)
Network Costs¶
Egress (data out):
- Same location (region/multi-region): Free
- Cross-region: ~$0.01-$0.12 per GB
- To internet: ~$0.12 per GB (first GB free)
- GCP services same region: Free
- GCP services cross-region: Charged
Ingress (data in): Free
Retrieval Costs (Non-Standard Classes)¶
- Nearline: ~$0.01 per GB retrieved
- Coldline: ~$0.02 per GB retrieved
- Archive: ~$0.05 per GB retrieved
Architecture Impact: Factor retrieval costs into storage class selection
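A rough monthly estimate shows how retrieval charges can cancel out a cheaper storage class. The sketch uses the approximate prices listed in this section; applying the same operation prices to every class and ignoring network egress and minimum storage durations are simplifying assumptions.

```python
# Approximate prices from this section (USD); real prices vary by location.
STORAGE_PER_GB = {"standard": 0.020, "nearline": 0.010, "coldline": 0.004, "archive": 0.0012}
RETRIEVAL_PER_GB = {"standard": 0.0, "nearline": 0.01, "coldline": 0.02, "archive": 0.05}
CLASS_A_PER_10K = 0.05   # writes, lists
CLASS_B_PER_10K = 0.004  # reads


def monthly_cost(
    storage_class: str,
    stored_gb: float,
    retrieved_gb: float = 0.0,
    class_a_ops: int = 0,
    class_b_ops: int = 0,
) -> float:
    """Estimate one month of storage, retrieval, and operation charges."""
    total = (
        stored_gb * STORAGE_PER_GB[storage_class]
        + retrieved_gb * RETRIEVAL_PER_GB[storage_class]
        + class_a_ops / 10_000 * CLASS_A_PER_10K
        + class_b_ops / 10_000 * CLASS_B_PER_10K
    )
    return round(total, 4)


# 1,000 GB stored and read back in full once during the month.
for storage_class in STORAGE_PER_GB:
    print(storage_class, monthly_cost(storage_class, 1000, retrieved_gb=1000))
```

With these figures, Nearline costs the same as Standard once the full dataset is read once a month, and Coldline costs more, which is why access frequency drives class selection.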
Integration Patterns¶
With Compute Services¶
- Compute Engine: Upload/download via gsutil or API
- GKE: Mount via Cloud Storage FUSE or S3 API
- Cloud Functions: Triggered by object changes, direct access
- Cloud Run: Access via client libraries
With Data Services¶
- BigQuery: External tables, import/export
- Dataflow: Source/sink for pipelines
- Dataproc: HDFS replacement, job I/O
- Composer: DAG storage, data staging
With ML/AI¶
- Vertex AI: Training data, model storage
- AutoML: Dataset storage
- AI Platform: Job input/output
With Migration Services¶
- Transfer Service: Online data transfer
- Transfer Appliance: Offline data transfer
- Database Migration Service: Backup storage
Security Considerations¶
Encryption¶
At Rest (default):
- Google-managed encryption keys
- Automatic for all data
- No configuration needed
Customer-Managed Keys (CMEK):
- Cloud KMS integration
- Customer controls key lifecycle
- Compliance requirements
- Audit key usage
Customer-Supplied Keys (CSEK):
- Customer provides keys
- Google doesn’t store keys
- More operational overhead
- Enhanced security
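With customer-supplied keys, the key travels with each request rather than being stored by the service. The sketch below builds the request headers for a 256-bit AES key: the algorithm name, the base64-encoded key, and a base64-encoded SHA-256 hash of the key. The header names are given as I understand the HTTP API and should be checked against current documentation; the helper function is hypothetical.

```python
import base64
import hashlib
import os


def csek_headers(key: bytes) -> dict:
    """Build encryption headers for a customer-supplied 256-bit AES key."""
    if len(key) != 32:
        raise ValueError("a customer-supplied key must be exactly 32 bytes")
    return {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": base64.b64encode(key).decode("ascii"),
        "x-goog-encryption-key-sha256": base64.b64encode(
            hashlib.sha256(key).digest()
        ).decode("ascii"),
    }


key = os.urandom(32)  # must be kept safe by the customer; losing it loses the data
headers = csek_headers(key)
print(sorted(headers))
```

The operational overhead mentioned above comes from exactly this: every read and write must present the same key, so key storage and rotation become the customer's responsibility.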
In Transit¶
HTTPS Only:
- TLS encryption automatic
- No unencrypted option for API
- Best practice for all access
Bucket Policies¶
Organization Policies:
- Enforce public access prevention
- Require uniform access
- Location restrictions
- Domain restrictions
VPC Service Controls:
- Perimeter-based access
- Prevent data exfiltration
- Additional security layer
Compliance and Governance¶
Data Residency¶
Control via Location:
- Choose specific region
- Dual-region for specific pairs
- Multi-region for broad geography
- Data stays in chosen location
Retention Policies¶
Bucket-Level Retention:
- Minimum retention period
- Objects cannot be deleted before period
- Locked policies cannot be reduced
- Compliance use cases (regulatory requirements)
Object Holds¶
Types:
- Event-based hold: Until event clears
- Temporary hold: Manual hold/release
Use Cases:
- Legal hold
- Investigation period
- Compliance requirements
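How retention policies and holds combine to block deletion can be sketched as a single check. This is a simplified model: the class and field names are hypothetical, and it does not capture details such as how releasing an event-based hold affects the start of the retention period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class StoredObject:
    created: datetime
    temporary_hold: bool = False
    event_based_hold: bool = False


def can_delete(obj: StoredObject, retention: timedelta, now: datetime) -> bool:
    """Deletion is allowed only with no active hold and retention satisfied."""
    if obj.temporary_hold or obj.event_based_hold:
        return False
    return now >= obj.created + retention


created = datetime(2024, 1, 1, tzinfo=timezone.utc)
retention = timedelta(days=365)
record = StoredObject(created=created)

print(can_delete(record, retention, now=datetime(2024, 6, 1, tzinfo=timezone.utc)))
print(can_delete(record, retention, now=datetime(2025, 1, 2, tzinfo=timezone.utc)))
```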
Exam Focus Areas¶
Design Decisions¶
- When to use Cloud Storage vs alternatives
- Storage class selection criteria
- Location type selection (region/dual-region/multi-region)
- Access control method (IAM vs ACL)
Cost Optimization¶
- Storage class economics
- Network egress patterns
- Operation cost implications
- Lifecycle management
Performance¶
- Request rate limits
- Throughput optimization
- Latency considerations
- Parallel operations
Security¶
- Encryption options
- Access control patterns
- Signed URLs use cases
- VPC Service Controls
Integration¶
- BigQuery external tables
- Dataflow pipelines
- Cloud Functions triggers
- Content delivery (CDN)
Compliance¶
- Data residency controls
- Retention policies
- Object holds
- Audit logging