Scaling Plan

Growth Phases

Phase	DAU	Monthly Events	Games in Catalog	Team Size
Phase 1: MVP	0 - 5K	< 10M	< 500	1-2
Phase 2: Growth	5K - 50K	10M - 100M	500 - 5K	2-4
Phase 3: Scale	50K - 500K	100M - 1B	5K - 50K	4-10
Phase 4: Platform	500K+	1B+	50K+	10+

Infrastructure

Phase 1: MVP — $0/mo (Free Tiers Only)

The MVP runs entirely on free tiers. No credit card required for most services.

┌───────────────────────────────────────────────────────────┐
│                    Vercel (Free Tier)                     │
│                                                           │
│  ┌──────────────────┐  ┌───────────────────────────────┐  │
│  │  Next.js SSR     │  │  Serverless API Routes        │  │
│  │  (frontend user  │  │  /api/v1/*  (public)          │  │
│  │   + admin)       │  │  /api/admin/* (protected)     │  │
│  └──────────────────┘  └───────────────────────────────┘  │
│              ↕ Edge CDN + automatic HTTPS                 │
└───────────────────────────────────────────────────────────┘
         │                              │
         ▼                              ▼
┌──────────────────┐         ┌──────────────────┐
│  Neon PostgreSQL │         │  Upstash Redis   │
│  (free tier)     │         │  (free tier)     │
└──────────────────┘         └──────────────────┘

Component	Service	Free Tier Limits	Cost
Frontend + API	Vercel (Hobby)	100GB bandwidth, 100K serverless invocations/mo, automatic HTTPS, edge CDN, preview deploys	$0
Database	Neon PostgreSQL	0.5 GB storage, 190 compute hours/mo, autoscaling to zero, branching	$0
	Alternative: Supabase	500 MB storage, 50K monthly active users, built-in auth	$0
Cache	Upstash Redis	10K commands/day, 256 MB storage, REST API (works with serverless)	$0
CDN	Cloudflare (DNS only)	DNS, DDoS protection, basic analytics. Vercel handles CDN for assets	$0
Error Tracking	Sentry (Developer)	5K errors/mo, 1 user, basic alerting	$0
Uptime	Betterstack (Free)	5 monitors, 3-min checks, email alerts	$0
Analytics	PostHog (Free)	1M events/mo, session replay, feature flags	$0

Total: $0/mo

Why Vercel for everything?

Next.js on Vercel gives you frontend SSR and serverless API routes in one deploy. No need for a separate backend container:

API Routes become serverless functions (a cold start adds roughly 200ms; warm invocations avoid that delay)
SSR pages are cached at the edge automatically
Admin dashboard can be a separate Next.js app in the same monorepo, or a route group inside the main app
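Cron jobs (ranking, aggregation) use Vercel Cron (free tier: 2 cron jobs, at most once per day). Jobs that run more often, such as the 6-hourly MVP ranking recalculation or hourly aggregation, can be triggered by Upstash QStash instead (free: 500 messages/day)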

Free Tier Limits to Watch

Limit	Threshold	What Happens	Upgrade Path
Vercel invocations	100K/mo	Functions stop working	Vercel Pro ($20/mo, 1M invocations)
Neon compute	190 hours/mo	DB sleeps after limit	Neon Launch ($19/mo, 300 hours)
Neon storage	0.5 GB	Can’t insert more data	Neon Launch ($19/mo, 10 GB)
Upstash commands	10K/day	Commands rejected	Upstash Pay-as-you-go (~$0.20 per 100K commands)
PostHog events	1M/mo	Events dropped	PostHog free is generous, rarely hit at MVP

Realistic timeline: Free tiers comfortably support 0 - 2K DAU. At ~2-5K DAU you’ll likely hit Neon compute or Vercel invocation limits first. Budget ~$40-60/mo for the first paid tier jump.
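The per-user headroom behind this estimate can be checked with simple arithmetic. A minimal sketch with hypothetical helper names, assuming a 30-day month and that every request not served from the edge cache costs one function invocation (or one Redis command):

```typescript
// Hypothetical helpers: per-user headroom under a free-tier quota.
// Assumes a 30-day month and that every request not served from the
// edge cache costs one function invocation (or one Redis command).
const DAYS_PER_MONTH = 30;

// A monthly quota spread over every user-day in the month.
function perUserDailyFromMonthly(monthlyLimit: number, dau: number): number {
  return monthlyLimit / (dau * DAYS_PER_MONTH);
}

// A daily quota spread over the users active that day.
function perUserDailyFromDaily(dailyLimit: number, dau: number): number {
  return dailyLimit / dau;
}

const dau = 2_000;
// Vercel Hobby: 100K invocations/mo (limit as listed in the table above).
const invocationsPerUser = perUserDailyFromMonthly(100_000, dau);
// Upstash free tier: 10K commands/day.
const redisCommandsPerUser = perUserDailyFromDaily(10_000, dau);

console.log(invocationsPerUser.toFixed(2)); // 1.67
console.log(redisCommandsPerUser); // 5
```

At 2K DAU that leaves under two function invocations and five Redis commands per user per day, so the 0 - 2K DAU estimate likely assumes most page views are served from the edge cache without invoking a function, and that gameplay events are sent in batches rather than one request per event.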

Why this works: At < 5K DAU, serverless handles all traffic without paying for idle compute. Neon auto-scales to zero when nobody is playing (nights). Redis caches hot data (game lists, categories). Cold starts are acceptable since game pages are SSR-cached at the edge.

Phase 2: Growth

Migration triggers:

API response time p95 > 500ms
Database CPU consistently > 60%
Event write throughput > 1K/sec sustained
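Checking the p95 trigger requires the percentile over raw samples from all instances; averaging per-instance p95 values does not give the overall p95. A minimal sketch using the nearest-rank method (function names are hypothetical; in practice an APM tool usually reports this):

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  if (!(p > 0 && p <= 100)) throw new RangeError("p must be in (0, 100]");
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort on a copy
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[rank - 1];
}

// Phase 2 migration trigger: p95 API latency above 500 ms.
const P95_TRIGGER_MS = 500;

function latencyTriggerFired(latenciesMs: number[]): boolean {
  return percentile(latenciesMs, 95) > P95_TRIGGER_MS;
}
```

With 20 samples the p95 is the 19th-smallest value, so one slow outlier does not fire the trigger but two do.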

┌────────────┐     ┌─────────────────────────────────────┐
│    CDN     │────>│          Load Balancer              │
│ (static +  │     └──────┬──────────────┬───────────────┘
│  caching)  │            │              │
└────────────┘     ┌──────▼─────┐  ┌─────▼──────┐
                   │ Backend x2 │  │ Backend x2 │
                   │ (user API) │  │ (admin API)│
                   └──────┬─────┘  └─────┬──────┘
                          │              │
                   ┌──────▼──────────────▼──────┐
                   │       PostgreSQL           │
                   │  Primary + Read Replica    │
                   └──────┬─────────────────────┘
                          │
                   ┌──────▼──────┐
                   │ Redis       │
                   │ (dedicated) │
                   └─────────────┘

Change	What	Why
Horizontal API	2+ instances per backend behind a load balancer	Handle concurrent requests
Read replica	PostgreSQL read replica for metric queries	Offload analytics from write path
Dedicated Redis	Separate Redis instance with more memory	Cache game lists, search results, computed metrics
CDN caching	Cache game list API responses (30s TTL)	Reduce backend load for hot pages
Background workers	Separate process for metric aggregation	Don’t block API with hourly jobs
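The CDN caching row can be expressed as a response header. A minimal sketch (the helper name is hypothetical, and the 60-second stale window is an assumption not stated in the plan) of the `Cache-Control` value for a 30-second shared-cache TTL:

```typescript
// Builds a Cache-Control value for CDN-cached API responses.
// s-maxage applies to shared caches (the CDN) only; stale-while-revalidate
// lets a CDN that supports it serve the previous copy while it refetches.
function cdnCacheControl(ttlSeconds: number, staleSeconds: number): string {
  if (!Number.isInteger(ttlSeconds) || !Number.isInteger(staleSeconds) || ttlSeconds < 0 || staleSeconds < 0) {
    throw new RangeError("durations must be non-negative integers");
  }
  return `public, s-maxage=${ttlSeconds}, stale-while-revalidate=${staleSeconds}`;
}

// 30s TTL from the table above; the 60s stale window is an assumption.
const gameListCacheControl = cdnCacheControl(30, 60);
console.log(gameListCacheControl); // public, s-maxage=30, stale-while-revalidate=60
```

In a Next.js route handler the value would be attached to the response, for example `new Response(body, { headers: { "Cache-Control": gameListCacheControl } })`.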

Cost: ~$200-500/mo

Phase 3: Scale

Migration triggers:

Event volume > 10K/sec
PostgreSQL event table > 500GB
Metric query time > 2s
Need real-time or near-real-time ranking updates

┌────────────┐     ┌──────────────┐
│    CDN     │────>│    LB        │
└────────────┘     └──┬────────┬──┘
                      │        │
               ┌──────▼──┐ ┌───▼─────┐
               │User API │ │Admin API│
               │  x4     │ │  x2     │
               └──┬──────┘ └──┬──────┘
                  │           │
     ┌────────────▼───────────▼────────────┐
     │          PostgreSQL Cluster         │
     │  Primary + 2 Read Replicas          │
     │  (game data, user data, sessions)   │
     └─────────────────────────────────────┘
                  │
     ┌────────────▼────────────────────────┐
     │           Event Pipeline            │
     │  Kafka/Redpanda ──────> ClickHouse  │
     │  (stream)        (OLAP store)       │
     └─────────────────────────────────────┘
                  │
     ┌────────────▼────────────┐
     │     Redis Cluster       │
     │  (cache + rate limits)  │
     └─────────────────────────┘

Change	What	Why
Event streaming	Kafka/Redpanda between API and event store	Decouple ingestion from storage, handle bursts
ClickHouse	Columnar OLAP database for events and metrics	Typically 10-100x faster aggregation than PostgreSQL for time-series queries
PostgreSQL focus	Keep PostgreSQL for game catalog, user data, admin state only	Let each database do what it’s best at
Redis cluster	Clustered Redis for distributed caching	Handle cache volume across multiple API instances
Search	Meilisearch or Typesense	Dedicated search engine for autocomplete and full-text

Cost: ~$1,000-3,000/mo

Phase 4: Platform

Migration triggers:

Multiple geographic regions needed
Dev portal with third-party traffic
Multiple teams with independent release cycles

Change	What
Multi-region	Deploy API + cache in 2-3 regions, single DB with global read replicas
Multi-repo	Split monorepo by team boundary (public, admin, dev-portal)
API gateway	Centralized gateway for rate limiting, auth, routing
Event bus	Shared event bus for cross-service communication
Object storage	S3/R2 for game thumbnails and icons, served through the CDN instead of the app origin
Monitoring	Full observability stack (Grafana, Prometheus/VictoriaMetrics, distributed tracing)

Database Scaling

PostgreSQL (Game Data)

Phase	Setup	Connection Limit
MVP	Single managed instance	20-50
Growth	Primary + 1 read replica	100
Scale	Primary + 2 read replicas, connection pooling (PgBouncer)	500+

Event Store

Phase	Store	Capacity
MVP	PostgreSQL table (partitioned by month)	< 50M events
Growth	PostgreSQL with aggressive partitioning + archival	50M - 500M
Scale	ClickHouse (columnar, compressed)	500M+ events, sub-second aggregation
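With declarative partitioning, a row whose timestamp falls outside every existing partition is rejected (unless a DEFAULT partition exists), so monthly partitions are normally created ahead of time by a scheduled job. A sketch of generating that DDL, assuming a parent table named `events` range-partitioned on a `created_at` column (both names are hypothetical):

```typescript
// Hypothetical helper: DDL for one monthly partition of a parent table
// created as, e.g.,
//   CREATE TABLE events (..., created_at timestamptz NOT NULL)
//     PARTITION BY RANGE (created_at);
// `table` is interpolated into SQL, so pass only trusted identifiers.
// For timestamptz columns the bounds are read in the session time zone;
// run the DDL with TimeZone = 'UTC' to get UTC month boundaries.
function monthlyPartitionDDL(table: string, year: number, month: number): string {
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError("month must be 1-12");
  }
  // Date.UTC takes 0-based months and rolls month 12 over into January.
  const from = new Date(Date.UTC(year, month - 1, 1));
  const to = new Date(Date.UTC(year, month, 1));
  const isoDay = (d: Date): string => d.toISOString().slice(0, 10); // YYYY-MM-DD
  const name = `${table}_${year}_${String(month).padStart(2, "0")}`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF ${table} ` +
    `FOR VALUES FROM ('${isoDay(from)}') TO ('${isoDay(to)}');`
  );
}

console.log(monthlyPartitionDDL("events", 2024, 12));
// CREATE TABLE IF NOT EXISTS events_2024_12 PARTITION OF events FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
```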

Partition / Retention Strategy

Data	MVP Retention	Growth	Scale
Raw events	90 days	90 days	30 days (cold archive to S3)
Daily metrics	Indefinite	Indefinite	Indefinite
Hourly metrics	N/A	30 days	7 days
Session records	30 days	30 days	7 days
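On a partitioned table, retention is applied by dropping (or detaching and archiving) whole partitions rather than running `DELETE`, which avoids heavy vacuum work on large tables. A sketch of choosing which monthly partitions are safe to drop, assuming partitions are identified by year and month (the helper name and the `YYYY_MM` suffix format are hypothetical):

```typescript
// Hypothetical helper: which monthly partitions are entirely older than the
// retention window and can be dropped (or detached and archived) whole.
// A partition covers [first of month, first of next month); it is droppable
// only once the first of the next month is at or before the cutoff.
interface MonthPartition {
  year: number;
  month: number; // 1-12
}

function droppablePartitions(
  partitions: MonthPartition[],
  retentionDays: number,
  now: Date,
): string[] {
  const cutoffMs = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return partitions
    // Date.UTC takes 0-based months, so passing the 1-based `month`
    // yields the start of the following month.
    .filter(({ year, month }) => Date.UTC(year, month, 1) <= cutoffMs)
    .map(({ year, month }) => `${year}_${String(month).padStart(2, "0")}`);
}

const existing: MonthPartition[] = [
  { year: 2024, month: 1 },
  { year: 2024, month: 2 },
  { year: 2024, month: 3 },
  { year: 2024, month: 4 },
];
// 90-day retention evaluated on 2024-06-15 puts the cutoff at 2024-03-17,
// so only the January and February partitions are returned.
const toDrop = droppablePartitions(existing, 90, new Date(Date.UTC(2024, 5, 15)));
```

Because a partition is dropped only when fully expired, monthly partitioning keeps data for up to about one month beyond the nominal retention; weekly or daily partitions tighten that at the cost of more tables.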

Caching Strategy

Cache Layers

Layer	What	TTL	Invalidation
CDN	Game list pages, category pages	30s - 60s	Stale-while-revalidate
Redis - Hot	Top 100 games (ranked lists)	5 min	On rank recalculation
Redis - Warm	Category game lists	5 min	On rank recalculation
Redis - Sessions	Event batching dedup	60s	Auto-expire
In-process	Category list, config	5 min	On deploy
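The in-process layer needs little more than a map with expiry times. A minimal sketch (the class and the `loadCategories` call are hypothetical):

```typescript
// Minimal per-process cache with a fixed TTL, for small, rarely changing
// data such as the category list or config. Every server process (or
// serverless instance) holds its own copy, and a deploy or cold start
// empties it, which is what "On deploy" invalidation relies on.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside usage with the 5-minute TTL from the table
// (loadCategories is a hypothetical database call):
//   const cache = new TtlCache<string[]>(5 * 60 * 1000);
//   let categories = cache.get("categories:all");
//   if (categories === undefined) {
//     categories = await loadCategories();
//     cache.set("categories:all", categories);
//   }
```

Concurrent misses in the same process will each call the loader; for a small category list that is usually acceptable.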

Cache Key Patterns

games:home:{platform}:page:{n}             # Home page results
games:category:{slug}:{platform}:page:{n}  # Category results
game:{slug}                                # Single game detail
categories:all                             # Category list
search:{query}:{type}                      # Search results (short TTL)
metrics:game:{id}:{range}:{platform}       # Cached metric snapshots
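Building keys through one typed helper keeps these patterns consistent across API routes, ranking jobs and invalidation code. A sketch, assuming the platform values `web`, `ios` and `android` (the plan does not list them) and a hypothetical `cacheKeys` object:

```typescript
// Hypothetical platform values; the plan does not enumerate them.
type Platform = "web" | "ios" | "android";

// One place that formats every key pattern listed above, so readers and
// writers (API routes, ranking jobs, invalidation) never drift apart.
const cacheKeys = {
  home: (platform: Platform, page: number) => `games:home:${platform}:page:${page}`,
  category: (slug: string, platform: Platform, page: number) =>
    `games:category:${slug}:${platform}:page:${page}`,
  game: (slug: string) => `game:${slug}`,
  categories: () => "categories:all",
  // Assumption beyond the pattern: queries are trimmed, lower-cased and
  // whitespace-collapsed so trivially different inputs share an entry, then
  // URI-encoded so a ":" inside the query cannot collide with the separator.
  search: (query: string, type: string) =>
    `search:${encodeURIComponent(query.trim().toLowerCase().replace(/\s+/g, " "))}:${type}`,
  metrics: (id: string, range: string, platform: Platform) =>
    `metrics:game:${id}:${range}:${platform}`,
};

console.log(cacheKeys.search("  Puzzle   Games ", "games")); // search:puzzle%20games:games
```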

Ranking Recalculation

Phase	Frequency	Method
MVP	Every 6 hours	Cron job, full recalculation
Growth	Every 1 hour	Background worker, incremental update
Scale	Every 15 minutes	Streaming from ClickHouse, delta-based

Rankings are pre-computed and cached, not calculated on every request.
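For the Growth and Scale phases, "incremental" and "delta-based" can mean applying only the score changes since the last run instead of rescanning every event. A hypothetical sketch, assuming each game has a single numeric ranking score (the scoring formula itself is not specified in this plan):

```typescript
// Hypothetical delta-based update: `scores` is the previous snapshot
// (gameId -> ranking score) and `deltas` holds only the score changes since
// the last run. Only games that received new activity are touched; the final
// sort over the whole catalog is cheap at the catalog sizes in this plan.
// Note: `scores` is updated in place.
function applyDeltasAndRank(
  scores: Map<string, number>,
  deltas: Map<string, number>,
  topN: number,
): string[] {
  deltas.forEach((delta, gameId) => {
    scores.set(gameId, (scores.get(gameId) ?? 0) + delta);
  });
  return Array.from(scores.entries())
    // Highest score first; break ties by game id so the order is stable
    // between runs and cached pages do not reshuffle.
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
    .slice(0, topN)
    .map(([gameId]) => gameId);
}
```

The resulting list is what would be written to the hot Redis key (top 100 games) after each run.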

Search Scaling

Phase	Engine	Capacity
MVP	PostgreSQL `ILIKE` + `tsvector`	< 5K games, good enough
Growth	Meilisearch (self-hosted or cloud)	Instant autocomplete, typo tolerance, faceting
Scale	Meilisearch cluster or Typesense	Multi-index (games, categories, authors)
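The MVP search path has one detail that is easy to miss: `%` and `_` in user input are wildcards for `ILIKE`. A sketch of escaping them for a prefix search (the helper name and the table and column names in the comment are hypothetical):

```typescript
// Escapes the LIKE/ILIKE wildcards "%" and "_" (and the escape character
// itself) so user input matches literally, then appends "%" for a prefix
// match. Backslash is PostgreSQL's default LIKE escape character.
function ilikePrefixPattern(input: string): string {
  return input.replace(/[\\%_]/g, (ch) => `\\${ch}`) + "%";
}

// Send the pattern as a bind parameter, never by string concatenation, e.g.
//   SELECT slug, title FROM games WHERE title ILIKE $1 LIMIT 10
console.log(ilikePrefixPattern("100%_fun")); // 100\%\_fun%
```

`ILIKE` searches generally cannot use a plain B-tree index; a trigram GIN index from the `pg_trgm` extension can accelerate them, which helps the MVP setup last until Meilisearch is introduced.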

Static Assets & Media

Phase	Storage	Delivery
MVP	Game icons stored as URLs (from broker)	Broker CDN serves directly
Growth	Copy icons to S3/R2 (own storage)	Cloudflare CDN in front
Scale	Full asset pipeline with resizing	Multiple sizes (32, 64, 128, 256px) auto-generated
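Once icons exist at fixed widths, the frontend can let the browser choose one through `srcset`. A sketch, assuming a hypothetical URL layout `{base}/{slug}/{size}.png` on the CDN-fronted bucket:

```typescript
// Icon widths from the Scale row above.
const ICON_SIZES = [32, 64, 128, 256] as const;

// Builds an <img srcset> value with width descriptors. The URL layout
// `${baseUrl}/${slug}/${size}.png` is a hypothetical naming scheme.
function iconSrcSet(baseUrl: string, slug: string): string {
  return ICON_SIZES.map((size) => `${baseUrl}/${slug}/${size}.png ${size}w`).join(", ");
}

console.log(iconSrcSet("https://cdn.example.com/icons", "snake"));
```

Width descriptors need a matching `sizes` attribute (for example `sizes="64px"` for a 64px slot) so the browser can pick a file; on a 2x screen it will typically choose the 128px variant.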

Monitoring & Observability

Phase	Tools
MVP	Application logs + error tracking (Sentry free tier). Uptime monitoring (Betterstack free tier)
Growth	Add APM (response times, slow queries). Database monitoring. Structured logging
Scale	Full stack: metrics (Prometheus/Grafana), distributed tracing, log aggregation, alerting

Key Metrics to Watch

Metric	Warning	Critical
API p95 latency	> 300ms	> 1s
Event ingestion lag	> 30s	> 5min
Database CPU	> 60%	> 85%
Cache hit rate	< 80%	< 60%
Error rate (5xx)	> 0.5%	> 2%
Disk usage	> 70%	> 90%
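Most rows in this table are "higher is worse", but cache hit rate is inverted, which is an easy place for alert rules to go wrong. A sketch of encoding the table as data, with hypothetical metric names and the 5-minute lag threshold expressed as 300 seconds:

```typescript
type Level = "ok" | "warning" | "critical";

// Thresholds from the table above. "above" means higher values are worse;
// cache hit rate is the one metric where lower values are worse.
interface Threshold {
  warning: number;
  critical: number;
  worseWhen: "above" | "below";
}

const THRESHOLDS: Record<string, Threshold> = {
  apiP95LatencyMs: { warning: 300, critical: 1000, worseWhen: "above" },
  eventIngestionLagSec: { warning: 30, critical: 300, worseWhen: "above" },
  databaseCpuPct: { warning: 60, critical: 85, worseWhen: "above" },
  cacheHitRatePct: { warning: 80, critical: 60, worseWhen: "below" },
  errorRate5xxPct: { warning: 0.5, critical: 2, worseWhen: "above" },
  diskUsagePct: { warning: 70, critical: 90, worseWhen: "above" },
};

function classify(metric: string, value: number): Level {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  // Flip the sign for "below" metrics so one strict comparison covers both,
  // matching the table's ">" and "<" thresholds.
  const sign = t.worseWhen === "above" ? 1 : -1;
  if (sign * value > sign * t.critical) return "critical";
  if (sign * value > sign * t.warning) return "warning";
  return "ok";
}
```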

Cost Projection

Phase	Infrastructure	Notes
MVP	$0/mo	Free tiers only (Vercel, Neon, Upstash, Cloudflare, Sentry, Betterstack, PostHog)
Growth	~$200-500/mo	Dedicated instances, read replica
Scale	~$1,000-3,000/mo	ClickHouse, Kafka, search, multi-instance
Platform	~$5,000-15,000/mo	Multi-region, full observability, dedicated search

These are infrastructure costs only and do not include domain, CDN bandwidth overages, or paid plans for third-party SaaS tools (the MVP row assumes the free tiers of the tools it lists). First paid tier jump (~$40-60/mo) expected around 2-5K DAU.