
Scaling Plan


Growth Phases

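| Phase | DAU | Monthly Events | Games in Catalog | Team Size |
| --- | --- | --- | --- | --- |
| Phase 1: MVP | 0 - 5K | < 10M | < 500 | 1-2 |
| Phase 2: Growth | 5K - 50K | 10M - 100M | 500 - 5K | 2-4 |
| Phase 3: Scale | 50K - 500K | 100M - 1B | 5K - 50K | 4-10 |
| Phase 4: Platform | 500K+ | 1B+ | 50K+ | 10+ |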

Infrastructure

Phase 1: MVP — $0/mo (Free Tiers Only)

The MVP runs entirely on free tiers. No credit card required for most services.

┌───────────────────────────────────────────────────────────┐
│                    Vercel (Free Tier)                     │
│                                                           │
│  ┌──────────────────┐  ┌───────────────────────────────┐  │
│  │  Next.js SSR     │  │  Serverless API Routes        │  │
│  │  (frontend user  │  │  /api/v1/*     (public)       │  │
│  │   + admin)       │  │  /api/admin/*  (protected)    │  │
│  └──────────────────┘  └───────────────────────────────┘  │
│               ↕ Edge CDN + automatic HTTPS                │
└───────────────────────────────────────────────────────────┘
            │                                  │
            ▼                                  ▼
   ┌──────────────────┐               ┌──────────────────┐
   │ Neon PostgreSQL  │               │ Upstash Redis    │
   │ (free tier)      │               │ (free tier)      │
   └──────────────────┘               └──────────────────┘

| Component | Service | Free Tier Limits | Cost |
| --- | --- | --- | --- |
| Frontend + API | Vercel (Hobby) | 100GB bandwidth, 100K serverless invocations/mo, automatic HTTPS, edge CDN, preview deploys | $0 |
| Database | Neon PostgreSQL | 0.5 GB storage, 190 compute hours/mo, autoscaling to zero, branching | $0 |
| Database (alternative) | Supabase | 500 MB storage, 50K monthly active users, built-in auth | $0 |
| Cache | Upstash Redis | 10K commands/day, 256 MB storage, REST API (works with serverless) | $0 |
| CDN | Cloudflare (DNS only) | DNS, DDoS protection, basic analytics. Vercel handles CDN for assets | $0 |
| Error Tracking | Sentry (Developer) | 5K errors/mo, 1 user, basic alerting | $0 |
| Uptime | Betterstack (Free) | 5 monitors, 3-min checks, email alerts | $0 |
| Analytics | PostHog (Free) | 1M events/mo, session replay, feature flags | $0 |

Total: $0/mo

Why Vercel for everything?

Next.js on Vercel gives you frontend SSR and serverless API routes in one deploy. No need for a separate backend container:

  • API Routes become serverless functions (cold starts add roughly 200ms; warm invocations skip that cost)
  • SSR pages are cached at the edge automatically
  • Admin dashboard can be a separate Next.js app in the same monorepo or route group
  • Cron jobs (ranking, aggregation) use Vercel Cron (free tier: 2 cron jobs, daily minimum interval). For hourly jobs, use Upstash QStash (free: 500 messages/day)

Free Tier Limits to Watch

| Limit | Threshold | What Happens | Upgrade Path |
| --- | --- | --- | --- |
| Vercel invocations | 100K/mo | Functions stop working | Vercel Pro ($20/mo, 1M invocations) |
| Neon compute | 190 hours/mo | DB sleeps after limit | Neon Launch ($19/mo, 300 hours) |
| Neon storage | 0.5 GB | Can’t insert more data | Neon Launch ($19/mo, 10 GB) |
| Upstash commands | 10K/day | Commands rejected | Upstash Pay-as-you-go (~$0.2/100K) |
| PostHog events | 1M/mo | Events dropped | PostHog free is generous, rarely hit at MVP |

Realistic timeline: Free tiers comfortably support 0 - 2K DAU. At ~2-5K DAU you’ll likely hit Neon compute or Vercel invocation limits first. Budget ~$40-60/mo for the first paid tier jump.
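
To sanity-check where the free tiers run out, the limits above can be turned into a rough usage estimate. The sketch below is only an illustration: the per-user request rate and the number of Redis commands per request are assumed inputs (not figures from this plan), and `estimateFreeTierUsage` is a hypothetical helper name.

```typescript
// Free-tier limits quoted in the tables above.
const VERCEL_INVOCATIONS_PER_MONTH = 100_000;
const UPSTASH_COMMANDS_PER_DAY = 10_000;

// Hypothetical traffic model. The per-user figures are assumptions to be
// replaced with numbers from real analytics.
interface TrafficModel {
  dau: number;
  invocationsPerUserPerDay: number; // API calls that miss the edge cache
  redisCommandsPerInvocation: number; // cache reads/writes per API call
}

interface FreeTierUsage {
  vercel: number; // fraction of the monthly invocation limit consumed
  upstash: number; // fraction of the daily command limit consumed
}

function estimateFreeTierUsage(model: TrafficModel, daysInMonth = 30): FreeTierUsage {
  const invocationsPerDay = model.dau * model.invocationsPerUserPerDay;
  return {
    vercel: (invocationsPerDay * daysInMonth) / VERCEL_INVOCATIONS_PER_MONTH,
    upstash:
      (invocationsPerDay * model.redisCommandsPerInvocation) / UPSTASH_COMMANDS_PER_DAY,
  };
}
```

Under these assumptions, 2,000 DAU at 1.5 uncached invocations per user per day and 2 Redis commands per invocation would consume about 90% of the Vercel limit and 60% of the Upstash limit. A value above 1.0 means the corresponding limit would be exceeded.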

Why this works: At < 5K DAU, serverless handles all traffic without paying for idle compute. Neon auto-scales to zero when nobody is playing (nights). Redis caches hot data (game lists, categories). Cold starts are acceptable since game pages are SSR-cached at the edge.


Phase 2: Growth

Migration triggers:

  • API response time p95 > 500ms
  • Database CPU consistently > 60%
  • Event write throughput > 1K/sec sustained

┌────────────┐     ┌─────────────────────────────────────┐
│ CDN        │────>│ Load Balancer                       │
│ (static +  │     └──────┬──────────────┬───────────────┘
│ caching)   │            │              │
└────────────┘     ┌──────▼─────┐  ┌─────▼──────┐
                   │ Backend x2 │  │ Backend x2 │
                   │ (user API) │  │ (admin API)│
                   └──────┬─────┘  └─────┬──────┘
                          │              │
                   ┌──────▼──────────────▼──────┐
                   │ PostgreSQL                 │
                   │ Primary + Read Replica     │
                   └──────┬─────────────────────┘
                   ┌──────▼──────┐
                   │ Redis       │
                   │ (dedicated) │
                   └─────────────┘

| Change | What | Why |
| --- | --- | --- |
| Horizontal API | 2+ instances per backend behind a load balancer | Handle concurrent requests |
| Read replica | PostgreSQL read replica for metric queries | Offload analytics from write path |
| Dedicated Redis | Separate Redis instance with more memory | Cache game lists, search results, computed metrics |
| CDN caching | Cache game list API responses (30s TTL) | Reduce backend load for hot pages |
| Background workers | Separate process for metric aggregation | Don’t block API with hourly jobs |
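
The 30s CDN TTL for game list responses is typically expressed through a `Cache-Control` response header. The sketch below builds such a header with the standard `s-maxage` and `stale-while-revalidate` directives. The helper name `cdnCacheControl` and the 60-second stale window are assumptions for illustration, not values fixed by this plan.

```typescript
// Builds a Cache-Control value for shared (CDN) caches.
// s-maxage: how long the CDN may serve the response as fresh.
// stale-while-revalidate: how long it may keep serving a stale copy
// while refetching in the background.
function cdnCacheControl(ttlSeconds: number, staleSeconds: number): string {
  if (!Number.isInteger(ttlSeconds) || ttlSeconds < 0) {
    throw new RangeError("ttlSeconds must be a non-negative integer");
  }
  if (!Number.isInteger(staleSeconds) || staleSeconds < 0) {
    throw new RangeError("staleSeconds must be a non-negative integer");
  }
  return `public, s-maxage=${ttlSeconds}, stale-while-revalidate=${staleSeconds}`;
}

// Headers for a game list API response: 30s TTL from the table above,
// plus an assumed 60s stale window.
const gameListHeaders: Record<string, string> = {
  "Cache-Control": cdnCacheControl(30, 60),
};
```

Admin routes under /api/admin/* should not reuse this header, since their responses are protected and must not be stored in a shared cache.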

Cost: ~$200-500/mo


Phase 3: Scale

Migration triggers:

  • Event volume > 10K/sec
  • PostgreSQL event table > 500GB
  • Metric query time > 2s
  • Need real-time or near-real-time ranking updates

┌────────────┐     ┌──────────────┐
│ CDN        │────>│ LB           │
└────────────┘     └──┬────────┬──┘
                      │        │
               ┌──────▼──┐  ┌──▼──────┐
               │User API │  │Admin API│
               │   x4    │  │   x2    │
               └──┬──────┘  └──┬──────┘
                  │            │
     ┌────────────▼────────────▼────────────┐
     │ PostgreSQL Cluster                   │
     │ Primary + 2 Read Replicas            │
     │ (game data, user data, sessions)     │
     └────────────┬─────────────────────────┘
     ┌────────────▼─────────────────────────┐
     │ Event Pipeline                       │
     │ Kafka/Redpanda ──────> ClickHouse    │
     │ (stream)               (OLAP store)  │
     └────────────┬─────────────────────────┘
     ┌────────────▼────────────┐
     │ Redis Cluster           │
     │ (cache + rate limits)   │
     └─────────────────────────┘

| Change | What | Why |
| --- | --- | --- |
| Event streaming | Kafka/Redpanda between API and event store | Decouple ingestion from storage, handle bursts |
| ClickHouse | Columnar OLAP database for events and metrics | 10-100x faster aggregation than PostgreSQL for time-series |
| PostgreSQL focus | Keep PostgreSQL for game catalog, user data, admin state only | Let each database do what it’s best at |
| Redis cluster | Clustered Redis for distributed caching | Handle cache volume across multiple API instances |
| Search | Meilisearch or Typesense | Dedicated search engine for autocomplete and full-text |

Cost: ~$1,000-3,000/mo


Phase 4: Platform

Migration triggers:

  • Multiple geographic regions needed
  • Dev portal with third-party traffic
  • Multiple teams with independent release cycles

| Change | What |
| --- | --- |
| Multi-region | Deploy API + cache in 2-3 regions, single DB with global read replicas |
| Multi-repo | Split monorepo by team boundary (public, admin, dev-portal) |
| API gateway | Centralized gateway for rate limiting, auth, routing |
| Event bus | Shared event bus for cross-service communication |
| Object storage | S3/R2 for game thumbnails, icons (off CDN origin) |
| Monitoring | Full observability stack (Grafana, Prometheus/VictoriaMetrics, distributed tracing) |

Database Scaling

PostgreSQL (Game Data)

| Phase | Setup | Connection Limit |
| --- | --- | --- |
| MVP | Single managed instance | 20-50 |
| Growth | Primary + 1 read replica | 100 |
| Scale | Primary + 2 read replicas, connection pooling (PgBouncer) | 500+ |

Event Store

| Phase | Store | Capacity |
| --- | --- | --- |
| MVP | PostgreSQL table (partitioned by month) | < 50M events |
| Growth | PostgreSQL with aggressive partitioning + archival | 50M - 500M |
| Scale | ClickHouse (columnar, compressed) | 500M+ events, sub-second aggregation |

Partition / Retention Strategy

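| Data | MVP Retention | Growth Retention | Scale Retention |
| --- | --- | --- | --- |
| Raw events | 90 days | 90 days | 30 days (cold archive to S3) |
| Daily metrics | Indefinite | Indefinite | Indefinite |
| Hourly metrics | N/A | 30 days | 7 days |
| Session records | 30 days | 30 days | 7 days |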
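
With monthly partitions, retention can be enforced by dropping whole partitions once every row in them is older than the retention window. The sketch below shows only the date arithmetic for that decision. The `events_YYYY_MM` naming scheme and the helper names are hypothetical, and issuing the actual drop statement is left out.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical partition naming scheme: events_YYYY_MM (month is 1-based).
function partitionName(year: number, month: number): string {
  return `events_${year}_${String(month).padStart(2, "0")}`;
}

// A monthly partition is safe to drop once its newest possible row is older
// than the retention window, i.e. once the first instant of the following
// month is at or before the cutoff.
function isPartitionExpired(
  year: number,
  month: number,
  now: Date,
  retentionDays: number,
): boolean {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  // Date.UTC takes a 0-based month, so passing the 1-based month yields the
  // start of the next month (December rolls over into January).
  const partitionEnd = Date.UTC(year, month, 1);
  return partitionEnd <= cutoff;
}
```

For example, with 90-day retention on 15 June 2024 the cutoff falls on 17 March 2024, so the February partition is expired while the March partition still holds rows inside the window and must be kept.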

Caching Strategy

Cache Layers

| Layer | What | TTL | Invalidation |
| --- | --- | --- | --- |
| CDN | Game list pages, category pages | 30s - 60s | Stale-while-revalidate |
| Redis - Hot | Top 100 games (ranked lists) | 5 min | On rank recalculation |
| Redis - Warm | Category game lists | 5 min | On rank recalculation |
| Redis - Sessions | Event batching dedup | 60s | Auto-expire |
| In-process | Category list, config | 5 min | On deploy |
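
The in-process layer can be as small as a map with expiry timestamps, read through a cache-aside helper. The following is a minimal sketch under assumptions: `TtlCache` and `getOrLoad` are hypothetical names, the loader is synchronous to keep the example short (a real loader would usually be an async database or Redis call), and the clock is injectable only so the expiry logic can be tested.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<V> {
  private entries = new Map<string, CacheEntry<V>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= this.now()) {
      // Expired: evict lazily on read.
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Cache-aside: return the cached value if present, otherwise load and store it.
function getOrLoad<V>(cache: TtlCache<V>, key: string, load: () => V): V {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = load();
  cache.set(key, value);
  return value;
}
```

Because the map lives in process memory, it is emptied whenever a new instance starts, which matches the "On deploy" invalidation in the table. On serverless functions each warm instance holds its own copy.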

Cache Key Patterns

games:home:{platform}:page:{n} # Home page results
games:category:{slug}:{platform}:page:{n} # Category results
game:{slug} # Single game detail
categories:all # Category list
search:{query}:{type} # Search results (short TTL)
metrics:game:{id}:{range}:{platform} # Cached metric snapshots
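
Centralizing key construction in one module keeps these patterns consistent across API routes and workers. The builders below are a sketch: the `cacheKeys` object is a hypothetical helper, and the normalization applied to search queries (trim, lowercase, URL-encode) is an assumption added so that equivalent queries share one cache entry.

```typescript
// Assumption: normalize free-text queries so "Tower Defense" and
// " tower defense " map to the same key, and encode characters such as
// ":" that would otherwise collide with the key separator.
function normalizeQuery(query: string): string {
  return encodeURIComponent(query.trim().toLowerCase());
}

const cacheKeys = {
  home: (platform: string, page: number): string =>
    `games:home:${platform}:page:${page}`,
  category: (slug: string, platform: string, page: number): string =>
    `games:category:${slug}:${platform}:page:${page}`,
  game: (slug: string): string => `game:${slug}`,
  categories: (): string => "categories:all",
  search: (query: string, type: string): string =>
    `search:${normalizeQuery(query)}:${type}`,
  metrics: (id: string, range: string, platform: string): string =>
    `metrics:game:${id}:${range}:${platform}`,
};
```

For instance, `cacheKeys.category("puzzle", "web", 2)` yields `games:category:puzzle:web:page:2`, where `"puzzle"` and `"web"` are illustrative values rather than identifiers defined by this plan.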

Ranking Recalculation

| Phase | Frequency | Method |
| --- | --- | --- |
| MVP | Every 6 hours | Cron job, full recalculation |
| Growth | Every 1 hour | Background worker, incremental update |
| Scale | Every 15 minutes | Streaming from ClickHouse, delta-based |

Rankings are pre-computed and cached, not calculated on every request.
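
The split between recalculation and serving can be sketched as two functions: one that rebuilds a ranked snapshot on a schedule, and one that only reads it. This is an illustration under assumptions. The `score` field stands in for whatever ranking formula the system uses (none is specified here), and the module-level array stands in for the Redis hot cache.

```typescript
interface GameScore {
  slug: string;
  score: number; // hypothetical: output of the ranking formula
}

// Stand-in for the cached ranked list (Redis in the plan above).
let rankedSnapshot: string[] = [];

// Runs on the schedule for the current phase (cron job or background worker).
// Full recalculation: sort everything, keep the top `limit` slugs.
function recalculateRankings(scores: GameScore[], limit = 100): void {
  rankedSnapshot = [...scores]
    .sort((a, b) => {
      if (b.score !== a.score) return b.score - a.score; // higher score first
      return a.slug < b.slug ? -1 : a.slug > b.slug ? 1 : 0; // stable tie-break
    })
    .slice(0, limit)
    .map((game) => game.slug);
}

// Runs on every request: a cheap read with no sorting or scoring.
function getTopGames(count: number): string[] {
  return rankedSnapshot.slice(0, count);
}
```

Requests between two recalculations all see the same snapshot, so ranking freshness is bounded by the recalculation frequency in the table rather than by request volume.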


Search Scaling

| Phase | Engine | Capacity |
| --- | --- | --- |
| MVP | PostgreSQL ILIKE + tsvector | < 5K games, good enough |
| Growth | Meilisearch (self-hosted or cloud) | Instant autocomplete, typo tolerance, faceting |
| Scale | Meilisearch cluster or Typesense | Multi-index (games, categories, authors) |

Static Assets & Media

| Phase | Storage | Delivery |
| --- | --- | --- |
| MVP | Game icons stored as URLs (from broker) | Broker CDN serves directly |
| Growth | Copy icons to S3/R2 (own storage) | Cloudflare CDN in front |
| Scale | Full asset pipeline with resizing | Multiple sizes (32, 64, 128, 256px) auto-generated |

Monitoring & Observability

| Phase | Tools |
| --- | --- |
| MVP | Application logs + error tracking (Sentry free tier). Uptime monitoring (Betterstack free tier) |
| Growth | Add APM (response times, slow queries). Database monitoring. Structured logging |
| Scale | Full stack: metrics (Prometheus/Grafana), distributed tracing, log aggregation, alerting |

Key Metrics to Watch

| Metric | Warning | Critical |
| --- | --- | --- |
| API p95 latency | > 300ms | > 1s |
| Event ingestion lag | > 30s | > 5min |
| Database CPU | > 60% | > 85% |
| Cache hit rate | < 80% | < 60% |
| Error rate (5xx) | > 0.5% | > 2% |
| Disk usage | > 70% | > 90% |
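
These thresholds translate directly into an alert classifier. The sketch below encodes the table as data and handles the one metric where lower values are worse (cache hit rate). The metric identifiers, units, and the `classifyMetric` helper are naming assumptions for illustration.

```typescript
type Severity = "ok" | "warning" | "critical";

type MetricName =
  | "apiP95LatencyMs"
  | "eventIngestionLagSec"
  | "databaseCpuPct"
  | "cacheHitRatePct"
  | "errorRate5xxPct"
  | "diskUsagePct";

interface Threshold {
  warning: number;
  critical: number;
  lowerIsWorse?: boolean; // true when falling below the limit is the problem
}

// Values taken from the table above, converted to a single unit per metric.
const THRESHOLDS: Record<MetricName, Threshold> = {
  apiP95LatencyMs: { warning: 300, critical: 1000 },
  eventIngestionLagSec: { warning: 30, critical: 300 },
  databaseCpuPct: { warning: 60, critical: 85 },
  cacheHitRatePct: { warning: 80, critical: 60, lowerIsWorse: true },
  errorRate5xxPct: { warning: 0.5, critical: 2 },
  diskUsagePct: { warning: 70, critical: 90 },
};

function classifyMetric(metric: MetricName, value: number): Severity {
  const threshold = THRESHOLDS[metric];
  // Comparisons are strict, matching the ">" and "<" in the table.
  const breached = (limit: number): boolean =>
    threshold.lowerIsWorse ? value < limit : value > limit;
  if (breached(threshold.critical)) return "critical";
  if (breached(threshold.warning)) return "warning";
  return "ok";
}
```

A p95 latency of 450ms classifies as a warning, and a cache hit rate of 55% as critical. Values exactly on a boundary (for example 300ms) stay at the lower severity because the comparisons are strict.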

Cost Projection

| Phase | Infrastructure Cost | Notes |
| --- | --- | --- |
| MVP | $0/mo | Free tiers only (Vercel, Neon, Upstash, Cloudflare, Sentry, Betterstack, PostHog) |
| Growth | ~$200-500/mo | Dedicated instances, read replica |
| Scale | ~$1,000-3,000/mo | ClickHouse, Kafka, search, multi-instance |
| Platform | ~$5,000-15,000/mo | Multi-region, full observability, dedicated search |

These are infrastructure costs only and do not include domain, CDN bandwidth overages, or third-party SaaS tools. First paid tier jump (~$40-60/mo) expected around 2-5K DAU.