How I moved my Next.js app off Vercel and cut my hosting bill in half

LumifyHub ran on Vercel for about a year. Real-time chat, doc editing, databases, file uploads. Next.js 15 + Supabase + Clerk + TipTap. Not a landing page.

Vercel Pro was hitting $50/mo with constant deploys. Supabase Cloud Pro was $30/mo on top of that. $80/mo for a product with 136 signups and zero paying customers. I moved everything to two Hetzner boxes and got it down to ~$30.

The old stack

  • Vercel Pro ($50/mo) — hosting, CI/CD, Vercel Blob for file uploads
  • Supabase Cloud Pro ($30/mo) — Postgres, auth, realtime, storage
  • Cloudflare (free) — DNS only

The Vercel bill kept going up because every git push triggered a build. 5-10 pushes a day during active dev. Build minutes add up.

Supabase was $30 flat but I was using maybe 20% of what Pro includes. Same price whether you run 7 containers or 13.

The new stack

  • Hetzner cpx21 ($14/mo) — app server, 3 cores, 4GB RAM, Ashburn VA
  • Hetzner cpx21 ($14/mo) — database, self-hosted Supabase, same datacenter
  • Cloudflare (free) — DNS, CDN, SSL, R2 storage
  • Coolify (free) — self-hosted PaaS for Docker deploys

~$30/mo total. Both VMs in the same datacenter, connected over Tailscale. DB latency dropped from 10-50ms to under 1ms.

Part 1: Vercel Blob to R2

13 files importing @vercel/blob for uploads. Avatars, page attachments, team logos. I built a StorageProvider abstraction and started swapping them to @aws-sdk/client-s3 with R2 presigned URLs.

Didn’t finish all 13 at once. Got the critical paths migrated (avatars, attachments), the rest go through the abstraction layer so they hit R2 in production. A few old @vercel/blob imports are still sitting in the codebase. Works fine, just hasn’t been cleaned up.

R2’s zero egress fees are the real win here. Vercel Blob charges for bandwidth.

Part 2: Dockerizing Next.js

output: 'standalone' in next.config.ts gives you a self-contained Node server. No next start, no full node_modules. ~150MB image.
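A minimal next.config.ts for this — only output: 'standalone' is load-bearing; the assetPrefix wiring is my reading of the ASSET_PREFIX build arg in the Dockerfile:

```typescript
// next.config.ts — sketch; only `output` is confirmed above,
// assetPrefix mirrors the ASSET_PREFIX build arg in the Dockerfile
import type { NextConfig } from "next";

const config: NextConfig = {
  // Emit .next/standalone: a self-contained server.js plus a pruned node_modules
  output: "standalone",
  // Serve /_next/static from a CDN origin when set (empty string = same origin)
  assetPrefix: process.env.ASSET_PREFIX || undefined,
};

export default config;
```

The build drops server.js into .next/standalone, and you run it with plain node — no next CLI in the image.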

# Stage 1: deps
FROM oven/bun:1-alpine AS deps
WORKDIR /app
RUN apk add --no-cache libc6-compat
COPY package.json bun.lock ./
RUN bun install --frozen-lockfile

# Stage 2: build
FROM node:22-alpine AS builder
WORKDIR /app
COPY --from=deps /usr/local/bin/bun /usr/local/bin/bun
COPY --from=deps /app/node_modules ./node_modules
COPY . .

ENV NEXT_TELEMETRY_DISABLED=1
ENV NODE_ENV=production
ENV NODE_OPTIONS="--max-old-space-size=4096"

ARG GIT_SHA=unknown
ENV SENTRY_RELEASE=${GIT_SHA}

ARG ASSET_PREFIX=""
ENV ASSET_PREFIX=${ASSET_PREFIX}

RUN --mount=type=secret,id=env_file \
    set -a && . /run/secrets/env_file && set +a && \
    bun run build

# Stage 3: run
FROM node:22-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV NEXT_TELEMETRY_DISABLED=1

RUN apk add --no-cache curl && \
    addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 nextjs

COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public

RUN mkdir -p .next/cache && chown -R nextjs:nodejs .next/cache

USER nextjs
EXPOSE 3000
ENV PORT=3000
ENV HOSTNAME="0.0.0.0"

CMD ["node", "--max-old-space-size=2048", "--gc-interval=100", \
     "--optimize-for-size", "--expose-gc", "server.js"]

Things I didn’t know going in:

The secret mount (set -a && . /run/secrets/env_file) keeps env vars out of Docker layers. If you use ARG instead, docker history shows your secrets to anyone with the image.

Build stage needs --max-old-space-size=4096 because Next.js compilation eats RAM. Runtime gets 2048 so the OS has room on a 4GB box.

sharp (for <Image> optimization) comes through bun install as a dependency. I assumed I’d need npm install sharp in the runner stage. Didn’t.

The --expose-gc and --gc-interval=100 flags are there because I got OOM killed twice in the first week without them. More on that below.

Part 3: Zero-downtime deploys

Coolify’s default deploy does docker compose up -d, which is a recreate — stop old container, start new one. 3-6 seconds of 503s every time. Not acceptable when you have WebSocket connections.

I wrote a script that runs both containers at the same time:

  1. Pull new image from GHCR
  2. Start new container with the same Traefik labels, different container name
  3. Wait for health check (~8s)
  4. Traefik picks up the new container automatically
  5. Stop old container (30s graceful shutdown)

Health check fails? Script exits. Old container keeps running. Free rollback.

#!/usr/bin/env bash
set -euo pipefail

docker pull ghcr.io/yourorg/yourapp:latest

# Free the canonical name so the new container can take it later (no-op on a fresh host)
docker rename yourapp yourapp-old 2>/dev/null || true

docker run -d \
  --name yourapp-new \
  --label "traefik.enable=true" \
  --label "traefik.http.routers.yourapp.rule=Host(\`yourapp.com\`)" \
  --health-cmd="curl -sf http://localhost:3000/api/health" \
  --health-interval=5s \
  --health-retries=6 \
  ghcr.io/yourorg/yourapp:latest

# Wait up to 60s for the health check; roll back if it never goes healthy
for i in $(seq 1 60); do
  if [ "$(docker inspect --format='{{.State.Health.Status}}' yourapp-new)" = "healthy" ]; then
    break
  fi
  if [ "$i" -eq 60 ]; then
    docker rm -f yourapp-new
    docker rename yourapp-old yourapp 2>/dev/null || true
    exit 1
  fi
  sleep 1
done

docker stop --time 30 yourapp-old 2>/dev/null || true
docker rm yourapp-old 2>/dev/null || true
docker rename yourapp-new yourapp

Git push to deploy, about a minute total. GHA builds the image, pushes to GHCR, SSHs in, runs the script. Slack ping when it’s done.

Part 4: Self-hosting Supabase

Scariest part. Supabase Cloud handles backups, patches, pooling. Self-hosting means that’s all on you.

Second Hetzner cpx21, same datacenter. Separate box so the database doesn’t fight the app for RAM.

Supabase’s docker-compose runs 13 containers out of the box. I turned off 6 I wasn’t using — storage, edge functions, analytics, vector, meta, imgproxy. Freed about 400MB of RAM.

The VMs talk over Tailscale. No public database ports. Browser hits a subdomain through Cloudflare for Realtime WebSockets. Server-side API calls go through the Tailscale internal IP.

Metric              Supabase Cloud   Self-hosted
App to DB latency   10-50ms          <1ms
Health check        101ms            48ms
API response        200-270ms        13-44ms

Didn’t expect numbers like that. Same datacenter + WireGuard tunnel vs the public internet is a big gap.

Daily pg_dump at 3 AM, uploaded to R2. 7 daily, 4 weekly, 3 monthly. I kept the Supabase Cloud credentials in a backup file so I can point the app back if everything breaks. Never had to use it.
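The retention rule is just bucketing by day, week, and month. Here it is re-expressed as a pure function — the function name and week bucketing are mine; the actual job is a shell cron around pg_dump:

```typescript
// The 7-daily / 4-weekly / 3-monthly retention policy as a pure function.
// Input: backup timestamps; output: the ones to keep (rest can be deleted from R2).
function backupsToKeep(dates: Date[]): Date[] {
  const keep: Date[] = [];
  const days = new Set<string>();
  const weeks = new Set<number>();
  const months = new Set<string>();
  // Walk newest-first so the most recent backup in each bucket wins
  for (const d of [...dates].sort((a, b) => b.getTime() - a.getTime())) {
    const day = d.toISOString().slice(0, 10);                // YYYY-MM-DD
    const week = Math.floor(d.getTime() / (7 * 86_400_000)); // 7-day epoch bucket
    const month = day.slice(0, 7);                           // YYYY-MM
    let wanted = false;
    if (days.size < 7 && !days.has(day)) { days.add(day); wanted = true; }
    if (weeks.size < 4 && !weeks.has(week)) { weeks.add(week); wanted = true; }
    if (months.size < 3 && !months.has(month)) { months.add(month); wanted = true; }
    if (wanted) keep.push(d);
  }
  return keep;
}
```

Note one backup can satisfy several buckets at once (the newest dump counts as the daily, weekly, and monthly all at the same time), which is the usual semantics for this kind of policy.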

Health check cron runs every 2 minutes — REST API, Postgres, container count, disk, memory. If something’s down it runs docker compose up -d and pings Slack.

The lh CLI

Wrote a CLI to stop clicking through the Coolify UI:

lh status                # health dashboard
lh quick-deploy [SHA]    # zero-downtime deploy
lh rollback [SHA]        # rollback
lh logs -f               # tail logs
lh ssh                   # shell into app server
lh provision             # new server from bare Ubuntu to prod
lh harden                # fail2ban, SSH keys, firewall

lh db status             # database health
lh db psql               # interactive psql to prod
lh db backup             # manual backup
lh db restore latest     # restore
lh db logs realtime      # container logs
lh db tune apply         # PG tuning
lh db provision          # new Supabase server from scratch

Shell scripts in scripts/infra/commands/. Wrappers around SSH, Docker, and curl. The provision commands are the big time saver — bare Ubuntu to production-ready in one run.

Cloudflare

Free plan does a lot:

  • CDN with 1-year TTL on /_next/static/
  • Full Strict SSL (origin cert from Let’s Encrypt)
  • HTTP/3, Brotli, Early Hints
  • R2 for uploads and static assets (zero egress)
  • Rate limiting on /api/auth/*
  • Bot fight mode

Static assets from each build go to R2 during CI. Old chunks stay around forever so users with cached HTML can still load old JS bundles. 90-day lifecycle rule cleans up eventually. Client-side ChunkLoadError handler auto-reloads when a chunk is actually gone.
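The reload guard is roughly this shape — a reconstruction of the handler described above, not the actual code:

```typescript
// Decide whether a runtime error is a missing-chunk error worth one auto-reload.
// Names are illustrative; the real handler's wiring may differ.
function shouldAutoReload(err: unknown, alreadyReloaded: boolean): boolean {
  if (alreadyReloaded) return false; // never loop reloads
  const msg = err instanceof Error ? `${err.name} ${err.message}` : String(err);
  return /ChunkLoadError|Loading chunk .+ failed/.test(msg);
}

// Browser wiring (sessionStorage guards against reload loops):
// window.addEventListener("error", (e) => {
//   const reloaded = sessionStorage.getItem("chunk-reloaded") === "1";
//   if (shouldAutoReload(e.error, reloaded)) {
//     sessionStorage.setItem("chunk-reloaded", "1");
//     location.reload();
//   }
// });
```

The sessionStorage flag matters: without it, a chunk that is truly gone (not just stale) would reload the page forever.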

What broke

OOM kills, twice in the first week.

First one: a PermissionMonitor class appending to an unbounded array on every request. No cap. Combined with no heap limit in the Dockerfile, Node just kept growing until the kernel killed it. Capped the array at 1000 entries, set --max-old-space-size=2048.

Second one was worse. App would run fine for about 19 hours, then die. Next.js has an in-memory fetch cache that defaults to unlimited in standalone mode — cacheMaxMemorySize grows with every cached API response. On top of that, the health endpoint was creating a new Supabase client on every request. Each client spins up GoTrue and Realtime listeners that never get cleaned up. And Node’s GC won’t bother collecting any of it until there’s actual heap pressure. Set cacheMaxMemorySize: 0, rewrote the health endpoint with raw fetch, added --gc-interval=100 --expose-gc.
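The config side of that fix is one line — a sketch, with output: 'standalone' shown only for context:

```typescript
// next.config.ts — cacheMaxMemorySize: 0 disables the in-memory fetch cache
// entirely, so cached responses go to the file-system cache instead of the heap
import type { NextConfig } from "next";

const config: NextConfig = {
  output: "standalone",
  cacheMaxMemorySize: 0,
};

export default config;
```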

I also added a debug endpoint locked to internal IPs that forces GC and takes heap snapshots without restarting the container. That’s what actually let me find the second leak.
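A sketch of what that endpoint can look like — the IP check and handler shape are my guesses at the described setup, built on Node's stdlib v8 module:

```typescript
import { writeHeapSnapshot } from "node:v8";

// Coarse internal-IP check: loopback, RFC 1918 10/8, and Tailscale's
// 100.64.0.0/10 CGNAT range. Deliberately simple; not a general CIDR matcher.
function isInternalIp(ip: string): boolean {
  if (ip === "127.0.0.1" || ip === "::1") return true;
  if (ip.startsWith("10.")) return true;
  const [a, b] = ip.split(".").map(Number);
  return a === 100 && b >= 64 && b <= 127;
}

// Route-handler sketch: force a GC pass (requires --expose-gc) and dump a
// heap snapshot to disk without restarting the container.
async function GET(req: Request) {
  const ip = req.headers.get("x-forwarded-for")?.split(",")[0]?.trim() ?? "";
  if (!isInternalIp(ip)) return new Response("forbidden", { status: 403 });
  (globalThis as { gc?: () => void }).gc?.(); // no-op unless --expose-gc was passed
  const file = writeHeapSnapshot(); // writes Heap.<date>...heapsnapshot next to the app
  return Response.json({ heapSnapshot: file });
}
```

Two snapshots taken an hour apart, diffed in Chrome DevTools, is usually enough to see which object type is growing.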

Supabase CLI TLS. I had the CLI pinned to v2.88.1, and the pgx driver inside it changed its SSL defaults: ?sslmode=disable in the connection URL stopped working silently, and migrations failed with tls error (server refused TLS connection). Fix: set PGSSLMODE=disable as an env var. Half a day of debugging, because the error doesn't mention env vars.

Coolify’s deploy. Already covered above. 3-6s of 503s on every deploy from the default recreate strategy.

Should you do this?

If you’re paying $50+/mo for Vercel + managed database and your app runs on a single server, probably. Took me about a month on and off. Vercel Blob to R2 and the Supabase self-hosting were the slow parts. Docker + Coolify was a weekend. Zero-downtime deploy was another weekend.

If you’re a solo dev, a $14/mo Hetzner box gives you everything Vercel does minus edge functions and preview deployments. You get full control over deploys, the database, and your bill.

If you actually need preview deployments, multi-region, or you’d rather not be woken up at 2 AM when Postgres runs out of disk, stay on managed. That’s what it’s for.

The trade-off for me: bill cut in half, API responses 5-10x faster, zero deploy downtime. I now own uptime and backups. So far that’s a health check cron and apt upgrade once a month.

Numbers

Cost                Before                  After
App hosting         $50/mo Vercel Pro       $14/mo Hetzner
Database            $30/mo Supabase Cloud   $14/mo Hetzner
Storage             included                ~$2/mo R2
CDN/DNS/SSL         free Cloudflare         free Cloudflare
Total               ~$80/mo                 ~$30/mo
Annual              ~$960                   ~$360

Performance         Before      After
API response        200-270ms   13-44ms
DB latency          10-50ms     under 1ms
Deploy downtime     3-6s        0s