Zero-downtime deployments

8 min read · Updated April 2026

Every StackBlaze deployment uses the Kubernetes RollingUpdate strategy with maxSurge: 1 and maxUnavailable: 0. This guarantees that your full replica count keeps serving traffic throughout a deployment, because each new pod must pass its readiness probe before an old one is terminated.

The critical ingredient is a working health check endpoint. Without it, Kubernetes can't tell when your new pod is ready: a pod with no readiness probe counts as ready as soon as its container starts, so traffic can reach an app that is still booting. Adding GET /health takes two minutes and protects every future deploy.

Rolling update timeline

[Figure: rolling update timeline. Old Pod (v1.2.3, serving traffic) → starts new → New Pod (v1.2.4, starting up) → HTTP probe → Health Check (GET /health) → 200 OK → Old Terminates (SIGTERM sent, graceful drain).]

Health check endpoint

Express.js (Node)

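// Minimal health endpoint: add this to every service
app.get('/health', (req, res) => {
  res.status(200).json({ status: 'ok', timestamp: Date.now() })
})

// For deeper checks, use this handler INSTEAD of the one above.
// Express answers with the first matching route, so don't register both.
app.get('/health', async (req, res) => {
  try {
    await db.query('SELECT 1') // throws if DB is down
    res.status(200).json({ status: 'ok' })
  } catch (err) {
    // Respond 503 so the probe fails fast instead of the request hanging
    res.status(503).json({ status: 'unavailable' })
  }
})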

Deploy log, rolling update

deploy #47, my-web-service

10:14:02 Build complete: sha256:a3f7b9… 492 MB

10:14:03 Rolling update started (maxSurge: 1, maxUnavailable: 0)

10:14:04 New pod my-web-service-v47-xk2pq scheduled on worker-03

10:14:09 Container image pulled

10:14:11 Container started, waiting for readiness probe...

10:14:14 GET /health → 200 OK (attempt 1/3)

10:14:17 GET /health → 200 OK (attempt 2/3)

10:14:20 GET /health → 200 OK (attempt 3/3)

10:14:20 Pod my-web-service-v47-xk2pq is Ready

10:14:20 Traffic shifting to new pod (old pod still running)

10:14:21 Sending SIGTERM to my-web-service-v46-8rw7t

10:14:51 Old pod drained (30s graceful shutdown window)

10:14:52 Deploy #47 complete, 0 dropped requests
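
The 30-second drain window in the log above only helps if the app stops accepting new connections when SIGTERM arrives and lets in-flight requests finish. A minimal sketch of such a handler follows, assuming a Node http.Server (which is what Express's app.listen returns). The names createGracefulShutdown, graceMs, and exit are illustrative, not part of any StackBlaze or Express API.

```javascript
// Returns a shutdown function for any server object that exposes close(callback).
// graceMs and exit are illustrative parameters, not StackBlaze settings.
function createGracefulShutdown(
  server,
  { graceMs = 25000, exit = (code) => process.exit(code) } = {}
) {
  let shuttingDown = false

  return function shutdown() {
    if (shuttingDown) return // ignore repeated signals
    shuttingDown = true

    // Safety net: force an exit before the platform's grace period runs out.
    const timer = setTimeout(() => exit(1), graceMs)
    timer.unref() // this timer alone should not keep the process alive

    // Stop accepting new connections; the callback runs once open ones have ended.
    server.close((err) => {
      clearTimeout(timer)
      exit(err ? 1 : 0)
    })
  }
}

// Wiring, assuming `app` is the Express app from the health check example:
//   const server = app.listen(3000)
//   process.on('SIGTERM', createGracefulShutdown(server))
```

The forced-exit timer defaults to 25 seconds here so the process exits on its own terms before the 30-second window closes. Idle keep-alive connections can delay server.close on some Node versions, which is another reason to keep the timer.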

Readiness vs liveness probes

Readiness probe

Controls whether a pod receives traffic. Failing readiness removes the pod from the load balancer but does not restart it. During deployments it gates traffic until the app has warmed up, and it can also keep a pod out of rotation while the app runs database migrations or warms a cache.

Liveness probe

Detects whether a pod is stuck or deadlocked. Failing liveness causes Kubernetes to restart the container. Use a separate, simpler endpoint for liveness: it should fail only if the process is genuinely broken, not just temporarily busy.
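
To make the split concrete, here is a small sketch of two Express-style handlers: one for liveness and one for readiness. The /livez path, the state flag, and the warmUp function are assumptions made for illustration; whether StackBlaze lets you configure a separate liveness path is not stated here.

```javascript
// Hypothetical in-process flag; set ready to true once warm-up work has finished.
const state = { ready: false }

// Liveness: succeeds whenever the process can answer at all. It checks no
// database or cache, so a slow dependency never triggers a container restart.
function liveness(req, res) {
  res.status(200).json({ status: 'alive' })
}

// Readiness: returns 503 until warm-up is done, which keeps the pod out of
// rotation without restarting it.
function readiness(req, res) {
  if (state.ready) {
    res.status(200).json({ status: 'ok' })
  } else {
    res.status(503).json({ status: 'starting' })
  }
}

// Wiring, assuming the Express `app` from the health check example:
//   app.get('/livez', liveness)    // '/livez' is an assumed path name
//   app.get('/health', readiness)
//   warmUp().then(() => { state.ready = true })   // warmUp is hypothetical
```

Keeping dependency checks out of the liveness handler means a brief database outage takes pods out of rotation but does not restart them.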

Automatic rollback on health check failure

If the new pod's readiness probe reaches the configured failure threshold (default: 3 consecutive failures), StackBlaze halts the rolling update and marks the deploy as failed. The old pods keep running and serving traffic, so your previous version stays live with no manual intervention. You can review the failure reason in the deploy log and push a fix.
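
The threshold logic can be sketched as a small function that walks probe results in order. This illustrates consecutive-count thresholds in general; how StackBlaze evaluates probes internally is an assumption, and evaluateRollout is a hypothetical name.

```javascript
// results: array of booleans, one per probe attempt (true = HTTP 200).
// Returns 'ready', 'failed', or 'pending' (not enough evidence yet).
function evaluateRollout(results, { failureThreshold = 3, successThreshold = 1 } = {}) {
  let failures = 0
  let successes = 0
  for (const ok of results) {
    if (ok) {
      successes += 1
      failures = 0 // a success resets the failure streak
      if (successes >= successThreshold) return 'ready'
    } else {
      failures += 1
      successes = 0 // a failure resets the success streak
      if (failures >= failureThreshold) return 'failed'
    }
  }
  return 'pending'
}

evaluateRollout([false, false, true])                    // 'ready'
evaluateRollout([false, false, false])                   // 'failed'
evaluateRollout([true, true], { successThreshold: 3 })   // 'pending'
```

Because the counters reset on every change of outcome, a slow-starting app that fails twice and then succeeds is still promoted, while three failures in a row stop the rollout.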

Under the hood

  • Rolling Update strategy: StackBlaze sets strategy.type to RollingUpdate with maxSurge: 1 and maxUnavailable: 0 on every Deployment. The replica count temporarily exceeds the target by 1 during a rollout, but capacity never drops below 100%.
  • readinessProbe: Kubernetes polls your /health endpoint via an HTTP GET. The pod only joins the Service endpoint list (and thus receives load balancer traffic) once the probe succeeds for successThreshold consecutive checks.
  • Graceful termination: Kubernetes sends SIGTERM to the old pod and waits terminationGracePeriodSeconds (default: 30s) for it to finish in-flight requests. Listen for SIGTERM in your app to drain connections cleanly.
  • Deployment revision history: Kubernetes retains the previous 10 ReplicaSet revisions. StackBlaze exposes one-click rollback in the dashboard, which triggers kubectl rollout undo behind the scenes and reverts to the last healthy image in seconds.
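
For reference, the settings in the list above correspond to standard fields on a Kubernetes apps/v1 Deployment. The sketch below builds such a manifest as a plain JavaScript object. The field names are Kubernetes's own, but the function name, the port, the replica count, and the image reference are placeholders, and the manifest StackBlaze actually generates may differ.

```javascript
// Builds a Deployment manifest carrying the rollout settings described above.
// name, image, replicas, and port are placeholder inputs.
function buildDeployment({ name, image, replicas = 2, port = 3000 }) {
  return {
    apiVersion: 'apps/v1',
    kind: 'Deployment',
    metadata: { name },
    spec: {
      replicas,
      revisionHistoryLimit: 10, // old ReplicaSets kept for rollback
      strategy: {
        type: 'RollingUpdate',
        rollingUpdate: { maxSurge: 1, maxUnavailable: 0 },
      },
      selector: { matchLabels: { app: name } },
      template: {
        metadata: { labels: { app: name } },
        spec: {
          terminationGracePeriodSeconds: 30, // SIGTERM drain window
          containers: [{
            name,
            image,
            ports: [{ containerPort: port }],
            readinessProbe: {
              httpGet: { path: '/health', port },
              failureThreshold: 3,
              successThreshold: 1,
            },
          }],
        },
      },
    },
  }
}

// kubectl accepts JSON as well as YAML, so the printed output is a valid manifest.
const manifest = buildDeployment({
  name: 'my-web-service',
  image: 'registry.example/my-web-service:v47', // placeholder image reference
})
console.log(JSON.stringify(manifest, null, 2))
```

With maxSurge: 1 and maxUnavailable: 0, a service with 2 replicas runs at most 3 pods during a rollout and never fewer than 2 ready ones.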

Step by step

01

Add a /health endpoint to your service

Create a GET /health route in your application that returns HTTP 200 with a JSON body like { "status": "ok" }. This is the readiness probe endpoint. StackBlaze will poll it during every deployment to determine when the new pod is ready to serve traffic.

02

Configure the health check path in the dashboard

In your service settings under "Health checks", set the path to /health. You can also configure the initial delay (how long to wait before the first probe), timeout (how long a single probe attempt can take), and the failure threshold (how many consecutive failures before a pod is considered unhealthy).

03

Push to deploy

Connect your GitHub repository and push to your deploy branch (main by default). StackBlaze builds the new image, then starts the rolling update. New pods are created one at a time. The old pods continue serving traffic until the new pods are confirmed healthy.

04

Watch the rolling update in the deploy log

The deploy log in the dashboard shows each step in real time: new pod scheduled, container image pulling, health check polling, readiness confirmed, traffic shifted, old pod draining. If the health check fails at any point, StackBlaze halts the rollout and leaves the old version running.