Product

Autoscaling without bill surprises

We redesigned autoscaling with hard spend caps and clear previews of what scaling will cost before you enable it.

Marcus Rivera

Head of Product

Feb 27, 2026

7 min read

Autoscaling sounds great until you get the bill. A traffic spike hits at 3am, your service scales to 40 instances, and even after the spike subsides, the bill for that hour is already locked in. We've heard this story from dozens of developers who moved to StackBlaze after getting burned on other platforms. Today we're shipping two features that change how autoscaling works: spend caps and scaling previews.

The problem with autoscaling today

Most platforms treat autoscaling as a purely technical problem: scale up when CPU is high, scale down when it drops. They give you min/max instance counts and leave cost as your problem to figure out separately.

This creates a real tension. Set your max too low and your app falls over during a spike. Set it too high and a sustained load event or a traffic anomaly can generate a bill you weren't expecting. Budget alerts help, but they're reactive: they tell you only after you've already spent the money.

Behavior	Old autoscaling	New autoscaling
Cost during spike	Unbounded up to max instances	Capped at your spend limit
Scale-up visibility	None until it happens	Preview shows projected cost before deploy
Over-limit behavior	N/A (no limit)	Queues requests, returns 503 after timeout
Scale-down speed	Configurable cooldown	Configurable cooldown (unchanged)
Budget alerts	Via billing dashboard	Built into the service config

What we built

Spend caps let you set a maximum hourly spend for a service. When autoscaling would push the cost above your cap, StackBlaze stops adding instances and instead queues incoming requests up to a configurable timeout. If a request can't be handled within that timeout, it gets a 503. You decide which trade-off you prefer: degraded performance or bill shock.
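To make that decision concrete, here is a minimal sketch, assuming a flat, hypothetical per-instance price and treating the cap as a ceiling on instance count (cap divided by price, rounded down). The names `SpendCap`, `capped_instances`, and `handle_overflow` are illustrative only, not StackBlaze's API, and the `reject` branch assumes requests are refused immediately, which isn't spelled out here.

```python
from dataclasses import dataclass


@dataclass
class SpendCap:
    hourly_cents: int         # spend_cap.hourly_usd in cents (400 == $4.00)
    over_limit_behavior: str  # "queue" or "reject"
    queue_timeout_ms: int     # spend_cap.queue_timeout_ms


def capped_instances(desired: int, min_instances: int, max_instances: int,
                     price_cents_per_hour: int, cap: SpendCap) -> int:
    """Instance count after applying max_instances and the spend cap.

    Assumption: the cap stops new instances from being added but never
    takes the service below min_instances.
    """
    # Whole instances the hourly cap can pay for at the (hypothetical) price.
    affordable = cap.hourly_cents // price_cents_per_hour
    ceiling = max(min_instances, min(max_instances, affordable))
    return max(min_instances, min(desired, ceiling))


def handle_overflow(wait_ms: int, cap: SpendCap) -> int:
    """HTTP status for a request that arrives while the service is at its cap.

    wait_ms is how long the request would wait for a free instance
    (a hypothetical input; a real platform measures this itself).
    """
    if cap.over_limit_behavior == "reject":
        return 503  # assumed: "reject" skips the queue entirely
    # "queue": hold the request and serve it if capacity frees up in time.
    return 200 if wait_ms <= cap.queue_timeout_ms else 503
```

With the `blueprint.yaml` values in this post and an assumed $0.25 per instance-hour, the $4.00 cap pays for 16 instances, so `capped_instances(30, 1, 20, 25, cap)` returns 16 even though `max_instances` is 20. Requests that arrive during that stretch wait in the queue and get a 503 only if they would wait longer than 5000 ms.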

Scaling previews show you, before you deploy, what your autoscaling config would have done against your historical traffic. You'll see a graph of instance count over the last 30 days with your new config applied, alongside a projected monthly cost range.
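As a rough illustration of what a preview computes, the sketch below replays hourly history against a config. It assumes a target-tracking rule (run enough instances that average CPU stays at or below `target_cpu_percent`), a hypothetical flat price, and history recorded as total CPU percent across all instances. It ignores `target_memory_percent` and `scale_down_cooldown`, so it will tend to underestimate real cost, and it doesn't attempt the cost range the actual preview reports; StackBlaze's preview model may work quite differently.

```python
def replay(history: list[int], min_instances: int, max_instances: int,
           target_cpu_percent: int, price_cents_per_hour: int,
           cap_cents_per_hour: int) -> tuple[list[int], int]:
    """Replay hourly CPU demand against a scaling config.

    history[i] is total CPU percent across all instances in hour i
    (e.g. 4 instances at 80% == 320). Returns the simulated instance
    count per hour and the total compute cost in cents.
    """
    # Most instances the spend cap allows, never below min_instances (assumed).
    affordable = cap_cents_per_hour // price_cents_per_hour
    ceiling = max(min_instances, min(max_instances, affordable))
    counts = []
    for cpu_total in history:
        # Ceiling division: smallest count that keeps each instance <= target.
        desired = -(-cpu_total // target_cpu_percent)
        counts.append(max(min_instances, min(desired, ceiling)))
    return counts, sum(counts) * price_cents_per_hour
```

For example, `replay([50, 650, 1300, 320], 1, 20, 65, 25, 400)` yields instance counts `[1, 10, 16, 5]` and a cost of 800 cents: the third hour wants 20 instances, but the $4.00 cap holds it at 16.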

How to configure it

blueprint.yaml

services:
  - name: api
    type: web
    scaling:
      min_instances: 1
      max_instances: 20
      target_cpu_percent: 65        # Scale up when CPU exceeds this
      target_memory_percent: 80     # Scale up when memory exceeds this
      scale_down_cooldown: 300      # Seconds to wait before scaling down
      spend_cap:
        hourly_usd: 4.00            # Never spend more than $4/hour on compute
        over_limit_behavior: queue  # Options: queue | reject
        queue_timeout_ms: 5000      # How long to hold queued requests

Don't set your cap too close to your baseline

If your spend cap is only slightly above your minimum instance cost, you might accidentally prevent any autoscaling from happening at all. Set your cap to reflect the maximum you're comfortable spending during a spike, not your expected steady-state cost. We recommend at least 3–5x your baseline as a starting cap while you learn your traffic patterns.
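The rule of thumb above comes down to simple arithmetic: baseline cost is `min_instances` times the per-instance price, and the cap divided by that price is the most instances you can ever run. Here is a minimal sketch of a pre-deploy check, using a hypothetical helper name and an assumed flat price (StackBlaze may price instances differently):

```python
def check_cap(min_instances: int, max_instances: int,
              price_cents_per_hour: int, cap_cents_per_hour: int) -> list[str]:
    """Return warnings about a spend cap relative to the always-on baseline."""
    warnings = []
    baseline = min_instances * price_cents_per_hour  # cost with no scaling at all
    if baseline > 0:  # skip the ratio check for scale-to-zero services
        ratio = cap_cents_per_hour / baseline
        if ratio < 3:
            warnings.append(f"cap is {ratio:.1f}x baseline; start at 3-5x")
    # Most instances the cap can pay for at the assumed price.
    affordable = cap_cents_per_hour // price_cents_per_hour
    if affordable <= min_instances:
        warnings.append("cap leaves no room to scale above min_instances")
    elif affordable < max_instances:
        warnings.append(f"cap limits scaling to {affordable} of {max_instances} instances")
    return warnings
```

Run against the example `blueprint.yaml` with an assumed $0.25 per instance-hour, `check_cap(1, 20, 25, 400)` returns a single warning: the $4.00 cap is 16x baseline, well above the 3–5x guidance, but it also means `max_instances: 20` can never be reached. That may be exactly what you want; the point is to see it before a spike does.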

What's next

We're working on predictive scaling, using your historical traffic patterns to pre-scale before load hits, rather than reacting to it. If your app reliably gets a traffic spike every weekday morning, there's no reason to wait for CPU to climb before adding instances. We expect to ship a beta of this in Q3.

We're also adding team-level spend caps: one cap across all services in an environment, for teams that want a single budget guardrail instead of managing per-service caps.

Marcus Rivera

Head of Product at StackBlaze

Member of the founding team at StackBlaze. Writes about infrastructure, engineering culture, and the systems that keep production running.
