Product

Autoscaling without bill surprises

We redesigned autoscaling with hard spend caps and clear previews of what scaling will cost before you enable it.


Marcus Rivera

Head of Product

Feb 27, 2026 · 7 min read

Autoscaling sounds great until you get the bill. A traffic spike hits at 3am, your service scales to 40 instances, and by the time the spike subsides, the bill for that hour is already locked in. We've heard this story from dozens of developers who moved to StackBlaze after getting burned on other platforms. Today we're shipping two features that change how autoscaling works: spend caps and scaling previews.

The problem with autoscaling today

Most platforms treat autoscaling as a purely technical problem: scale up when CPU is high, scale down when it drops. They give you min/max instance counts and leave cost as your problem to figure out separately.

This creates a real tension. Set your max too low and your app falls over during a spike. Set it too high and a sustained load event or a traffic anomaly can generate a bill you weren't expecting. Budget alerts help, but they're reactive: they tell you after you've already spent the money.
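To put rough numbers on that tension, here is a worst-case calculation for a sustained load event. The $0.50 per instance-hour price is a hypothetical figure chosen for illustration, not StackBlaze pricing, and the sketch assumes autoscaling sits at its ceiling for the whole event.

```python
def worst_case_cost_cents(max_instances, price_cents_per_hour, hours, hourly_cap_cents=None):
    """Upper bound on compute spend if autoscaling stays at its ceiling for `hours`.

    Amounts are integer cents to avoid floating-point rounding.
    price_cents_per_hour is a hypothetical per-instance price, not a real rate.
    """
    instances = max_instances
    if hourly_cap_cents is not None:
        # A spend cap limits how many whole instances can run in any hour.
        instances = min(instances, hourly_cap_cents // price_cents_per_hour)
    return instances * price_cents_per_hour * hours


# A 12-hour sustained load event with max_instances: 20 at a hypothetical $0.50/hour.
uncapped = worst_case_cost_cents(20, 50, 12)
capped = worst_case_cost_cents(20, 50, 12, hourly_cap_cents=400)
print(f"uncapped: ${uncapped / 100:.2f}, capped at $4/hour: ${capped / 100:.2f}")
# prints "uncapped: $120.00, capped at $4/hour: $48.00"
```

Without a cap, the only bound on hourly spend is max_instances times the price; with a cap, the per-hour ceiling is fixed by the cap itself.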

| Behavior | Old autoscaling | New autoscaling |
|---|---|---|
| Cost during spike | Unbounded up to max instances | Capped at your spend limit |
| Scale-up visibility | None until it happens | Preview shows projected cost before deploy |
| Over-limit behavior | N/A (no limit) | Queues requests, returns 503 after timeout |
| Scale-down speed | Configurable cooldown | Configurable cooldown (unchanged) |
| Budget alerts | Via billing dashboard | Built into the service config |

What we built

Spend caps let you set a maximum hourly spend for a service. When autoscaling would push the cost above your cap, StackBlaze stops adding instances and instead queues incoming requests up to a configurable timeout. If a request can't be handled within that timeout, it gets a 503. The trade-off is now explicit: you decide whether to accept degraded performance or risk bill shock.
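A minimal sketch of the cap logic described above, assuming a flat hypothetical per-instance hourly price and integer-cent amounts. The function names are illustrative, not StackBlaze internals, and the behavior shown for `reject` (an immediate 503 with no queueing) is an assumption inferred from the option name in the configuration example.

```python
def effective_max_instances(max_instances, hourly_cap_cents, price_cents_per_hour):
    """Largest instance count that keeps hourly compute spend within the cap."""
    return min(max_instances, hourly_cap_cents // price_cents_per_hour)


def scale_to(desired_instances, max_instances, hourly_cap_cents, price_cents_per_hour):
    """Instance count autoscaling would actually run once the cap is applied."""
    limit = effective_max_instances(max_instances, hourly_cap_cents, price_cents_per_hour)
    return min(desired_instances, limit)


def over_limit_outcome(behavior, waited_for_capacity_ms, queue_timeout_ms):
    """What happens to a request that arrives while scaling is held at the cap."""
    if behavior == "reject":
        return "503"  # assumption: reject returns 503 immediately, without queueing
    if behavior == "queue":
        # The request is held until capacity frees up; past the timeout it gets a 503.
        return "served" if waited_for_capacity_ms <= queue_timeout_ms else "503"
    raise ValueError(f"unknown over_limit_behavior: {behavior!r}")


# max_instances 20, hourly_usd 4.00, queue_timeout_ms 5000, hypothetical $0.50/hour.
print(scale_to(14, 20, 400, 50))                # 8: the cap binds before max_instances
print(over_limit_outcome("queue", 1200, 5000))  # served
print(over_limit_outcome("queue", 8000, 5000))  # 503
```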

Scaling previews show you, before you deploy, what your autoscaling config would have done against your historical traffic. You'll see a graph of instance count over the last 30 days with your new config applied, alongside a projected monthly cost range.
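The idea behind a preview is to replay past demand through the scaling rules and price the result. The sketch below is a deliberately simplified model, not how StackBlaze computes previews: it uses hourly CPU demand only (ignoring the memory target and the scale-down cooldown), expresses demand as a percentage of one instance's CPU, and assumes a hypothetical flat price per instance-hour.

```python
def preview(hourly_cpu_demand_pct, min_instances, max_instances,
            target_cpu_percent, hourly_cap_cents, price_cents_per_hour):
    """Replay hourly demand through a scaling config.

    hourly_cpu_demand_pct: total CPU demand per hour as a percentage of one
    instance (e.g. 300 means three instances' worth of CPU at 100%).
    Returns (instance count per hour, total cost in cents).
    """
    cap_max = min(max_instances, hourly_cap_cents // price_cents_per_hour)
    counts = []
    for demand in hourly_cpu_demand_pct:
        # Instances needed to keep each one at or below the CPU target (ceiling division).
        needed = -(-demand // target_cpu_percent)
        counts.append(max(min_instances, min(needed, cap_max)))
    total_cents = sum(n * price_cents_per_hour for n in counts)
    return counts, total_cents


counts, cost = preview([50, 300, 1200, 100], min_instances=1, max_instances=20,
                       target_cpu_percent=65, hourly_cap_cents=400,
                       price_cents_per_hour=50)
print(counts, cost)  # [1, 5, 8, 2] 800
```

In the third hour, demand calls for 19 instances but the $4/hour cap holds the count at 8; in the feature described above, that is the point where requests would start to queue.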

How to configure it

blueprint.yaml
services:
  - name: api
    type: web
    scaling:
      min_instances: 1
      max_instances: 20
      target_cpu_percent: 65        # Scale up when CPU exceeds this
      target_memory_percent: 80     # Scale up when memory exceeds this
      scale_down_cooldown: 300      # Seconds to wait before scaling down
      spend_cap:
        hourly_usd: 4.00            # Never spend more than $4/hour on compute
        over_limit_behavior: queue  # Options: queue | reject
        queue_timeout_ms: 5000      # How long to hold queued requests

Don't set your cap too close to your baseline

If your spend cap is only slightly above your minimum instance cost, you might accidentally prevent any autoscaling from happening at all. Set your cap to reflect the maximum you're comfortable spending during a spike, not your expected steady-state cost. We recommend at least 3–5x your baseline as a starting cap while you learn your traffic patterns.
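A quick way to sanity-check a cap against that guidance, assuming a hypothetical flat per-instance hourly price. The function and its warning messages are illustrative only; they are not a StackBlaze CLI or dashboard check.

```python
def check_spend_cap(min_instances, hourly_cap_cents, price_cents_per_hour):
    """Compare a spend cap with the baseline cost of running min_instances.

    Returns (instances the cap allows beyond min_instances, cap/baseline ratio, warnings).
    """
    baseline_cents = min_instances * price_cents_per_hour
    headroom = hourly_cap_cents // price_cents_per_hour - min_instances
    ratio = hourly_cap_cents / baseline_cents
    warnings = []
    if headroom <= 0:
        warnings.append("cap allows no instances beyond min_instances; autoscaling cannot scale up")
    elif ratio < 3:
        warnings.append(f"cap is only {ratio:.1f}x baseline; 3-5x is a safer starting point")
    return headroom, ratio, warnings


print(check_spend_cap(1, 400, 50))  # (7, 8.0, [])
print(check_spend_cap(4, 220, 50))  # headroom 0: the cap blocks all scale-up
print(check_spend_cap(4, 500, 50))  # headroom 6, but only 2.5x baseline
```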

What's next

We're working on predictive scaling, using your historical traffic patterns to pre-scale before load hits, rather than reacting to it. If your app reliably gets a traffic spike every weekday morning, there's no reason to wait for CPU to climb before adding instances. We expect to ship a beta of this in Q3.
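Predictive scaling hasn't shipped, so this is only an illustration of the general idea, not the planned design: build a profile of average instance counts for each weekday-and-hour slot from history, then pre-scale to that level before the slot begins. Every function name here is hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime


def build_profile(history):
    """Average instance count per (weekday, hour) slot.

    history: iterable of (datetime, instance_count) samples.
    """
    totals = defaultdict(int)
    samples = defaultdict(int)
    for ts, count in history:
        slot = (ts.weekday(), ts.hour)
        totals[slot] += count
        samples[slot] += 1
    return {slot: totals[slot] / samples[slot] for slot in totals}


def prescale_target(profile, upcoming, min_instances, max_instances):
    """Instances to start before `upcoming`, based on the historical average for that slot."""
    expected = profile.get((upcoming.weekday(), upcoming.hour), 0)
    return max(min_instances, min(max_instances, math.ceil(expected)))


history = [
    (datetime(2026, 2, 16, 9), 8),   # two past mornings, one week apart
    (datetime(2026, 2, 23, 9), 11),
    (datetime(2026, 2, 23, 3), 1),   # a quiet 3am
]
profile = build_profile(history)
print(prescale_target(profile, datetime(2026, 3, 2, 9), 1, 20))  # 10
```

A real implementation would also have to respect the service's spend cap; this sketch clamps only to min_instances and max_instances.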

We're also adding team-level spend caps: one cap across all services in an environment, for teams that want a single budget guardrail rather than managing per-service caps.


Marcus Rivera

Head of Product at StackBlaze

Member of the founding team at StackBlaze. Writes about infrastructure, engineering culture, and the systems that keep production running.
