Autoscaling without bill surprises
We redesigned autoscaling with hard spend caps and clear previews of what scaling will cost before you enable it.
Marcus Rivera
Head of Product
Autoscaling sounds great until you get the bill. A traffic spike at 3am, your service scales to 40 instances, the spike subsides, but the bill for that hour is already locked in. We've heard this story from dozens of developers who moved to StackBlaze after getting burned on other platforms. Today we're shipping two features that change how autoscaling works: spend caps and scaling previews.
The problem with autoscaling today
Most platforms treat autoscaling as a purely technical problem: scale up when CPU is high, scale down when it drops. They give you min/max instance counts and leave cost as your problem to figure out separately.
This creates a real tension. Set your max too low and your app falls over during a spike. Set it too high and a sustained load event or a traffic anomaly can generate a bill you weren't expecting. Budget alerts help, but they're reactive: they only tell you after you've already spent the money.
| Behavior | Old autoscaling | New autoscaling |
|---|---|---|
| Cost during spike | Unbounded up to max instances | Capped at your spend limit |
| Scale-up visibility | None until it happens | Preview shows projected cost before deploy |
| Over-limit behavior | N/A (no limit) | Queues requests, returns 503 after timeout |
| Scale-down speed | Configurable cooldown | Configurable cooldown (unchanged) |
| Budget alerts | Via billing dashboard | Built into the service config |
What we built
Spend caps let you set a maximum hourly spend for a service. When autoscaling would push the cost above your cap, StackBlaze stops adding instances and instead queues incoming requests up to a configurable timeout. If a request can't be handled within that timeout, it gets a 503. It's your call which you'd rather have during an extreme spike: degraded performance or bill shock.
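As a rough illustration of the cap logic described above (a sketch, not StackBlaze's actual implementation), the Python below works out how many instances a cap can pay for and what happens when the autoscaler asks for more. The per-instance price of $0.25/hour and the names `SpendCap`, `max_affordable_instances`, and `plan_scale_up` are assumptions made up for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class SpendCap:
    hourly_usd: float          # maximum compute spend per hour
    over_limit_behavior: str   # "queue" or "reject"
    queue_timeout_ms: int      # how long a queued request may wait


def max_affordable_instances(cap: SpendCap, price_per_instance_hour: float) -> int:
    """Largest instance count whose hourly cost stays within the cap."""
    # The tiny epsilon guards against floating-point error in the division.
    return math.floor(cap.hourly_usd / price_per_instance_hour + 1e-9)


def plan_scale_up(desired: int, cap: SpendCap,
                  price_per_instance_hour: float) -> tuple[int, str]:
    """Return (instances to run, how overflow traffic is handled).

    `desired` is what the autoscaler wants after applying min/max limits.
    """
    affordable = max_affordable_instances(cap, price_per_instance_hour)
    if desired <= affordable:
        return desired, "serve"
    # The cap is binding: stop adding instances and fall back to the
    # configured over-limit behavior instead of spending more.
    return affordable, cap.over_limit_behavior


# Hypothetical price: $0.25 per instance-hour.
cap = SpendCap(hourly_usd=4.00, over_limit_behavior="queue", queue_timeout_ms=5000)
print(plan_scale_up(8, cap, 0.25))    # within the cap
print(plan_scale_up(20, cap, 0.25))   # cap binds before 20 instances
```

At that assumed price, a $4.00 hourly cap pays for 16 instances, so a request for 20 is held at 16 and the remaining traffic is queued.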
Scaling previews show you, before you deploy, what your autoscaling config would have done against your historical traffic. You'll see a graph of instance count over the last 30 days with your new config applied, alongside a projected monthly cost range.
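To make the idea of a preview concrete, here is a minimal sketch of replaying historical load against a scaling config. It is an illustration under stated assumptions, not how the preview is actually computed: the function name `replay`, the hourly demand samples, the $0.25 per instance-hour price, and the simple "instances = demand / target utilization" model are all hypothetical, and the sketch ignores memory targets and scale-down cooldown.

```python
import math


def replay(demand_per_hour, *, min_instances, max_instances,
           target_cpu_percent, hourly_cap_usd, price_per_instance_hour):
    """Replay hourly CPU demand against a scaling config.

    demand_per_hour: CPU demand per hour, in units of one fully busy instance.
    Returns (instance count per hour, total cost, hours where the cap was binding).
    """
    affordable = math.floor(hourly_cap_usd / price_per_instance_hour + 1e-9)
    counts, capped_hours = [], 0
    for demand in demand_per_hour:
        # Instances needed to keep average CPU at or below the target.
        wanted = math.ceil(demand / (target_cpu_percent / 100) - 1e-9)
        wanted = max(min_instances, min(wanted, max_instances))
        if wanted > affordable:
            # The spend cap wins over the autoscaler's request.
            capped_hours += 1
            wanted = affordable
        counts.append(wanted)
    total_cost = sum(counts) * price_per_instance_hour
    return counts, total_cost, capped_hours


# Five made-up hours of demand, including one large spike.
counts, cost, capped = replay(
    [0.3, 1.3, 6.5, 13.0, 2.0],
    min_instances=1, max_instances=20, target_cpu_percent=65,
    hourly_cap_usd=4.00, price_per_instance_hour=0.25,
)
print(counts, cost, capped)
```

Running the same replay over 30 days of samples would give the instance-count series for the graph, and the count of capped hours shows how often the cap would have traded capacity for cost.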
How to configure it
```yaml
services:
  - name: api
    type: web
    scaling:
      min_instances: 1
      max_instances: 20
      target_cpu_percent: 65      # Scale up when CPU exceeds this
      target_memory_percent: 80   # Scale up when memory exceeds this
      scale_down_cooldown: 300    # Seconds to wait before scaling down
    spend_cap:
      hourly_usd: 4.00            # Never spend more than $4/hour on compute
      over_limit_behavior: queue  # Options: queue | reject
      queue_timeout_ms: 5000      # How long to hold queued requests
```
Don't set your cap too close to your baseline
If your spend cap is only slightly above your minimum instance cost, you might accidentally prevent any autoscaling from happening at all. Set your cap to reflect the maximum you're comfortable spending during a spike, not your expected steady-state cost. We recommend at least 3–5x your baseline as a starting cap while you learn your traffic patterns.
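A quick sanity check along these lines can be sketched in a few lines of Python. The helper name `cap_headroom`, the baseline of 3 instances, and the $0.25 per instance-hour price are assumptions for illustration only.

```python
import math


def cap_headroom(hourly_cap_usd, baseline_instances, price_per_instance_hour):
    """Return (cap as a multiple of baseline cost, extra instances the cap allows)."""
    baseline_cost = baseline_instances * price_per_instance_hour
    multiple = hourly_cap_usd / baseline_cost
    affordable = math.floor(hourly_cap_usd / price_per_instance_hour + 1e-9)
    return multiple, max(0, affordable - baseline_instances)


# Hypothetical service: 3 instances at steady state, $0.25 per instance-hour.
tight = cap_headroom(1.00, 3, 0.25)   # cap barely above the $0.75/hour baseline
roomy = cap_headroom(4.00, 3, 0.25)   # cap a little over 5x the baseline
print(tight, roomy)
```

With these made-up numbers, a $1.00 cap leaves room for only one extra instance, while a $4.00 cap leaves room for thirteen.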
What's next
We're working on predictive scaling: using your historical traffic patterns to pre-scale before load hits, rather than reacting to it. If your app reliably gets a traffic spike every weekday morning, there's no reason to wait for CPU to climb before adding instances. We expect to ship a beta of this in Q3.
We're also adding team-level spend caps (a single cap across all services in an environment) for teams that want one budget guardrail rather than managing per-service caps.
Marcus Rivera
Head of Product at StackBlaze
Member of the founding team at StackBlaze. Writes about infrastructure, engineering culture, and the systems that keep production running.