Autoscaling

Autoscaling lets you adjust the hardware resources allocated to application clusters automatically depending on actual usage.

You can turn it on by specifying ranges in square brackets for the node and/or node resource values in services.xml. The Vespa Cloud will monitor the resource utilization of your clusters and automatically choose the cheapest resource allocation within ranges that produces close to optimal utilization.

You can see the status and recent actions of the autoscaler in the Resources view under a deployment in the console.

Autoscaling is not considering latency differences achieved by different configurations. If your application has certain configurations that produce good throughput but too high latency, you should not include these configurations in your autoscaling ranges.

Adjusting the allocation of a cluster may happen quickly for stateless container clusters, and much more slowly for content clusters with a lot of data. Autoscaling will adjust each cluster on the timescale it typically takes to rescale it (including any data redistribution).

When to use autoscaling

Autoscaling is useful in a number of scenarios. Some typical ones are:

Resource tradeoffs

Some other considerations when deciding resources:

It is often safe to follow the suggested resources advice when shown in the console and feel free to contact us if you have questions.

Mixed load

A Vespa application must handle a combination of reads and writes, from multiple sources. User load often resembles a sine-like curve. Machine-generated load, like a batch job, can be spiky and abrupt.

In the default Vespa configuration, all kinds of load uses _one_ default container cluster. Example: An application where daily batch jobs update the corpus at high rate:

nodes and resources

Autoscaling scales up much quicker than down, as the probability of a new spike is higher after one has been observed. In this example, see the rapid cluster growth for the daily load spike - followed by a slow decay.

The best solution for this case is to slow down the batch job, as it is of short duration. It is not always doable to slow down jobs - in these cases, setting up multiple container clusters can be a smart thing - optimize each cluster for its load characteristics. This could be a combination of clusters using autoscale and clusters with a fixed size. Autoscaling often works best for the user-generated load, whereas the machine-generated load could either be tuned or routed to a different cluster in the same Vespa application.

Related reading