services.xml is the primary Vespa configuration file. This page documents the services.xml amendments for Vespa Cloud - see services.xml at docs.vespa.ai for the general reference.
In cloud applications nodes are specified by count and node resources. Example:
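A minimal sketch of such a nodes element, assuming a cluster of four nodes. The count and resource values are illustrative only and must match a node flavor available in the zones you deploy to:

```xml
<!-- Illustrative values: four nodes, each with 2 vCPU, 8 Gb memory and 50 Gb disk -->
<nodes count="4">
    <resources vcpu="2" memory="8Gb" disk="50Gb"/>
</nodes>
```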
In addition there are some attributes for specific cluster types, listed below.
groups (optional): Integer or range. Sets the number of groups into which content nodes should be divided. Each group will have an equal share of the nodes and redundancy copies of the corpus, and each query will be routed to just one group. This allows scaling to a higher query load than is possible within a single group.
group-size (optional): Integer or range where either value can be skipped (replaced by an empty string) to create a one-sided limit. If this is set, the group sizes used will always be within these limits (inclusive).
If neither groups nor group-size is set, all nodes will always be placed in a single group.
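As a sketch of how grouping could be declared, assuming a content cluster of eight nodes split into two groups (all values are illustrative):

```xml
<!-- Illustrative values: 8 content nodes divided into 2 groups of 4 nodes each -->
<nodes count="8" groups="2">
    <resources vcpu="4" memory="16Gb" disk="100Gb"/>
</nodes>
```

With these values each group holds four nodes, and each query is routed to one of the two groups. To constrain the group size instead of fixing the number of groups, a range such as `group-size="[2, 4]"` could be set on the same element.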
The nodes element nested in these elements allows specifying whether the nodes used should be dedicated to the service or if it should run on existing nodes. Attribute:
The resources element, contained in the nodes element, specifies the resources available on each node. The resources must match a node flavor on AWS, GCP, or both, depending on which zones you are deploying to. Exception: If you use remote disk, you can specify any disk size lower than the max size.
Any attribute not specified will be assigned a default value.
|vcpu||float or range||CPU, virtual threads|
|memory||float or range, each followed by a byte unit, such as "Gb"||Memory|
|disk||float or range, each followed by a byte unit, such as "Gb"||Disk space|
|storage-type (optional)||string (enum)||The type of storage to use. This is useful to specify local storage when network storage provides insufficient I/O operations or too noisy I/O performance:
|disk-speed (optional)||string (enum)||The required disk speed category:
|architecture (optional)||string (enum)||Node CPU architecture:
See index bootstrap for how to set resources in a step-by-step procedure, estimating settings by feeding smaller subsets. Note that autoscaling of content clusters involves data redistribution and cannot speed up bootstrapping.
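A sketch of a resources element that asks for local storage, as described for storage-type above. The value `local` is an assumption inferred from that description, since the list of allowed values is not reproduced here, and the sizes are illustrative:

```xml
<!-- Assumption: "local" is taken to be a valid storage-type value; sizes are illustrative -->
<nodes count="4">
    <resources vcpu="4" memory="16Gb" disk="100Gb" storage-type="local"/>
</nodes>
```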
Declares GPU resources to provision.
|count||integer||Number of GPUs|
|memory||integer, followed by a byte unit, such as "Gb"||Amount of memory per GPU. Total amount of GPU memory available is this number multiplied by count|
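A sketch of a GPU declaration, assuming it is written as a gpu element nested inside resources. The values are illustrative and may not correspond to an available node flavor:

```xml
<!-- Illustrative values: one GPU with 16 Gb of GPU memory per node -->
<nodes count="2">
    <resources vcpu="4" memory="16Gb" disk="125Gb">
        <gpu count="1" memory="16Gb"/>
    </resources>
</nodes>
```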
Resources specified as a range will be autoscaled by the system. Ranges are expressed by the syntax [lower-limit, upper-limit]. Both limits are inclusive.
Autoscaling will attempt to keep utilization of all allocated resources close to ideal, and will automatically reconfigure to the cheapest option allowed by the ranges when necessary.
The ideal utilization takes into account that a node may be down or failing, that another region may be down causing doubling of traffic, and that we need headroom for maintenance operations and handling requests with low latency. It acts on what it has observed on your system in the recent past. If you need much more capacity in the near future than you do currently, you may want to set the lower limit to take this into account. Upper limits should be set to the maximum size that makes business sense.
When a new cluster (or application) is deployed it will initially be configured with the minimal resources given by the ranges. When autoscaling is turned on for an existing cluster, it will continue unchanged until autoscaling determines that a change is beneficial.
Autoscaling node count:
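A sketch, with illustrative limits, where only the node count is given as a range and the per-node resources stay fixed:

```xml
<!-- Illustrative limits: autoscale between 6 and 20 nodes of a fixed size -->
<nodes count="[6, 20]">
    <resources vcpu="4" memory="16Gb" disk="100Gb"/>
</nodes>
```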
Autoscaling on all resources:
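A sketch, with illustrative limits, where the node count and every per-node resource are given as ranges:

```xml
<!-- Illustrative limits: node count, vcpu, memory and disk may all be autoscaled -->
<nodes count="[6, 20]">
    <resources vcpu="[2, 8]" memory="[8Gb, 32Gb]" disk="[50Gb, 200Gb]"/>
</nodes>
```

A newly deployed cluster with this configuration would start at the lower limits: 6 nodes with 2 vCPU, 8 Gb memory and 50 Gb disk each.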
<redundancy> sets the number of data copies in each group, not in total.
It is usually preferable to set <min-redundancy> instead - especially with autoscaling.
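A sketch of setting min-redundancy in a content cluster. The cluster id `my-content` is hypothetical and the other child elements of the cluster are omitted:

```xml
<!-- Hypothetical cluster id; documents, nodes and other child elements are omitted -->
<content id="my-content" version="1.0">
    <min-redundancy>2</min-redundancy>
</content>
```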
The admin element is ignored, with one exception: when it is used to configure metrics. Example:
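A sketch of an admin element that only configures metrics. The consumer id `my-metrics` is hypothetical, and `default` is assumed to be the metric set you want:

```xml
<!-- Hypothetical consumer id; "default" is assumed as the desired metric set -->
<admin version="2.0">
    <metrics>
        <consumer id="my-metrics">
            <metric-set id="default"/>
        </consumer>
    </metrics>
</admin>
```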
When migrating from self-hosted to Vespa Cloud, one can safely remove the admin element unless one wants to configure a metric-set.
services.xml settings can be made to vary by tags, instance, environment and region; see deployment variants.