services.xml is the primary Vespa configuration file.
This page documents the services.xml amendments for Vespa Cloud. See
services.xml at docs.vespa.ai
for the general reference.
<nodes>
In cloud applications, nodes are specified by count and node resources. Example:
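A minimal sketch of such a declaration, assuming a container cluster. The cluster id, the node count and the resource values are illustrative only, and the vcpu attribute name comes from the general services.xml reference rather than from the text on this page:

```xml
<!-- Illustrative values only: two nodes, each with the stated resources -->
<container id="default" version="1.0">
    <nodes count="2">
        <resources vcpu="4" memory="16Gb" disk="100Gb"/>
    </nodes>
</container>
```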
count: An integer or range. The number of nodes of the cluster.
exclusive (optional): true or false (default false). If true, the nodes of this cluster will be
placed on hosts shared only with other nodes of the same application.
This is useful for container clusters in applications storing sensitive data or secrets, as it adds another layer of
protection against leaking sensitive information between applications sharing a host.
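As an illustration, exclusive placement could be requested roughly as sketched here. The cluster id, the count and the resource values are assumptions chosen for the example:

```xml
<container id="default" version="1.0">
    <!-- exclusive="true": hosts are shared only with nodes of the same application -->
    <nodes count="2" exclusive="true">
        <resources vcpu="4" memory="16Gb" disk="100Gb"/>
    </nodes>
</container>
```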
In addition, there are some attributes for specific cluster types, listed below.
<nodes> for <content>
groups (optional): Integer or range.
Sets the number of groups into which content nodes should be divided.
Each group will have an equal share of the nodes and redundancy copies of the corpus,
and each query will be routed to just one group.
This allows scaling
to a higher query load than is possible within a single group.
group-size (optional): Integer or range
where either value can be skipped (replaced by an empty string) to create a one-sided limit.
If this is set, the group sizes used will always be within these limits (inclusive).
If neither groups nor group-size is set, all nodes will always be placed in a single group.
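A sketch of a grouped content cluster, assuming eight nodes split into two groups so that each group holds four nodes. The cluster id and all sizes are hypothetical, and the other child elements a content cluster needs (such as its redundancy and document settings) are left out:

```xml
<content id="music" version="1.0">
    <!-- Other required content elements are omitted for brevity -->
    <!-- 8 nodes in 2 groups: 4 nodes per group, each query routed to one group -->
    <nodes count="8" groups="2">
        <resources vcpu="4" memory="16Gb" disk="100Gb"/>
    </nodes>
</content>
```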
<nodes> for <controllers>, <slobroks> and <logservers>
The nodes element nested in these elements allows specifying whether the nodes used should be dedicated to the service
or whether the service should run on existing nodes. Attribute:
dedicated (optional): true or false (default false).
Whether separate nodes should be allocated for this service or not.
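A fragment sketching how a dedicated node might be requested for the log server. The enclosing element is omitted, and count="1" is an assumed value rather than a recommendation:

```xml
<!-- Fragment only: the parent element is omitted, and count="1" is an assumption -->
<logservers>
    <nodes count="1" dedicated="true"/>
</logservers>
```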
<resources>
Contained in the nodes element, specifies each node's resource requirements.
Resource specifications are a powerful way to optimize cost and performance.
For new launches, allocate enough to reduce risk,
then use the performance guides
to find the sweet spot that balances cost and free capacity, based on real production load.
Migration to new capacity is automated;
read more in elastic Vespa.
memory: float or range, each followed by a byte unit, such as "Gb". Memory.
disk: float or range, each followed by a byte unit, such as "Gb". Disk space.
storage-type (optional): string (enum). The type of storage to use. Specify local storage when network storage provides too few I/O operations or too noisy I/O performance:
  local: Node-local storage is required.
  remote: Network storage must be used.
  any (default): Either remote or local storage may be used.
disk-speed (optional): string (enum). The required disk speed category:
  fast (default): SSD-like disk speed is required.
  slow: The cluster is sized for spinning-disk speed.
  any: Performance does not depend on disk speed (often suitable for container clusters).
architecture (optional): string (enum). Node CPU architecture:
  x86_64 (default)
  arm64
  any: Use any of the available architectures.
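To show how these attributes combine, here is a sketch of a resources element that requires node-local, SSD-class storage and accepts any CPU architecture. All values are illustrative, and the vcpu attribute name is taken from the general services.xml reference rather than from the attribute list above:

```xml
<nodes count="4">
    <!-- Require node-local, SSD-class disks; accept any available CPU architecture -->
    <resources vcpu="8" memory="32Gb" disk="300Gb"
               storage-type="local" disk-speed="fast" architecture="any"/>
</nodes>
```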
See index bootstrap for how to set resources in a step-by-step procedure,
estimating settings by feeding smaller subsets.
Also note that the autoscaling described below is not designed for index bootstrapping,
as a bootstrap normally completes much faster than a cluster autoscales.
Autoscaling ranges
Resources specified as a range will be autoscaled by the system. Ranges
are expressed by the syntax [lower-limit, upper-limit]. Both limits
are inclusive.
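The range syntax might be applied as in this sketch, where both the node count and the per-node resources are given as ranges. The limits are hypothetical, and the vcpu attribute name is taken from the general services.xml reference:

```xml
<nodes count="[2, 4]">
    <!-- Each bracketed pair is [lower-limit, upper-limit], both inclusive -->
    <resources vcpu="[4, 8]" memory="[16Gb, 32Gb]" disk="[100Gb, 200Gb]"/>
</nodes>
```

With these limits, the cluster can never be configured with fewer than 2 or more than 4 nodes, and each node's memory stays between 16Gb and 32Gb.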
Autoscaling will attempt to keep utilization of all allocated resources close to ideal,
and will automatically reconfigure to the cheapest option allowed by the ranges when
necessary.
The ideal utilization takes into account that a node
may be down or failing, that another region may be down, causing a doubling of traffic, and that headroom is needed for
maintenance operations and for handling requests with low latency. Autoscaling acts on what it has observed on your system
in the recent past. If you expect to need much more capacity in the near future than you do currently, you may want
to raise the lower limit to take this into account. Upper limits should be set to the maximum size
that makes business sense.
When a new cluster (or application) is deployed, it will initially be configured with the minimum
resources given by the ranges. When autoscaling is turned on for an existing cluster, the cluster will continue
unchanged until autoscaling determines that a change is beneficial.