services.xml: Vespa Cloud

services.xml is the primary Vespa configuration file. This page documents the services.xml amendments that apply to Vespa Cloud; see services.xml at docs.vespa.ai for the general reference.

<nodes>

In cloud applications, nodes are specified by count and node resources. Example:

<nodes count="4">
    <resources vcpu="8" memory="16Gb" disk="200Gb"/>
</nodes>

Resources must match a node flavor on the cloud(s) you are deploying to; see AWS flavors and GCP flavors.

Subelements: resources

Attributes:

In addition there are some attributes for specific cluster types, listed below.

<nodes> for <content>

<nodes> for <cluster-controllers> <slobroks> and <logservers>

The nodes element nested in these elements allows specifying whether the nodes used should be dedicated to the service or whether the service should run on existing nodes. Attribute:

<resources>

Contained in the nodes element; specifies the resources available on each node. The resources must match a node flavor on AWS, GCP, or both, depending on which zones you are deploying to. Exception: if you use remote disk, you can specify any disk size lower than the maximum size.

Any attribute not specified will be assigned a default value.

Subelements: gpu

Attribute: vcpu
Type: float or range
Default: 2
Description: CPU, virtual threads.

Attribute: memory
Type: float or range, each followed by a byte unit, such as "Gb"
Default: 16 for content nodes, 8 for container nodes
Description: Memory.

Attribute: disk
Type: float or range, each followed by a byte unit, such as "Gb"
Default: 300 for content nodes, 50 for container nodes
Description: Disk space. To fit core dumps and heap dumps, the disk space should be larger than 3 x memory size for content nodes and 2 x memory size for container nodes.

Attribute: storage-type (optional)
Type: string (enum)
Default: any
Description: The type of storage to use. Specifying local storage is useful when network storage provides too few IO operations or too noisy IO performance. Values:
local: Node-local storage is required.
remote: Network storage must be used.
any: Either remote or local storage may be used.

Attribute: disk-speed (optional)
Type: string (enum)
Default: fast
Description: The required disk speed category. Values:
fast: SSD-like disk speed is required.
slow: The cluster is sized for spinning disk speed.
any: Performance does not depend on disk speed (often suitable for container clusters).

Attribute: architecture (optional)
Type: string (enum)
Default: any
Description: Node CPU architecture. Values:
x86_64
arm64
any: Use any of the available architectures.
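
As an illustration of the optional attributes above, the following sketch requests node-local, fast storage on arm64 nodes for a content cluster. The values are assumptions chosen for illustration, and the combination must still match a node flavor in the zones you deploy to. The disk size (300Gb) is larger than 3 x the memory size (16Gb), following the sizing guidance for content nodes.

```xml
<nodes count="2">
    <!-- Illustrative values only: check them against the available node flavors -->
    <resources vcpu="2" memory="16Gb" disk="300Gb"
               storage-type="local" disk-speed="fast" architecture="arm64"/>
</nodes>
```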

See index bootstrap for a step-by-step procedure for setting resources, which estimates settings by feeding smaller subsets of the data. Note that autoscaling of content clusters involves data redistribution and cannot speed up bootstrapping.

<gpu>

Declares GPU resources to provision.

Current limitations:

Subelements: None

Attribute: count
Type: integer
Description: Number of GPUs.

Attribute: memory
Type: integer, followed by a byte unit, such as "Gb"
Description: Amount of memory per GPU. The total amount of GPU memory available is this number multiplied by count.

Example:

<nodes count="2">
    <resources vcpu="4" memory="16Gb" disk="125Gb">
        <gpu count="1" memory="16Gb"/>
    </resources>
</nodes>

<clients>

Parent element for client security configuration, child element of a container. Find practical examples in the security guide.

<client>

Child element of clients. Use to configure security credentials for a container cluster, using a certificate or a token.

<certificate>

Child element of client. Configure certificates using the file attribute.

<token>

Child element of client. Configure tokens using the id attribute.
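
A minimal sketch of how these elements nest, assuming a container cluster with one certificate client and one token client. The cluster id, the id and permissions attributes on client, the certificate path and the token id are illustrative assumptions that this section does not define; see the security guide for the authoritative settings.

```xml
<container id="default" version="1.0">
    <clients>
        <!-- Client authenticated by certificate; the file path is hypothetical -->
        <client id="mtls" permissions="read,write">
            <certificate file="security/clients.pem"/>
        </client>
        <!-- Client authenticated by token; the token id is hypothetical -->
        <client id="token-client" permissions="read">
            <token id="my-token"/>
        </client>
    </clients>
</container>
```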

Autoscaling ranges

Resources specified as a range will be autoscaled by the system. Ranges are expressed by the syntax [lower-limit, upper-limit]. Both limits are inclusive.

Autoscaling will attempt to keep utilization of all allocated resources close to ideal, and will automatically reconfigure to the cheapest option allowed by the ranges when necessary.

The ideal utilization takes into account that a node may be down or failing, that another region may be down, causing traffic to double, and that headroom is needed for maintenance operations and for handling requests with low latency. Autoscaling acts on what it has observed on your system in the recent past. If you will need much more capacity in the near future than you do currently, you may want to set the lower limit to take this into account. Upper limits should be set to the maximum size that makes business sense.

When a new cluster (or application) is deployed, it will initially be configured with the minimal resources given by the ranges. When autoscaling is turned on for an existing cluster, the cluster will continue unchanged until autoscaling determines that a change is beneficial.

Examples:

Node count autoscaling:

<nodes count="[2, 3]">
    <resources vcpu="2" memory="16Gb" disk="300Gb"/>
</nodes>

Resource autoscaling:

<nodes count="2">
    <resources vcpu="[2, 4]" memory="16Gb" disk="300Gb"/>
</nodes>

You can use ranges on any combination of resource attributes - read the autoscaling guide to learn more.
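
For instance, a sketch that combines a node count range with ranges on several resource attributes could look as follows. The limits are illustrative assumptions, not recommendations.

```xml
<nodes count="[2, 4]">
    <!-- Both the node count and the per-node resources may be autoscaled within these limits -->
    <resources vcpu="[2, 8]" memory="[16Gb, 32Gb]" disk="[300Gb, 600Gb]"/>
</nodes>
```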

<secrets>

In cloud applications, you can set up a secret store to securely manage the secrets needed by the application. Please refer to the secret store guide for more information.

Elements with different meaning on Vespa Cloud

<redundancy>

<redundancy> sets the number of data copies in each group, not in total.

It is usually preferable to set <min-redundancy> instead - especially with autoscaling.
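
As a sketch, min-redundancy can be set on a content cluster that uses a node count range. The cluster id and document type are hypothetical, and the placement of min-redundancy directly under content is assumed from the general services.xml reference.

```xml
<content id="music" version="1.0">
    <!-- Assumed placement: direct child of content -->
    <min-redundancy>2</min-redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    <nodes count="[2, 4]">
        <resources vcpu="2" memory="16Gb" disk="300Gb"/>
    </nodes>
</content>
```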

Ignored elements

The admin element is ignored, with one exception: metrics configuration. Example:

<admin version="4.0">
    <metrics>
        <consumer id="my-custom-consumer">
            <metric-set id="default" />
            <metric id="vds.idealstate.garbage_collection.documents_removed.count" />
        </consumer>
    </metrics>
</admin>

When migrating from self-hosted to Vespa Cloud, one can safely remove admin, unless one wants to configure a metric-set.

Deployment variants

services.xml settings can be made to vary by tags, instance, environment and region; see deployment variants.
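
A hedged sketch of what such a variant can look like, assuming the deploy namespace and the deploy:environment attribute described in the deployment variants documentation. The cluster id, node counts and resources are illustrative, and the exact matching rules are not covered in this section.

```xml
<services version="1.0" xmlns:deploy="vespa">
    <container id="default" version="1.0">
        <!-- Default node specification -->
        <nodes count="2">
            <resources vcpu="2" memory="8Gb" disk="50Gb"/>
        </nodes>
        <!-- Assumed to apply only when deploying to the prod environment -->
        <nodes deploy:environment="prod" count="4">
            <resources vcpu="8" memory="16Gb" disk="200Gb"/>
        </nodes>
    </container>
</services>
```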