Content Cluster Topology

Vespa has many features to optimize cost, query latency and throughput, while making tradeoffs for availability. This guide goes through various topologies by example, highlights the most relevant tradeoffs, and discusses operational events like node stops and topology changes.

The background for deploying with a grouped topology is found in sizing search. In short, query latency dictates the maximum number of documents per node, and hence how many nodes are needed in a group. Example: If query latency is at the maximum tolerated with 1M documents on a node, 6 nodes are needed in a group for a 6M-document index.
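The sizing rule in the example above can be sketched as a small calculation. The function name `nodes_per_group` is hypothetical and only illustrates the arithmetic; it is not part of any Vespa API.

```python
def nodes_per_group(total_docs: int, max_docs_per_node: int) -> int:
    """Smallest node count per group that keeps every node at or below
    the document count where query latency is still tolerated."""
    if max_docs_per_node <= 0:
        raise ValueError("max_docs_per_node must be positive")
    # Integer ceiling division, avoids floating-point rounding.
    return -(-total_docs // max_docs_per_node)


# The example from the text: 1M documents per node, 6M documents in total.
print(nodes_per_group(6_000_000, 1_000_000))  # 6
```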

Content nodes are stateful, holding replicas of the documents to be queried. Content nodes can be deployed in different topologies - example using 6 nodes:

Figure: 4 different topologies

Vespa Cloud requires a redundancy of at least 2. Unless stated otherwise, this guide assumes that redundancy, expressed as min-redundancy, is set to n=3. Redundancy is a function of data availability / criticality and cost, and varies from application to application.

Redundancy is the number of replicas stored of each document, each on a different node. Not all replicas are searchable - read Proton for a detailed understanding of sub-databases.

Out of the box: 1x6

Start off with a configuration like this:

<nodes count="6">
    <resources .../>
</nodes>

This means that the corpus is spread over 6 nodes, each holding about 17% (1/6) of the documents active in queries. This topology is called 1x6 in this guide.

This is important to remember when benchmarking for latency, which is normally done on a single node with n=1. In the 6-node system with n=3, more memory and disk space are used for the redundant replicas - more on that later.
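As a rough illustration of the difference between active and stored documents, the per-node shares can be computed as below. This is a sketch that assumes documents and replicas are spread perfectly evenly within a single group; `per_node_shares` is a hypothetical helper, and stored replicas do not necessarily cost the same resources as active ones.

```python
from fractions import Fraction


def per_node_shares(nodes: int, redundancy: int) -> tuple[Fraction, Fraction]:
    """Return (active share, stored share) of the corpus per node
    for a single group, assuming a perfectly even distribution."""
    if not 1 <= redundancy <= nodes:
        raise ValueError("redundancy must be between 1 and the node count")
    active = Fraction(1, nodes)           # documents each node answers queries for
    stored = Fraction(redundancy, nodes)  # all replicas kept on the node
    return active, stored


active, stored = per_node_shares(nodes=6, redundancy=3)
print(f"active: {float(active):.0%}, stored: {float(stored):.0%}")
# active: 17%, stored: 50%
```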

This is the default topology, and it works well:

  • When a node is stopped (unplanned, or planned like a software upgrade), there are 5 other nodes to serve queries, each of which will have a 1/5 larger corpus to serve
  • Adding capacity, say 17%, is done by increasing the node count to 7
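The two numbers in the list above follow from simple fractions. A minimal sketch, assuming an even document distribution and using hypothetical helper names:

```python
from fractions import Fraction


def load_increase_after_stop(nodes: int, stopped: int = 1) -> Fraction:
    """Relative growth of the corpus each remaining node serves
    when `stopped` nodes in a single group go down."""
    remaining = nodes - stopped
    if remaining <= 0:
        raise ValueError("at least one node must remain")
    return Fraction(nodes, remaining) - 1


def capacity_gain_from_adding(nodes: int, added: int = 1) -> Fraction:
    """Relative capacity added by growing a single group by `added` nodes."""
    return Fraction(added, nodes)


print(load_increase_after_stop(6))                    # 1/5
print(f"{float(capacity_gain_from_adding(6)):.0%}")   # 17%
```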

3-row topology: 3x2

Some applications, particularly those with extremely low-latency serving, will find that query latency is dominated by the static part of query execution. This means that reducing the number of documents queried does not lower latency.

The flip side is that increasing the document count does not increase latency much, either - consider 3x2:

<nodes count="6" groups="3">
    <resources .../>
</nodes>

Here, 3 groups (rows) are configured, with n=3. Each row holds one replica of every document, so the other node in the same row does not have a replica - redundancy is between the rows.

Each node now has 3x the number of documents per query (compared to 1x6), but query capacity is also tripled, as each row has the full document corpus. This can be a great way to scale query throughput! Notes:

  • At a planned or unplanned node stop, the full row is eliminated from query serving - four nodes are left in total, in two rows. Query capacity is hence down to 67%.
  • Feeding requirements are the same as in 1x6 - every document write is written to 3 replicas.
  • Document reconciliation is independent of topology - replicas from all nodes are used when rebuilding nodes after a node stop.
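The figures for a grouped topology can be reproduced with the sketch below. It assumes that a stopped node takes its whole row out of query serving, as described above, and that documents are evenly distributed within a row; both function names are hypothetical.

```python
from fractions import Fraction


def docs_share_per_node(nodes: int, groups: int) -> Fraction:
    """Share of the corpus each node searches per query,
    given that every group (row) holds the full corpus."""
    if groups < 1 or nodes % groups != 0:
        raise ValueError("nodes must be a positive multiple of groups")
    return Fraction(groups, nodes)


def capacity_after_node_stop(groups: int) -> Fraction:
    """Remaining query capacity when one node stops and its full row
    is taken out of serving (only meaningful for 2 or more groups)."""
    if groups < 2:
        raise ValueError("needs at least 2 groups")
    return Fraction(groups - 1, groups)


# 3x2 compared to 1x6: three times as many documents per node per query.
print(docs_share_per_node(6, 3) / docs_share_per_node(6, 1))  # 3
print(f"{float(capacity_after_node_stop(3)):.0%}")            # 67%
```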

6-row topology: 6x1

Maximizing the number of documents per node is good for cases where the query latency is still within requirements, and less total work is done, as fewer nodes in a row calculate candidates in ranking. The extreme case is all documents on a single node, replicated across 6 groups. This is a quite common configuration due to high throughput and simplicity:

<nodes count="6" groups="6">
    <resources .../>
</nodes>


  • Total feeding work is higher - with n=6, six replicas are written (compared to three above). See the feeding latency notes below.

2-row topology: 2x3

In this case, the application has a redundancy of 2 - it must be the same as the number of rows:

<nodes count="6" groups="2">
    <resources .../>
</nodes>

This is a configuration most applications do not use: When a node stops (and it does daily for Vespa upgrades), the full row stops serving, which takes out 50% of the capacity.

Topology migration

Migrating from one topology to another is easy, as Vespa Cloud will auto-migrate documents:

  • All rows must have the same node count, meaning count / groups must be an integer.
  • Data can be unavailable for queries during migration, as nodes change row - see changing group configuration. The easiest procedure is serving from a different zone while the migration is running. Observe the merge_pending metric to find when migration is completed, then deploy the change to the other zone.
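The first constraint above is a plain divisibility check. A minimal sketch with a hypothetical helper name; it only models the count / groups rule and ignores any redundancy requirements:

```python
def valid_group_counts(node_count: int) -> list[int]:
    """Group counts for which every row gets the same number of nodes,
    i.e. node_count / groups is an integer."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    return [g for g in range(1, node_count + 1) if node_count % g == 0]


# With 6 nodes, this yields the four topologies covered in this guide:
# 1x6, 2x3, 3x2 and 6x1.
print(valid_group_counts(6))  # [1, 2, 3, 6]
```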


Feeding latency

Documents are fed to Vespa Cloud using the <document-api> endpoint. This means that one Vespa Container node forwards document writes to all the replicas, in parallel. As all groups have a replica, adding a group will in theory not add feed latency, due to the parallelism. In practice, however, latency will increase, as more nodes mean more latency variation, and the slowest node sets the end latency.
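The effect of the slowest replica can be illustrated with a small simulation. This is only a sketch: the log-normal latency distribution and its parameters are arbitrary assumptions made for illustration, not measurements of Vespa, and `feed_latency` is a hypothetical helper.

```python
import random


def feed_latency(replica_latencies: list[float]) -> float:
    """With parallel writes, the operation completes when the
    slowest replica has acknowledged it."""
    return max(replica_latencies)


rng = random.Random(42)  # fixed seed so the run is repeatable
trials = 10_000
total_3 = total_6 = 0.0
for _ in range(trials):
    # Assumed per-replica write latencies (arbitrary unit, arbitrary distribution).
    samples = [rng.lognormvariate(0.0, 0.5) for _ in range(6)]
    total_3 += feed_latency(samples[:3])  # write waits for 3 replicas
    total_6 += feed_latency(samples)      # write waits for 6 replicas

mean_3 = total_3 / trials
mean_6 = total_6 / trials
print(f"mean latency with 3 replicas: {mean_3:.2f}")
print(f"mean latency with 6 replicas: {mean_6:.2f}")
```

Because the six-replica case waits for a superset of the nodes in the three-replica case, its latency is never lower in any trial, and the mean is higher whenever the latencies vary.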