Topology and Resizing

Vespa has features to optimize cost, query latency, and throughput, while making tradeoffs against availability. This guide goes through various topologies by example, highlights the most relevant tradeoffs, and discusses operational events like node stops and changing the topology.

Use cases for a grouped topology are found in the elasticity guide. E.g., query latency can dictate the maximum number of documents per node and hence how many nodes are needed in a group - if query latency is at the maximum tolerated at 1M documents, 6 nodes are needed in a group for a 6M-document index.
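As a hypothetical sketch of that example (the numbers are assumptions for illustration, not prescriptive): a 6M-document index at 1M documents per node needs 6 nodes per group, so two such groups give redundancy 2:

2x6
<min-redundancy>2</min-redundancy>
<nodes count="12" groups="2">
    <resources .../>
</nodes>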

Content nodes are stateful, holding replicas of the documents to be queried. Content nodes can be deployed in different topologies - example using 6 nodes:

[Figure: 4 different topologies]

Vespa Cloud requires a redundancy of at least 2. In this guide, it is assumed that redundancy, configured as min-redundancy, is set to n=3. Redundancy is a function of data availability / criticality and cost, and varies from application to application.

Redundancy refers to storing document replicas on nodes. Not all replicas are searchable - read the Proton documentation for a detailed understanding of sub-databases.
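For context, min-redundancy is configured per content cluster in services.xml - a minimal sketch, where the cluster id and document type music are assumptions for illustration:

<content id="music" version="1.0">
    <min-redundancy>3</min-redundancy>
    <documents>
        <document type="music" mode="index"/>
    </documents>
    <nodes count="6">
        <resources .../>
    </nodes>
</content>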

Out of the box: 1x6

Most applications should be configured without a grouped topology until optimizing for a use case - see the elasticity guide linked above. Therefore, start with a flat configuration, like:

1x6
<min-redundancy>3</min-redundancy>
<nodes count="6">
    <resources .../>
</nodes>

This means the corpus is spread over 6 nodes, with each node having 17% (1/6) of the documents active in queries. This topology is called 1x6 in this guide.

This is important to remember when benchmarking for latency, which is normally done on a single node with n=1. In the 6-node system with n=3, more memory and disk space is used for the redundant replicas - more on that later.
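As a back-of-the-envelope check of the numbers above (a sketch, assuming replicas are spread evenly, with n for min-redundancy and count for the node count):

\[
\text{stored per node} = \frac{n}{\text{count}} = \frac{3}{6} = 50\%
\qquad
\text{active per node} = \frac{1}{\text{count}} = \frac{1}{6} \approx 17\%
\]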

This topology is the default, and works great for most applications.

3-row topology: 3x2

Some applications, particularly those with extremely low-latency serving, will find that queries are dominated by the static part of query execution. This means that reducing the number of documents queried does not lower latency.

The flip side is that increasing the document count does not increase latency much either - consider 3x2:

3x2
<min-redundancy>3</min-redundancy>
<nodes count="6" groups="3">
    <resources .../>
</nodes>

Here we have configured 3 groups with n=3. This means the other node in the same row does not hold a replica of a given document - redundancy is between the rows.

Each node now has 3x the number of documents per query (compared to 1x6), but query capacity is also tripled, as each row has the full document corpus. This can be a great way to scale query throughput!
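Following the same arithmetic as above, each row of 2 nodes holds one full replica set:

\[
\text{active per node} = \frac{1}{\text{nodes per row}} = \frac{1}{2} = 50\%
\qquad
\text{query capacity} \approx 3 \times \text{that of a single row}
\]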

6-row topology: 6x1

Maximizing the number of documents per node is good for cases where query latency is still within requirements; less total work is done, as fewer nodes in a row calculate candidates in ranking. The extreme case is all documents on a single node, replicated across 6 groups. This is quite a common configuration due to its high throughput and simplicity:

6x1
<min-redundancy>6</min-redundancy>
<nodes count="6" groups="6">
    <resources .../>
</nodes>


2-row topology: 2x3

In this case, the application has a redundancy of 2 - it must be the same as the number of rows:

2x3
<min-redundancy>2</min-redundancy>
<nodes count="6" groups="2">
    <resources .../>
</nodes>

This is a configuration most applications do not use: when a node stops (and it does, daily, for Vespa upgrades), the full row stops serving, taking 50% of the capacity out.
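Assuming, as described above, that a stopped node takes its whole row out of serving, the capacity lost is one row out of the total:

\[
\text{capacity lost} = \frac{1}{\text{rows}}:
\qquad 2\times3 \Rightarrow 50\%
\qquad 3\times2 \Rightarrow 33\%
\qquad 6\times1 \Rightarrow 17\%
\]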

Topology migration

Migrating from one topology to another is easy, as Vespa Cloud auto-migrates documents to match the new configuration.
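A minimal sketch of such a migration, from the 1x6 to the 3x2 configuration above - edit the nodes element and redeploy, and documents are migrated automatically (min-redundancy stays at 3):

<!-- before: 1x6 -->
<nodes count="6">
    <resources .../>
</nodes>

<!-- after: 3x2 -->
<nodes count="6" groups="3">
    <resources .../>
</nodes>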

Feeding

Documents are fed to Vespa Cloud using the <document-api> endpoint. This means one Vespa Container node forwards document writes to all the replicas in parallel. As all groups have a replica, adding a group should in theory not add feed latency, due to the parallelism. In practice, however, there is an increase, as more nodes mean more latency variation, and the slowest node sets the end-to-end latency.
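For reference, the document API is enabled per container cluster in services.xml - a minimal sketch, where the cluster id default and the node count are assumptions for illustration:

<container id="default" version="1.0">
    <document-api/>
    <nodes count="2">
        <resources .../>
    </nodes>
</container>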