Vespa has many features to optimize cost, query latency and throughput, while making tradeoffs for availability. This guide goes through various topologies by example, highlights the most relevant tradeoffs, and discusses operational events like node stops and topology changes.
The background for deploying with a grouped topology is found in sizing search. In short, query latency dictates the maximum number of documents per node, and hence how many nodes are needed in a group. Example: If query latency is at the maximum tolerated with 1M documents on a node, 6 nodes are needed in a group for a 6M document index.
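The sizing rule above is plain arithmetic. A minimal sketch, assuming a hypothetical helper named `nodes_needed_in_group` and a per-node document limit that was found by benchmarking latency (neither is part of Vespa):

```python
def nodes_needed_in_group(total_documents: int, max_documents_per_node: int) -> int:
    """Return the smallest group size that keeps each node within its document limit.

    Hypothetical helper for illustration; the limit is assumed to come from
    a latency benchmark on a single node.
    """
    if total_documents <= 0 or max_documents_per_node <= 0:
        raise ValueError("both arguments must be positive")
    # Integer ceiling division: a partially filled node still counts as a node.
    return (total_documents + max_documents_per_node - 1) // max_documents_per_node


# The example from the text: a 6M index with a limit of 1M documents per node.
print(nodes_needed_in_group(6_000_000, 1_000_000))  # 6
```

Rounding up matters: one document over the limit already requires an extra node in the group.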
Content nodes are stateful, holding replicas of the documents to be queried. Content nodes can be deployed in different topologies - example using 6 nodes:
In this guide, it is assumed that redundancy is set to n=3. Vespa Cloud requires a minimum redundancy of 2, for availability. Redundancy is a function of data availability / criticality and cost, and varies from application to application.
Start off with a configuration like this:
<redundancy>3</redundancy>
<nodes count="6">
    <resources .../>
</nodes>
This means the corpus is spread over 6 nodes, each with about 17% (1/6) of the documents active in queries. This topology is called 1x6 in this guide.
This is important to remember when benchmarking for latency, which is normally done on a single node with redundancy n=1. In the 6-node system with n=3, more memory and disk space are used for the redundant replicas - more on that later.
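To make the difference between active documents and stored replicas concrete, here is a rough sketch. It assumes an idealized, perfectly even distribution of documents over the nodes; `flat_node_shares` is a hypothetical name used only for illustration.

```python
from fractions import Fraction


def flat_node_shares(nodes: int, redundancy: int) -> tuple[Fraction, Fraction]:
    """Return (active share, stored share) of the corpus per node in a flat topology.

    Assumption: documents and their replicas are spread perfectly evenly.
    """
    if not 1 <= redundancy <= nodes:
        raise ValueError("redundancy must be between 1 and the number of nodes")
    active = Fraction(1, nodes)           # one replica of each document is active in queries
    stored = Fraction(redundancy, nodes)  # every node also holds redundant replicas
    return active, stored


active, stored = flat_node_shares(nodes=6, redundancy=3)
print(f"active in queries: {float(active):.0%}, stored: {float(stored):.0%}")
```

With 6 nodes and n=3, each node stores three times as many documents as it has active in queries, which is why a single-node benchmark with n=1 underestimates memory and disk use.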
This topology is the default topology, and works great:
See redundancy for detailed configuration notes.
Some applications, particularly those with extremely low-latency serving, will find that query latency is dominated by the static part of query execution. This means that reducing the number of documents queried does not lower latency.
The flip side is, increasing document count does not increase the latency much, either - consider 3x2:
<redundancy>1</redundancy>
<nodes count="6" groups="3">
    <resources .../>
</nodes>
Here we have configured 3 groups with n=1 - the redundancy configuration is per group. This means that the other node in the row does not have a replica - redundancy is between the rows.
Each node now has 3x the number of documents per query, but query capacity is also tripled, as each row has the full document corpus. This can be a great way to scale query throughput! Notes:
Maximizing the number of documents per node is good for cases where the query latency is still within requirements. Less total work is done, as fewer nodes in a row calculate candidates in ranking. The extreme case is all documents on a single node, replicated with 6 groups. This is quite a common configuration due to high throughput and simplicity:
<redundancy>1</redundancy>
<nodes count="6" groups="6">
    <resources .../>
</nodes>
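The topologies discussed so far can be compared with a small sketch. It assumes a 6M document corpus (the figure used in the sizing example), an even spread of documents within a group, and the naming used in this guide, groups x nodes per group. `describe_topology` is a hypothetical helper, not a Vespa API.

```python
def describe_topology(count: int, groups: int, corpus_size: int) -> dict:
    """Summarize a grouped topology for a given corpus size.

    Assumption: every group holds the full corpus, spread evenly over its nodes.
    """
    if count <= 0 or groups <= 0 or count % groups != 0:
        raise ValueError("count must be a positive multiple of groups")
    nodes_per_group = count // groups
    return {
        "name": f"{groups}x{nodes_per_group}",
        # Documents each node evaluates per query (approximate, integer division).
        "documents_per_node": corpus_size // nodes_per_group,
        # Nodes that calculate ranking candidates for a single query.
        "nodes_per_query": nodes_per_group,
    }


for groups in (1, 3, 6):
    print(describe_topology(count=6, groups=groups, corpus_size=6_000_000))
```

Moving from 1x6 to 6x1 raises the documents per node from 1M to 6M, while the number of nodes involved in each query drops from 6 to 1.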
Notes:
In this case, the application has a redundancy of 2:
<redundancy>1</redundancy>
<nodes count="6" groups="2">
    <resources .../>
</nodes>
This is a configuration most applications do not use: When a node stops (which happens at least daily for Vespa upgrades), the full row stops serving, which is 50% of the capacity.
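The 50% figure generalizes to the other grouped topologies. The sketch below assumes, as described above for 2x3, that a stopped node takes its whole group out of serving when n=1 per group, and that query capacity is shared equally between groups. The function name is hypothetical.

```python
from fractions import Fraction


def capacity_lost_on_node_stop(groups: int) -> Fraction:
    """Share of query capacity lost when one node stops in a grouped topology.

    Assumption: with n=1 per group, the whole group stops serving,
    and all groups contribute equally to query capacity.
    """
    if groups < 2:
        raise ValueError("this estimate only applies to topologies with 2 or more groups")
    return Fraction(1, groups)


for groups, nodes_per_group in ((2, 3), (3, 2), (6, 1)):
    lost = capacity_lost_on_node_stop(groups)
    print(f"{groups}x{nodes_per_group}: {float(lost):.0%} of query capacity lost")
```

More groups means a smaller share of capacity is lost per stopped node, which is one reason 6x1 is more forgiving than 2x3 during upgrades.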
Migrating from one topology to another is easy, as Vespa Cloud will auto-migrate documents:
- count / groups must be an integer.
- Use the merge_pending metric to find when migration is completed, then deploy the change to the other zone.
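The integer constraint on count / groups can be checked before deploying a new topology. A minimal sketch with hypothetical helper names; it only validates the arithmetic and does not talk to Vespa Cloud:

```python
def valid_group_counts(count: int) -> list[int]:
    """List every groups value that divides the node count evenly."""
    if count <= 0:
        raise ValueError("count must be positive")
    return [groups for groups in range(1, count + 1) if count % groups == 0]


def nodes_in_each_group(count: int, groups: int) -> int:
    """Return count / groups, rejecting combinations that are not an integer."""
    if count <= 0 or groups <= 0:
        raise ValueError("count and groups must be positive")
    if count % groups != 0:
        raise ValueError(f"count={count} is not divisible by groups={groups}")
    return count // groups


print(valid_group_counts(6))  # [1, 2, 3, 6]
```

For 6 nodes, the valid choices correspond to the 1x6, 2x3, 3x2 and 6x1 topologies covered in this guide.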
Redundancy specifies the number of document replicas stored on nodes; a replica is not necessarily immediately searchable. Read Proton for a detailed understanding of sub-databases. In short:
- searchable-copies does not need to be configured in any of the topologies with groups and n=1; searchable-copies equals 1 regardless.
- searchable-copies does not need to be configured for a flat (non-grouped) topology with n=2, as this is the default.
- For a flat topology with n=3, searchable-copies=3. It is possible to save some memory by setting searchable-copies=2, which leaves the third replica not searchable, but care must be taken to not slow down features like document expiry and feeding to in-memory attributes.
Contact the Vespa Team if in doubt.
Note that self-hosted Open Source Vespa configures groups and redundancy differently than Vespa Cloud. The latter simplifies both the configuration and the operational aspects of modifying it.