This is a step-by-step guide to get started with benchmarking on Vespa Cloud.
It is based on the Vespa benchmarking guide,
adding what is specific to Vespa Cloud.
Notes:
The application can be deployed from anywhere,
using the Vespa CLI.
Query files should be available in the same data center that production load originates from -
or in the same zone as the Vespa Cloud application.
Documents are normally stored in the same location as the query files, but not necessarily.
Both query and feed clients need the [data plane public and private key](security-model#data-plane) to access data in Vespa Cloud,
and to get metrics.
Monitoring is useful to track metrics when benchmarking.
Security - add and validate access
Refer to the Vespa security guide.
The user running benchmarks must have read access to the endpoint -
if you already have access, you can skip this section.
Add more public certificates to security/clients.pem
in the application package by appending (cat) the .pem files to clients.pem,
then deploy the application package again.
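The append step can be sketched as a small shell function. The helper name `add_client_cert` is hypothetical and not part of the Vespa CLI; only the `security/clients.pem` path comes from this guide. The sketch assumes each input file holds a PEM-encoded certificate.

```shell
# Append a client certificate to security/clients.pem in an application package.
# add_client_cert is a hypothetical helper name used for illustration only.
add_client_cert() {
  app_dir=$1   # root directory of the application package
  cert=$2      # PEM file holding the public certificate to add
  mkdir -p "$app_dir/security"
  # Refuse files that do not look like a PEM certificate.
  if ! grep -q -e '-----BEGIN CERTIFICATE-----' "$cert"; then
    echo "not a PEM certificate: $cert" >&2
    return 1
  fi
  cat "$cert" >> "$app_dir/security/clients.pem"
}
```

Call it once per certificate, for example `add_client_cert . new-user-cert.pem` (the file name is a placeholder), before deploying the application package again.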
Run a test query with POST to validate the credentials - here counting all documents using schema music:
$ curl \
--cert ~/.vespa/mytenant.myapp.default/data-plane-public-cert.pem \
--key ~/.vespa/mytenant.myapp.default/data-plane-private-key.pem \
-H "Content-Type: application/json" \
--data '{"yql" : "select * from music where true;"}' \
https://myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud/search/
Thu Feb 13 14:05:44 UTC 2020 Result received: 0 (0 failed so far, 382 sent, success rate 0.00 docs/sec).
Thu Feb 13 14:05:49 UTC 2020 Result received: 382 (0 failed so far, 382 sent, success rate 77.39 docs/sec).
Test using vespa-fbench
Test a single query using vespa-fbench running in a Docker container:
$ ls -1 *.pem
data-plane-private-key.pem
data-plane-public-cert.pem
$ cat query001.txt
/search/?yql=select%20%2A%20from%20music%20where%20true
$ docker run -v $(pwd):/files -w /files --entrypoint '' vespaengine/vespa \
/opt/vespa/bin/vespa-fbench \
-C data-plane-public-cert.pem -K data-plane-private-key.pem -T /etc/ssl/certs/ca-bundle.crt \
-n 1 -q query001.txt -s 1 -c 0 \
myapp.mytenant.aws-us-east-1c.z.vespa-app.cloud 443
Starting clients...
Stopping clients
Clients stopped.
.
Clients Joined.
*** HTTP keep-alive statistics ***
connection reuse count -- 4
***************** Benchmark Summary *****************
clients: 1
ran for: 1 seconds
cycle time: 0 ms
lower response limit: 0 bytes
skipped requests: 0
failed requests: 0
successful requests: 5
cycles not held: 5
minimum response time: 128.17 ms
maximum response time: 515.35 ms
average response time: 206.38 ms
25 percentile: 128.70 ms
50 percentile: 129.60 ms
75 percentile: 130.20 ms
90 percentile: 361.32 ms
95 percentile: 438.36 ms
99 percentile: 499.99 ms
actual query rate: 4.80 Q/s
utilization: 99.03 %
zero hit queries: 5
http request status breakdown:
200 : 5
Make sure there are no SSL_do_handshake errors in the output.
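One way to automate this check is to save the vespa-fbench output to a file and scan it. The sketch below assumes the output was redirected to a log file. `check_fbench_log` is a hypothetical helper name, and the `failed requests:` label is taken from the benchmark summary shown above.

```shell
# Scan saved vespa-fbench output for TLS handshake errors and failed requests.
# check_fbench_log is a hypothetical helper name used for illustration only.
check_fbench_log() {
  log=$1
  if grep -q 'SSL_do_handshake' "$log"; then
    echo "TLS handshake errors found in $log" >&2
    return 1
  fi
  # The summary prints a line such as "failed requests: 0"; use the last one.
  failed=$(sed -n 's/^[[:space:]]*failed requests:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$log" | tail -n 1)
  if [ "${failed:-0}" -gt 0 ]; then
    echo "$failed failed requests in $log" >&2
    return 1
  fi
  echo "ok"
}
```

The function returns a non-zero status on either problem, so it can gate a longer benchmark run in a script.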
Run queries from a data center
At this point, you have verified that the benchmarking tool is able to push load to the application.
The next step is to run it from the same location (data center) as the production clients.
In this example, that is an AWS zone. Deduce the AWS zone from the Vespa Cloud zone name.
Below is an example using an AWS free-tier host with the Amazon Linux 2 AMI (HVM) image:
Create the host, here assuming the key pair is named key.pem.
The default settings are sufficient.
Log in, update and install Docker
(guide courtesy of Yevgeniy Brikman):
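As a sketch of the login and install step, assuming an Amazon Linux 2 host with the `yum` package manager and the default `ec2-user` account, the commands could look like the following. The package name, the host name placeholder and the helper name `host_setup_commands` are assumptions, not taken from this guide.

```shell
# Print the commands to run on the benchmark host.
# Assumes Amazon Linux 2 with yum and the default ec2-user account;
# host_setup_commands is a hypothetical helper name.
host_setup_commands() {
  cat <<'EOF'
sudo yum update -y
sudo yum install -y docker
sudo service docker start
sudo usermod -a -G docker ec2-user
EOF
}

# Usage (placeholder host name, key pair file key.pem):
#   host_setup_commands | ssh -i key.pem ec2-user@<public-dns-of-host> 'bash -s'
```

Log out and in again afterwards so that the new `docker` group membership takes effect.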
Whenever deploying changes to configuration, track progress in the Deployment dashboard.
Some changes, like changing requestthreads, restart content nodes.
The restarts are done in sequence and take time.
Wait for successful completion of the "Wait for services and endpoints to come online" step.
When changing node type/count, wait for auto data redistribution to complete,
watching the vds.idealstate.merge_bucket.pending.average_ metric:
.role=="content/search/0/1" - the host index will vary, depending on number of changes to such nodes.
E.g. after adding more nodes, this metric will jump, then decrease (not necessarily linearly) -
the speed depends on data volume.
This checks just one node, so check every vespa.distributor for progress.
Sizing
Using Vespa Cloud enables the Vespa Team to assist you in optimizing the application to reduce resource spend.
Based on 150 applications running on Vespa Cloud today, savings are typically 50%.
Cost optimization is hard to do without domain knowledge,
and few teams are experts in both their application and its serving platform.
Sizing means finding both the right node size and the right cluster topology:
Applications use Vespa for their primary business use cases.
Availability and performance vs. cost are business decisions.
The best sized application can handle all expected load situations,
and is configured to degrade quality gracefully for the unexpected.
Even though Vespa is cost-efficient out of the box,
Vespa experts can usually spot over/under-allocations in CPU, memory and disk space/IO,
and discuss trade-offs with the application team.
Using automated deployments, applications go live with little risk.
After launch, right-size the application based on true load, using Vespa’s elasticity features
with automated data migration.
Use the Vespa sizing guide
to size the application and find metrics used there. Pro-tips:
60% is a good max memory allocation.
50% is a good max CPU allocation, although application dependent.
70% is a good max disk allocation.
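These limits can be turned into a quick check against observed peak utilization. The sketch below is an illustration only: `check_utilization` is a hypothetical helper, it takes whole-percent values, and where the utilization numbers come from (for example the metrics named in the sizing guide) is left to the reader.

```shell
# Compare observed peak utilization (whole percent) with the suggested maxima.
# check_utilization is a hypothetical helper; the limits are the pro-tips above.
check_utilization() {
  resource=$1   # memory, cpu or disk
  used=$2       # observed peak utilization in percent
  case $resource in
    memory) max=60 ;;
    cpu)    max=50 ;;
    disk)   max=70 ;;
    *) echo "unknown resource: $resource" >&2; return 2 ;;
  esac
  if [ "$used" -gt "$max" ]; then
    echo "$resource: $used% is above the suggested $max% maximum"
    return 1
  fi
  echo "$resource: $used% is within the suggested $max% maximum"
}
```

For example, `check_utilization cpu 65` reports that 65% is above the suggested 50% maximum and returns a non-zero status. The CPU limit in particular is application dependent, so treat the result as a prompt to investigate rather than a hard rule.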
Rules of thumb:
Memory and disk scale approximately linearly for indexed fields' data -
attributes have a fixed cost for empty fields.