This is a guide on how to replicate a Vespa application into different environments, with or without data. Use cases for cloning include:
dev environment to easily cooperate and share.
prod environment to experiment with a CI/CD pipeline, without touching the current production serving.
dev environment.
perf environment for load testing.
This guide uses applications. One can also use instances, but that will not work across Vespa major versions on Vespa Cloud. Refer to tenant, applications, instances for details.
Vespa Cloud has different environments, dev/perf and prod, with different characteristics (details).
Clone to dev/perf for short-lived experiments/development; use prod for serving applications with a CI/CD pipeline.
As some steps are similar, it is a good idea to read through all of them, as details are given only the first time for brevity. Examples are based on the album-recommendation sample application.
dev/perf environments are auto-expired (details), so application cloning is a safe way to work with Vespa. Find more details in deleting an application.
Creating a copy from one self-hosted application to another. Self-hosted means running vespa.ai on a laptop or a multinode system.
This example sets up a source app and deploys the application package - use album-recommendation as an example. The application package is then exported from the source and deployed to a new target app. Steps:
Source setup:
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa deploy -t http://localhost:19071
Target setup:
$ docker run --detach --name vespa2 --hostname vespa-container2 \
--publish 8081:8080 --publish 19072:19071 \
vespaengine/vespa
Export source application package
If the resource/laptop running Docker does not have tar, mount /tmp/d out of the container or just copy the files by other means. Export files:
$ docker exec vespa1 sh -c "mkdir -p /tmp/d && cd /tmp/d && /opt/vespa/bin/vespa-deploy fetch"
$ docker exec -w /tmp/d vespa1 tar cvf - . | tar xvf -
$ docker exec vespa1 rm -rf /tmp/d
Deploy application package to target
Before deploying, one can make changes to the application package files as needed. Deploy to target:
$ vespa deploy -t http://localhost:19072
Data copy from source to target
This pipes the source data directly into vespa-feed-client. Another option is to save the data to files temporarily and feed these individually:
$ docker exec vespa1 /opt/vespa/bin/vespa-visit | \
vespa-feed-client-cli/vespa-feed-client --stdin --endpoint http://localhost:8081
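The file-based alternative mentioned above could look roughly like the sketch below. It assumes the feed client's --file option for reading a dumped file. The function names, the FEED_CLIENT variable and the dump.json file name are illustrative assumptions:

```shell
# Path to the feed client; the default mirrors the commands in this guide.
FEED_CLIENT="${FEED_CLIENT:-vespa-feed-client-cli/vespa-feed-client}"

# dump_to_file <container> <file>: visit all documents and store them in <file>.
dump_to_file() {
  docker exec "$1" /opt/vespa/bin/vespa-visit > "$2"
}

# feed_file <file> <endpoint>: feed a previously dumped file to the target.
feed_file() {
  "$FEED_CLIENT" --file "$1" --endpoint "$2"
}

# Example use, matching the containers set up above:
#   dump_to_file vespa1 dump.json && feed_file dump.json http://localhost:8081
```

Keeping the dump on disk makes it possible to inspect or edit the documents before feeding, at the cost of temporary storage.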
Data copy 5%
This is an example of how to use a selection to specify a subset of the documents - here a “random” 5% selection:
$ docker exec vespa1 /opt/vespa/bin/vespa-visit -s 'id.hash().abs() % 20 = 0' | \
vespa-feed-client-cli/vespa-feed-client --stdin --endpoint http://localhost:8081
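The 5% figure follows from the modulus: keeping documents whose hashed ID is divisible by 20 keeps about one in 20, assuming the hash values are evenly spread. A small sketch for generating the expression for other fractions is below. The helper name selection_one_in is a hypothetical choice, not a Vespa command:

```shell
# selection_one_in <N>: print a selection keeping roughly one document in N.
# Hypothetical helper; rejects empty, non-numeric and zero arguments.
selection_one_in() {
  n="$1"
  case "$n" in
    ''|*[!0-9]*|0) echo "usage: selection_one_in <positive integer>" >&2; return 1 ;;
  esac
  printf 'id.hash().abs() %% %s = 0\n' "$n"
}

# Example use, about 10% of the documents:
#   docker exec vespa1 /opt/vespa/bin/vespa-visit -s "$(selection_one_in 10)"
```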
Get access log from source
Get the current query access log from the source application (there might be more files there):
$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
Source setup:
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa deploy -t http://localhost:19071
Target setup:
Create a tenant in the Vespa Cloud console, in this guide using “mytenant”.
Export source application package:
$ docker exec vespa1 sh -c "mkdir -p /tmp/d && cd /tmp/d && /opt/vespa/bin/vespa-deploy fetch"
$ docker exec -w /tmp/d vespa1 tar cvf - . | tar xvf -
$ docker exec vespa1 rm -rf /tmp/d
Deploy target application package
The procedure differs slightly depending on whether the deployment is to a dev/perf or a prod environment. The mvn -U clean package step is only needed for applications with custom code.
Configure application and instance names and create data plane credentials:
$ vespa config set target cloud && \
vespa config set application mytenant.myapp.myinstance
$ vespa auth login
$ vespa auth cert -f
$ mvn -U clean package
If reusing a cert/key pair, drop -f from vespa auth cert -f and make sure to put the pair in .vespa, to avoid errors like
Error: open /Users/me/.vespa/mytenant.myapp.myinstance/data-plane-public-cert.pem: no such file or directory
in the subsequent deploy step.
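When reusing a cert/key pair, a quick pre-deploy check can catch the error above early. The sketch below assumes the pair lives in a per-application directory under .vespa in the home directory, as in the error message. The function name and its optional second parameter are illustrative assumptions:

```shell
# check_data_plane_credentials <tenant.app.instance> [base-dir]
# Hypothetical helper: returns non-zero and lists each missing file if the
# data plane cert/key pair is not in place. base-dir defaults to ~/.vespa.
check_data_plane_credentials() {
  app="$1"
  base_dir="${2:-$HOME/.vespa}"
  missing=0
  for f in data-plane-public-cert.pem data-plane-private-key.pem; do
    if [ ! -f "$base_dir/$app/$f" ]; then
      echo "missing: $base_dir/$app/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example use before deploying:
#   check_data_plane_credentials mytenant.myapp.myinstance && vespa deploy
```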
Then deploy the application. Depending on the use case, deploy to dev/perf or prod:
dev/perf:
$ vespa deploy
Expect something like:
Uploading application package ... done
Success: Triggered deployment of . with run ID 1
Use vespa status for deployment status, or follow this deployment at
https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/myinstance/job/dev-aws-us-east-1c/run/1
prod environment requires deployment.xml. Select which zone to deploy to:
$ cat <<EOF > deployment.xml
<deployment version="1.0">
    <prod>
        <region>aws-us-east-1c</region>
    </prod>
</deployment>
EOF
prod deployments also require resources specifications in services.xml. Use vespa-documentation-search as an example and add/replace nodes elements for container and content clusters. If in doubt, just add a small config to start with, and change later:
<nodes count="2">
    <resources vcpu="2" memory="8Gb" disk="10Gb" />
</nodes>
Submit the application package:
$ vespa prod submit
Expect something like:
Hint: See https://cloud.vespa.ai/en/getting-to-production
Success: Submitted . for deployment
See https://console.vespa-cloud.com/tenant/mytenant/application/myapp/prod/deployment for deployment progress
A proper deployment to a prod zone should have automated tests; read more in automated deployments.
Data copy
Get the vespa-feed-client first. Find the endpoint in the Vespa Cloud Console, then:
$ docker exec vespa1 /opt/vespa/bin/vespa-visit | \
./vespa-feed-client-cli/vespa-feed-client --stdin --show-errors \
--certificate /Users/me/.vespa/mytenant.myapp.myinstance/data-plane-public-cert.pem \
--private-key /Users/me/.vespa/mytenant.myapp.myinstance/data-plane-private-key.pem \
--endpoint https://myinstance.myapp.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud
Get access log from source:
$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
Download application from Vespa Cloud
The application package can be downloaded from the Vespa Cloud Console:
dev/perf: Navigate to https://console.vespa-cloud.com/tenant/mytenant/application/myapp/dev/instance/myinstance and click Application download.
prod: Navigate to https://console.vespa-cloud.com/tenant/mytenant1/application/myapp/prod/deployment?tab=builds and select the version of the application to download.
Target setup:
Note the name of the application package .zip-file. If changes are needed, unzip it and use vespa deploy -t http://localhost:19071 to deploy from the current directory:
$ docker run --detach --name vespa1 --hostname vespa-container1 \
--publish 8080:8080 --publish 19071:19071 \
vespaengine/vespa
$ vespa config set target local
$ vespa deploy -t http://localhost:19071 mytenant.myapp.myinstance.dev.aws-us-east-1c.zip
Data copy
Modify dump.sh to use the correct tenant.app.instance names, then start a dump/feed job. The JSON cannot be fed directly, hence the small JSON filtering step using jq. The endpoint uses port 8080, as published by the target container above:
$ ./dump.sh | jq '.documents[]' | \
vespa-feed-client-cli/vespa-feed-client --stdin --show-errors \
--endpoint http://localhost:8080
Data copy - minimal
For use cases requiring a few documents, visit just a few documents:
$ curl --cert data-plane-public-cert.pem --key data-plane-private-key.pem \
"https://myinstance.myapp.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud/document/v1/?cluster=music&wantedDocumentCount=10"
Get access log from source:
Use the Vespa Cloud Console to get access logs.
This is a combination of the procedures above.
Download the application package from dev/perf or prod, and make note of the source name, like mytenant.myapp.myinstance. Then use vespa deploy or vespa prod submit as above to deploy to dev/perf or prod. If cloning from dev/perf to prod, pay attention to changes in deployment.xml and services.xml as in cloning to Vespa Cloud.
Data copy
Update dump.sh with the source, e.g. mytenant.myapp.myinstance, and set the feed endpoint name and credential paths based on the target name, e.g. mytenant.myapp-new.myinstance:
$ ./dump.sh | jq '.documents[]' | \
vespa-feed-client-cli/vespa-feed-client --stdin --show-errors \
--certificate /Users/me/.vespa/mytenant.myapp-new.myinstance/data-plane-public-cert.pem \
--private-key /Users/me/.vespa/mytenant.myapp-new.myinstance/data-plane-private-key.pem \
--endpoint https://myinstance.myapp-new.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud
Data copy 5%
Set the SELECTION variable in dump.sh to select a subset of the documents.
dump.sh:
#!/bin/bash
ENDPOINT="https://myinstance.myapp.mytenant1.aws-us-east-1c.dev.z.vespa-app.cloud"
NAMESPACE=mynamespace
DOCTYPE=music
CLUSTER=music
unset SELECTION
# Use a selection to visit a subset - example 5% selection: id.hash().abs() % 20 = 0
# SELECTION='&selection=id.hash%28%29.abs%28%29%20%25%2020%20%3D%200'
continuation=""
idx=0
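# Fetch the documents in chunks of up to 1000, one output file per chunk.
# jq -e exits non-zero when the response has no continuation token,
# which ends the loop after the last chunk.
while
  ((idx+=1))
  printf -v out "%05g" $idx
  filename=${NAMESPACE}-${DOCTYPE}-${out}.data
  token=$( curl -s \
    --cert /Users/me/.vespa/mytenant.myapp.myinstance/data-plane-public-cert.pem \
    --key /Users/me/.vespa/mytenant.myapp.myinstance/data-plane-private-key.pem \
    "${ENDPOINT}/document/v1/${NAMESPACE}/${DOCTYPE}/docid?wantedDocumentCount=1000&cluster=${CLUSTER}&${continuation}${SELECTION}" \
    | tee ${filename} | jq -re .continuation )
do
  # Pass the token on so the next request resumes where this one stopped
  continuation="continuation=${token}"
done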
cat *.data
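The commented-out SELECTION value in dump.sh is the percent-encoded form of the selection id.hash().abs() % 20 = 0. For other selections, the encoding can be produced with a small helper like the sketch below. The name urlencode is a hypothetical choice, and the sketch only handles ASCII input:

```shell
# urlencode <string>: percent-encode everything except unreserved characters.
# Hypothetical helper; runs in a subshell so its variables do not leak.
urlencode() (
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # the string without its first character
    c=${s%"$rest"}       # the first character
    case $c in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s=$rest
  done
  printf '%s\n' "$out"
)

# Example use in dump.sh:
#   SELECTION="&selection=$(urlencode 'id.hash().abs() % 20 = 0')"
```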