Cloning applications and data

This is a guide on how to replicate a Vespa application in different environments, with or without data. Use cases for cloning include:

This guide clones applications. Cloning instances is also possible, but that does not work across Vespa major versions on Vespa Cloud; refer to tenant, applications, instances for details.

Vespa Cloud has the environments dev/perf and prod, each with different characteristics. Clone to dev/perf for short-lived experiments and development, and use prod for serving applications with a CI/CD pipeline.

Several steps are shared between the procedures, and details are given only the first time a step appears, so it is a good idea to read through all of them. Examples are based on the album-recommendation sample application.

Cloning - self-hosted to Vespa Cloud

Source setup:

$ docker run --detach --name vespa1 --hostname vespa-container1 \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

$ vespa deploy -t http://localhost:19071
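The config server inside the container can take some time to start, so a deploy issued right after docker run may fail. A minimal sketch of a wait loop, assuming curl is installed and that the config server answers health checks on /state/v1/health at port 19071. The name wait_for_configserver is a hypothetical helper, not part of the Vespa CLI:

```shell
# Hypothetical helper: poll the config server health endpoint until it
# responds successfully, or give up after a number of attempts.
wait_for_configserver() {
  url="$1"            # e.g. http://localhost:19071
  tries="${2:-60}"    # number of attempts, one per second (assumed default)
  i=0
  while [ "$i" -lt "$tries" ]; do
    # --fail makes curl return non-zero on HTTP error responses
    if curl -s --fail -o /dev/null "$url/state/v1/health"; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "config server at $url did not become healthy" >&2
  return 1
}

# Usage (not executed here):
#   wait_for_configserver http://localhost:19071 && vespa deploy -t http://localhost:19071
```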

Target setup:

Create a tenant in the Vespa Cloud console, in this guide using “mytenant”.

Export source application package:

This fetches the application package from the container and unpacks it on the local file system:

$ vespa fetch -t http://localhost:19071 && \
  unzip application.zip -x application.zip
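Before deploying the unpacked files elsewhere, it can help to confirm that the export looks like an application package. A minimal sketch, assuming the package was unpacked into the directory passed as argument and that schemas, if any, are stored as schemas/*.sd. The name check_package is a hypothetical helper:

```shell
# Hypothetical helper: sanity-check an unpacked application package.
check_package() {
  dir="${1:-.}"
  # services.xml is the one file every application package needs
  if [ ! -f "$dir/services.xml" ]; then
    echo "no services.xml in $dir" >&2
    return 1
  fi
  # Print the schema files that were found, one per line (may be none)
  for sd in "$dir"/schemas/*.sd; do
    [ -f "$sd" ] && echo "$sd"
  done
  return 0
}

# Usage (not executed here):
#   check_package .
```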

Deploy target application package

The procedure differs slightly depending on whether the target is the dev/perf or the prod environment. The mvn -U clean package step is only needed for applications with custom code. Configure the application name and create data plane credentials:

$ vespa config set target cloud && \
  vespa config set application mytenant.myapp

$ vespa auth login

$ vespa auth cert -f

$ mvn -U clean package

Then deploy the application. Depending on the use case, deploy to dev/perf with vespa deploy or to prod with vespa prod deploy.
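The choice between the two deploy commands referenced later in this guide, vespa deploy for dev/perf and vespa prod deploy for prod, can be sketched as a small wrapper. The name deploy_to is a hypothetical helper, and zone selection for dev/perf is assumed to come from the existing CLI configuration rather than from this script:

```shell
# Hypothetical helper: pick the deploy command from the environment name.
deploy_to() {
  case "$1" in
    dev|perf)
      # Manual deployment of the package in the current directory
      vespa deploy
      ;;
    prod)
      # Production deployment; expects a package prepared for prod
      vespa prod deploy
      ;;
    *)
      echo "unknown environment: $1" >&2
      return 1
      ;;
  esac
}

# Usage (not executed here):
#   deploy_to dev
```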

Data copy

Export documents from the local instance and feed to the Vespa Cloud instance:

$ vespa visit -t http://localhost:8080 | vespa feed -

Add more parameters as needed to vespa feed for other endpoints.

Get access log from source:

$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default

Cloning - Vespa Cloud to self-hosted

Download application from Vespa Cloud

Validate the endpoint, and fetch the application package:

$ vespa config get application
application = mytenant.myapp.default

$ vespa fetch
Downloading application package... done
Success: Application package written to application.zip

The application package can also be downloaded from the Vespa Cloud Console.

Target setup:

Note the name of the application package .zip file just downloaded. If changes are needed, unzip it and use vespa deploy -t http://localhost:19071 to deploy from the current directory:

$ docker run --detach --name vespa1 --hostname vespa-container1 \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

$ vespa config set target local

$ vespa deploy -t http://localhost:19071 mytenant.myapp.default.dev.aws-us-east-1c.zip

Data copy

Set the config target to cloud so that vespa visit reads from Vespa Cloud, and pipe its JSONL output into vespa feed for the local instance:

$ vespa config set target cloud

$ vespa visit | vespa feed - -t http://localhost:8080
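After a copy like the one above, a rough check is to compare the number of documents on both sides. A minimal sketch, assuming vespa visit prints one JSON line per document, that the source is the configured cloud target, and that the data set is small enough to visit twice. The names count_docs and compare_counts are hypothetical helpers:

```shell
# Hypothetical helper: count documents by counting visited JSON lines.
# Extra arguments (such as -t URL) are passed on to vespa visit.
count_docs() {
  vespa visit "$@" | wc -l | tr -d ' '
}

# Hypothetical helper: compare the configured source with a target URL.
compare_counts() {
  src="$(count_docs)"            # source: the configured target
  dst="$(count_docs -t "$1")"    # target: explicit endpoint
  echo "source=$src target=$dst"
  [ "$src" = "$dst" ]
}

# Usage (not executed here):
#   compare_counts http://localhost:8080
```

Counts can differ on a live system where documents are written during the copy, so a mismatch is a hint to investigate rather than proof of a failed copy.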

Data copy - minimal

For use cases that need only a few documents, limit the visit:

$ vespa visit --chunk-count 10

Get access log from source:

Use the Vespa Cloud Console to get access logs.

Cloning - Vespa Cloud to Vespa Cloud

This is a combination of the procedures above. Download the application package from dev/perf or prod, make note of the source name, like mytenant.myapp.default. Then use vespa deploy or vespa prod deploy as above to deploy to dev/perf or prod.

If cloning from dev/perf to prod, pay attention to changes in deployment.xml and services.xml as in cloning to Vespa Cloud.

Data copy

Set the feed endpoint to that of the target application, e.g. mytenant.myapp-new.default:

$ vespa config set target cloud

$ vespa visit | vespa feed - -t https://default.myapp-new.mytenant.aws-us-east-1c.dev.z.vespa-app.cloud
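The target name follows the same tenant.application.instance pattern as the source name. A minimal sketch that derives a target name from a source name by swapping the application part, assuming a three-part name such as mytenant.myapp.default. The name rename_app is a hypothetical helper:

```shell
# Hypothetical helper: replace the application part of
# tenant.application.instance with a new application name.
rename_app() {
  source_id="$1"   # e.g. mytenant.myapp.default
  new_app="$2"     # e.g. myapp-new
  case "$source_id" in
    *.*.*) ;;      # at least three dot-separated parts
    *) echo "expected tenant.application.instance: $source_id" >&2; return 1 ;;
  esac
  tenant="${source_id%%.*}"   # text before the first dot
  rest="${source_id#*.}"      # application.instance
  instance="${rest#*.}"       # text after the application part
  printf '%s.%s.%s\n' "$tenant" "$new_app" "$instance"
}

# Usage (not executed here):
#   vespa config set application "$(rename_app mytenant.myapp.default myapp-new)"
```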

Data copy 5%

Set the --selection argument to vespa visit to select a subset of the documents.

Cloning - self-hosted to self-hosted

This procedure creates a copy of one self-hosted application in another. Self-hosted means running Vespa on a laptop or a multinode system.

This example sets up a source app and deploys the application package - use album-recommendation as an example. The application package is then exported from the source and deployed to a new target app. Steps:

Source setup:

$ vespa config set target local

$ docker run --detach --name vespa1 --hostname vespa-container1 \
  --publish 8080:8080 --publish 19071:19071 \
  vespaengine/vespa

$ vespa deploy -t http://localhost:19071

Target setup:

$ docker run --detach --name vespa2 --hostname vespa-container2 \
  --publish 8081:8080 --publish 19072:19071 \
  vespaengine/vespa

Export source application package

Export files:

$ vespa fetch -t http://localhost:19071

Deploy application package to target

Before deploying, one can make changes to the application package files as needed. Deploy to target:

$ vespa deploy -t http://localhost:19072 application.zip

Data copy from source to target

This pipes the source data directly into vespa feed - another option is to save the data to files temporarily and feed these individually:

$ vespa visit -t http://localhost:8080 | vespa feed - -t http://localhost:8081
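The alternative mentioned above, saving the data to a file before feeding, can be sketched as follows. This assumes mktemp is available and that vespa feed accepts a file name in place of the - argument. The name copy_via_file is a hypothetical helper:

```shell
# Hypothetical helper: copy documents in two steps via a temporary file.
copy_via_file() {
  src="$1"   # e.g. http://localhost:8080
  dst="$2"   # e.g. http://localhost:8081
  dump="$(mktemp)" || return 1
  # Step 1: export all documents from the source as JSON lines
  if ! vespa visit -t "$src" > "$dump"; then
    rm -f "$dump"
    return 1
  fi
  echo "exported $(wc -l < "$dump" | tr -d ' ') lines"
  # Step 2: feed the saved file to the target, then clean up
  status=0
  vespa feed "$dump" -t "$dst" || status=$?
  rm -f "$dump"
  return "$status"
}

# Usage (not executed here):
#   copy_via_file http://localhost:8080 http://localhost:8081
```

Keeping the file around instead of deleting it makes it possible to inspect or re-feed the data without visiting the source again.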

Data copy 5%

This is an example of how to use a selection to specify a subset of the documents - here a “random” 5% selection:

$ vespa visit -t http://localhost:8080 --selection 'id.hash().abs() % 20 = 0' | \
  vespa feed - -t http://localhost:8081
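The modulus 20 in the selection above comes from 100 / 5. A minimal sketch that builds the same kind of selection string for other sample sizes, using integer division, so the result is exact only for percentages that divide 100 evenly. The name sample_selection is a hypothetical helper:

```shell
# Hypothetical helper: build a selection that keeps roughly PERCENT of
# the documents, following the pattern used in this guide.
sample_selection() {
  percent="$1"
  if [ "$percent" -lt 1 ] || [ "$percent" -gt 100 ]; then
    echo "percent must be between 1 and 100" >&2
    return 1
  fi
  # Keep documents whose hash is a multiple of 100 / percent
  modulus=$((100 / percent))
  printf 'id.hash().abs() %% %d = 0\n' "$modulus"
}

# Usage (not executed here):
#   vespa visit -t http://localhost:8080 --selection "$(sample_selection 5)" | \
#     vespa feed - -t http://localhost:8081
```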

Get access log from source

Get the current query access log from the source application (there might be more files there):

$ docker exec vespa1 cat /opt/vespa/logs/vespa/access/JsonAccessLog.default
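Since there may be more than one log file, copying the whole access log directory can be more convenient than printing a single file. A minimal sketch using docker exec and docker cp, assuming the container name and log path used in this guide. The name fetch_access_logs is a hypothetical helper:

```shell
# Hypothetical helper: list and copy all access log files from a container.
fetch_access_logs() {
  container="$1"   # e.g. vespa1
  dest="$2"        # local directory for the copied files
  mkdir -p "$dest" || return 1
  # List the directory first; older files may sit next to the current one
  docker exec "$container" ls /opt/vespa/logs/vespa/access/ || return 1
  # The trailing /. copies the directory contents into dest
  docker cp "$container":/opt/vespa/logs/vespa/access/. "$dest"
}

# Usage (not executed here):
#   fetch_access_logs vespa1 ./access-logs
```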