Automated Deployments

This document describes Vespa Cloud's continuous deployment system for Vespa applications. It is required reading for getting to production, with or without Java. It assumes the reader is familiar with the developer guide and has made a first manual deployment to Vespa Cloud using one of the getting started guides.

An easy way to get started is by cloning an existing application.

The desired state for all Vespa Cloud users is to have a CI job that automatically builds and submits their application on every source change. Vespa Cloud then deploys the application to production using a safe procedure: changes are first tested, and then rolled out to production zones, with optional post-deployment verification. Upgrades of the Vespa platform follow the same deployment process as new application revisions. System and staging tests are strongly recommended for a production application in Vespa Cloud. The tests can be basic HTTP tests, or more advanced Java JUnit 5 tests. See Vespa application system tests for how to write the different types of tests.

The deployment process for an application is configured in deployment.xml, placed alongside services.xml in the application source. It may include arbitrary dependencies between instances, deployments to different zones within an instance, configured delays, and production verification tests. It is also possible to control when Vespa upgrades and application revisions are allowed to roll out.

The deployment status is visualized in the Vespa Cloud console, and it is also possible to trigger and pause jobs, and manipulate what is currently deploying in that UI. Additionally, a history of recent deployments, including logs, can be found there. For illustration purposes the most trivial deployment pipeline is shown here:

Minimal deployment pipeline

The application has one instance named default. The 4th revision of the application package has just been submitted, and the current platform version in the production zone is 7.513.6. The system tests are thus run at (revision, platform) = (#4, 7.513.6), while the staging tests verify the change (#3, 7.513.6) -> (#4, 7.513.6). When these complete successfully, the change will roll out to aws-us-east-1c.
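A pipeline like the one described above could come from a deployment.xml as small as the following sketch. It assumes a single instance named default and a single production region, aws-us-east-1c, as in the description; the exact file behind the illustration is not shown in this guide.

```xml
<!-- Minimal sketch: one instance, one production zone. -->
<deployment version="1.0">
  <instance id="default">
    <prod>
      <!-- The only production zone the change rolls out to. -->
      <region>aws-us-east-1c</region>
    </prod>
  </instance>
</deployment>
```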

The next section of this guide describes how submissions to production differ from manual deployments to dev/perf zones, used for application development and sizing, and things to consider when setting up a CI job for the application source.

Then, the deployment process of the CD system is explained in detail.

Next, there's a section with tips on how to develop and maintain a production application, and some troubleshooting.

Finally, the last section shows how to delete parts of a production application, or the whole of it.

A deployment badge is available from the console's deployment view:

vespa-team.vespacloud-docsearch.default overview

Production deployments

Deployments to the production environment in Vespa Cloud can only be done by its automated CD system, unlike deployments to dev/perf, which are initiated directly by the developer. This ensures that application code is tested, and broken code is caught, before it reaches production. A test suite containing at least one system test, staging setup, and staging test should therefore accompany the application package on submission to the CD system. The getting-to-production guides show how to generate this application-test.zip for their respective tools, in addition to the application.zip.

To differentiate the upload of this application and test package pair from direct deployments, we use the term submit. Each new submission is assigned an increasing build number by Vespa Cloud, which can be used to track the roll-out of the new package to the instances and their zones, as defined in deployment.xml. Likewise, any Vespa platform upgrade for an application rolls out in the same manner, and can be tracked via its version number.

Submit that includes Java code

Vespa is backwards compatible within major versions (and major versions rarely change). This means that Java code compiled against an older version of the Vespa APIs can always run in Vespa Cloud on the same major version. However, if the application package is compiled against a newer API version and then deployed to an older runtime version in production, it may fail. The guide for getting to production with Java shows how to avoid this.

Another caveat for Java users is that the application-test.zip test bundle will only include files Java cares about. To include test documents, etc., they must be located under src/test/resources, and they should be referenced with getClass().getClassLoader().getResourceAsStream(...), as demonstrated in VespaDocTester.java.

Continuous deployment

Submission of application packages should be done by a build job, triggered regularly and/or on changes to the application package source, to achieve continuous deployment.

See this example, which uses Java and GitHub Actions.

Notice the trick of passing sourceUrl to point to the source revision. This is displayed in the console and makes it possible to keep track of what exactly is being deployed.

Deployment keys

Although users can submit applications themselves, CI build jobs should use application keys instead. An application key has fewer privileges and is intended for automatic application submission. You can upload or create new application keys in the console, and store them as secrets in the repository; see the GitHub Actions example.

Some services like Travis CI do not accept multi-line values for Environment Variables in Settings. A workaround is to use the output of

$ openssl base64 -A -a < mykey.pem && echo

in a variable, say VESPA_MYAPP_API_KEY, in Travis Settings. VESPA_MYAPP_API_KEY is exported in the Travis environment, example output:

Setting environment variables from repository settings
$ export VESPA_MYAPP_API_KEY=[secure]

Then, before deploying/submitting to Vespa Cloud, regenerate the key value:

$ MY_API_KEY=$(echo "$VESPA_MYAPP_API_KEY" | openssl base64 -A -a -d)

and use "${MY_API_KEY}" in the deploy/submit command, quoted so that the line breaks in the decoded key are preserved.

Deployment orchestration

The deployment orchestration implemented in Vespa Cloud is much more flexible than a simple pipeline. deployment.xml can describe almost arbitrary dependencies between deployments to production zones, production verification tests, and configured delays; by ordering these in parallel and serial blocks of steps. On a higher level, instances can also depend on each other in the same way. This makes it easy to configure a deployment process which gradually rolls out changes to increasingly larger subsets of production nodes, as confidence grows with successful production verification tests.
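As a rough illustration of serial and parallel steps, the sketch below deploys to one zone first, waits, and then deploys to two more zones at the same time. The region aws-us-west-2a and the two-hour delay are placeholders chosen for this example, not values from this guide; consult the deployment.xml reference for the authoritative element list.

```xml
<deployment version="1.0">
  <instance id="default">
    <prod>
      <!-- Step 1: deploy to a first zone. -->
      <region>aws-us-east-1c</region>
      <!-- Step 2: wait before continuing (duration is an arbitrary example). -->
      <delay hours="2" />
      <!-- Step 3: deploy to the remaining zones at the same time. -->
      <parallel>
        <region>aws-eu-west-1a</region>
        <!-- Placeholder region name, used only for illustration. -->
        <region>aws-us-west-2a</region>
      </parallel>
    </prod>
  </instance>
</deployment>
```

Steps listed directly under prod run one after the other, so only the zones inside the parallel block are deployed concurrently.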

The first production deployments are also guarded by system and staging tests. The CD system enforces this by adding these tests to the deployment specification if they are not present. System and staging tests in Vespa Cloud must always succeed before the corresponding revision and platform combination (system test) or upgrade (staging test) is automatically deployed to any production zone. See the deployment status at the start of this guide for an example, the Vespa application system tests guide for more about tests, the sections below for how the tests are run in Vespa Cloud, and the deployment.xml reference for how to configure the deployment process.
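Since the CD system adds the system and staging tests when they are missing, declaring them explicitly should give the same result. The sketch below assumes the test and staging elements are placed inside the instance, ahead of prod; check the deployment.xml reference for the exact placement rules.

```xml
<deployment version="1.0">
  <instance id="default">
    <!-- System tests: run in the test environment. -->
    <test />
    <!-- Staging tests: verify the upgrade in the staging environment. -->
    <staging />
    <prod>
      <region>aws-us-east-1c</region>
    </prod>
  </instance>
</deployment>
```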

Deployments run sequentially by default, but can be configured to run in parallel. Inside each zone, Vespa Cloud orchestrates the deployment such that the change is applied without disruption to read or write traffic against the application. A production deployment in a zone is complete when the new configuration is active on all nodes. Most changes are applied to running nodes, which makes this a fast process. If restarts are needed, e.g., during platform upgrades, they happen automatically and safely as part of the deployment. Deployments that require restarts take longer to complete.

It is also possible to block specific deployments during certain windows throughout the week, e.g., to avoid rolling out changes during peak hours; see block-change.
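A block window might look like the sketch below, which holds back changes on weekday afternoons. The days, hours, and time zone are example values only, and the attribute names should be confirmed against the block-change entry in the deployment.xml reference.

```xml
<deployment version="1.0">
  <instance id="default">
    <!-- Example window: no roll-outs Monday to Friday, 16:00 through 20:59 UTC. -->
    <block-change days="mon-fri" hours="16-20" time-zone="UTC" />
    <prod>
      <region>aws-us-east-1c</region>
    </prod>
  </instance>
</deployment>
```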

Deployment verification

System and staging tests require dedicated test deployments to run against; running them against a production cluster would be disastrous, as these tests typically clear all existing documents as a pre-step. In Vespa Cloud, these test clusters are stood up only when tests are to be run, in dedicated environments where clusters are automatically downsized to reduce hardware costs. The test deployments have the usual data plane protection, but the test code has access to their endpoints.

Status and logs of ongoing tests can be found under Deployment in the application view in the console.

System tests

When a system test is run, the application is deployed to a zone in the test environment. The system test suite is then run against the endpoints of the test deployment. The test deployment is empty when the test execution begins. The application package and Vespa platform version are the same as those to be deployed to production; however, the size of each test cluster is reduced to 1 node.

Vespa CD

Staging tests

A staging test verifies the upgrade from application package Xold to Xnew, and from Vespa platform version Yold to Ynew. The staging test then consists of the following steps:

  1. The application at revision Xold is deployed on platform version Yold, to a zone in the staging environment.
  2. The staging setup test code is run, typically making the cluster reasonably similar to a production cluster.
  3. The test deployment is then upgraded to application revision Xnew and platform version Ynew.
  4. Finally, the staging test code is run, to verify the deployment works as expected after the upgrade.

Note that one or both of the application revision and platform may be upgraded during the staging test, depending on what upgrade scenario the test is run to verify. These changes are usually kept separate, but in some cases it is necessary to allow them to roll out together.

The sizes of clusters in the staging environment are by default reduced to 10% of the size specified in services.xml, but never to fewer than 2 nodes.

Production tests

Production tests can be configured in deployment.xml. Unlike system and staging tests, production tests do not have access to the Vespa endpoints, for security reasons. If a production test fails, dependent steps in the release pipeline stop, but regions that have already been upgraded remain on the version where the test failed.
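A production test step could be placed between two deployments, as in this sketch. It assumes a test element under prod names the region whose deployment is verified; see the deployment.xml reference for the precise semantics.

```xml
<deployment version="1.0">
  <instance id="default">
    <prod>
      <region>aws-us-east-1c</region>
      <!-- Verify the deployment in aws-us-east-1c before moving on. -->
      <test>aws-us-east-1c</test>
      <!-- Reached only if the production test above succeeds. -->
      <region>aws-eu-west-1a</region>
    </prod>
  </instance>
</deployment>
```

If the test fails here, aws-eu-west-1a keeps its current version while aws-us-east-1c stays on the new one.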

Maintaining an application

This section contains a loose collection of tips on how to further develop, and maintain, a Vespa Cloud application.

Developing tests

When using Vespa Cloud, system and staging tests are most easily developed using a test deployment in a dev zone to run the tests against. Refer to the general testing guide for a discussion of the different test types, and to the basic HTTP tests or Java JUnit tests reference for how to write the relevant tests.

If using the Vespa CLI to deploy and run basic HTTP tests, the same commands as in the test reference will just work, provided the CLI is configured to use the cloud target.

Running Java tests

With Maven and Java JUnit tests, some additional configuration is required to provide the test runtime on the local machine with API and data plane credentials:

$ mvn test \
  -D test.categories=system \
  -D dataPlaneKeyFile=data-plane-private-key.pem -D dataPlaneCertificateFile=data-plane-public-cert.pem \
  -D apiKey="$API_KEY"

The apiKey is used to fetch the dev instance's endpoints. The data plane key and certificate pair is used by ai.vespa.hosted.cd.Endpoint to access the application endpoint. See the Vespa Cloud API reference for details on configuring Maven invocations. Note that the -D vespa.test.config argument is gone; this configuration is automatically fetched from the Vespa Cloud API—hence the need for the API key.

When running Vespa self-hosted like in the sample application, no authentication is required by default, to either API or container, and specifying a data plane key and certificate will instead cause the test to fail, since the correct SSL context is the Java default in this case.

Make sure the TestRuntime is able to start. Because it initializes an SSL context, remove the key and certificate configuration when running locally, so that the default context is used. Remove the properties from both pom.xml and the IDE debug configuration.

Developers can also set these parameters in the IDE run configuration to debug system tests:

-D test.categories=system
-D tenant=my_tenant
-D application=my_app
-D instance=my_instance
-D apiKeyFile=/path/to/myname.mytenant.pem
-D dataPlaneCertificateFile=data-plane-public-cert.pem
-D dataPlaneKeyFile=data-plane-private-key.pem

Monitoring

In addition to a strong test suite, it is recommended to set up monitoring of your application, to identify and react to issues quickly.

Validation overrides

In addition to tests, Vespa Cloud has another safety mechanism to avoid accidentally damaging production clusters: potentially destructive application changes, such as removing fields, are disallowed by default. Such changes require a validation override as part of the application package, to confirm that they are intended.
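A validation override is a small XML file in the application package. The sketch below uses field-type-change purely as an example of an override id, and the date is a placeholder; the validation overrides reference lists the valid ids and how far ahead the until date may be set.

```xml
<validation-overrides>
    <!-- Placeholder date: replace with a date shortly after the planned change. -->
    <!-- field-type-change is an example id; use the id named in the deployment error. -->
    <allow until="2021-03-01" comment="Intentional change of a field type">field-type-change</allow>
</validation-overrides>
```

Because the override expires, it does not silently permit the same kind of change later.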

Feature switches and bucket tests

With continuous deployment, it is not practical to hold off releasing a feature until it is done, test it manually until convinced it works and then release it to production. Read more about feature switches and bucket tests.

Integration testing

Another challenge with continuous deployment is integration testing across multiple services, where another service depends on this Vespa application for its own integration testing. There are two ways to provide this: either create an additional application instance for testing, or use test data in the production instance. Using test data in production requires some thought about how to separate this data from the real data in queries. A separate instance gives complete isolation, but adds some overhead, and may not produce quite as realistic testing of queries, as those will run only over the test data in the separate instance.
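The separate-instance option could be expressed in deployment.xml roughly as follows. The instance name integration is hypothetical, and the sketch assumes that instances roll out in the order they are listed, so the testing instance receives each change first.

```xml
<deployment version="1.0">
  <!-- Hypothetical instance that other services use for integration testing. -->
  <instance id="integration">
    <prod>
      <region>aws-us-east-1c</region>
    </prod>
  </instance>
  <!-- The instance serving real traffic. -->
  <instance id="default">
    <prod>
      <region>aws-us-east-1c</region>
      <region>aws-eu-west-1a</region>
    </prod>
  </instance>
</deployment>
```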

Deleting an application

To delete an application, first delete its production deployments in the console. Navigate to the deploy view at http://console.vespa.ai/tenant/tenant-name/application/app-name/prod/deploy

WARNING! Data will be unrecoverable.

delete production deployment

When the application deployments are deleted, delete the application:

  1. Delete the application in the console.

  2. Remove the CI job that builds and submits application packages, if any.

Deleting an instance / region

To remove an instance or a deployment to a region from an application, modify deployment.xml and validation-overrides.xml.

WARNING! Following these steps will remove production instances or regions and all data within them. Data will be unrecoverable.

  1. Remove the region from the <prod> element, or the instance from the <deployment> element in the deployment.xml:

    <deployment version="1.0">
      <prod>
        <region>aws-us-east-1c</region>
        <!-- Removing the deployment in the region 'aws-eu-west-1a' -->
        <!--region>aws-eu-west-1a</region-->
      </prod>
    </deployment>
    
  2. Add or modify validation-overrides.xml, allowing Vespa Cloud to remove production instances:

    <validation-overrides>
        <allow until="2021-03-01" comment="Remove region/instance ...">deployment-removal</allow>
        <!-- If the region was part of a global endpoint/instance had a global endpoint: -->
        <allow until="2021-03-01" comment="Remove region/instance ...">global-endpoint-change</allow>
    </validation-overrides>
    
  3. Build and submit the application package.