Automated Deployments

This document explains the process, and the safety mechanisms, that allow application changes and Vespa platform releases to be continuously deployed to production.

Continuous deployment on Vespa

Each application package build submitted to Vespa Cloud constitutes an application change, which must be tested and, if found healthy, deployed. Similarly, each change to the Vespa platform, made by the Vespa team, must be tested and deployed for all hosted applications. Vespa Cloud automates all these tests and deployments, with features including:

  • chained runs of tests and deployment, with retries of failed jobs;
  • multiple concurrent instances of an application in each zone, upgraded as specified by the user, for testing application changes in a subset of the service before rolling them out further;
  • separation of application and platform changes, making it easier to pinpoint breaking changes (application changes are always allowed when an upgrade fails, as they may be necessary to fix the breakage);
  • cancellation of any current application roll-out, upon submission of a new application revision;
  • throttling of platform upgrades, to detect unhealthy upgrades with a subset of applications; and
  • cancellation of platform upgrades which are found unhealthy, across all applications.

With Continuous Integration (CI) that builds and submits changes to the application as they are committed, Vespa Cloud thus provides full-fledged Continuous Deployment (CD) of all its applications, both for application developers and for the Vespa team.

Setting up deployment to production

Follow these steps to set up production deployment of an application. See the Vespa Cloud API reference for configuration details.

1. Create deployment.xml

Create a deployment.xml file alongside services.xml to configure where and when to deploy, i.e., which of the supported zones to deploy to.
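
A minimal sketch of such a file, deploying to a single production zone (the region name is illustrative):

<deployment version="1.0">
    <prod>
        <region active="true">aws-us-east-1c</region>
    </prod>
</deployment>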

2. Create system and staging tests

Create at least one system test and one staging test. Use the ones in album-recommendation-java as a starting point, and add more tests as you add functionality to your application.

3. Set up a deployment job

In the continuous build tool, set up a job that builds the Vespa application and ships it to Vespa Cloud. Trigger this job on merges to the main branch of the source control repository where the Vespa application is stored.

The job should execute something like the following (modify as needed if not using git):

mvn clean vespa:compileVersion
mvn -P fat-test-application \
  package vespa:submit \
  -Dvespaversion="$(cat target/vespa.compile.version)"  \
  -Drepository=$(git config --get remote.origin.url) \
  -Dbranch=$(git rev-parse --abbrev-ref HEAD) \
  -Dcommit=$(git rev-parse HEAD) \
  -DauthorEmail=$(git log -1 --format=%aE)

Track the deployment at https://console.vespa.oath.cloud/tenant/mytenant/application/myapp/deployment, or click "Deployment" in the console (refresh the page for updates).

If you need a Docker image in which to run this, use vespaengine/vespa-pipeline.

The deployment job must have access to the API key of your application, configured in pom.xml with the <apiKeyFile> element.
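
A sketch of how this might look; the placement under <properties> and the environment variable are assumptions, see the reference for the exact configuration:

<properties>
    <!-- Path to the API key used to authenticate against Vespa Cloud (illustrative) -->
    <apiKeyFile>${env.VESPA_API_KEY_FILE}</apiKeyFile>
</properties>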

Once this is set up, you can make any change to your application in production simply by checking in the change to the application source repository. The rest of this document will explain how deployments are orchestrated, and cover some common topics around continuous deployment in stateful systems.

Deployment orchestration

Vespa applications are compiled against one version of the Vespa Java artifacts, and then deployed to nodes in the cloud where the runtime Vespa version is controlled by the system. This runtime, or platform, version is also continuously updated, independently of application updates. This leads to a number of possible combinations of application packages and platform versions for each application.

Instead of a simple pipeline, Vespa deployments are orchestrated such that any deployment of an application package X to a production cluster with platform version Y is preceded by system and staging tests using the same version pair, and likewise for any upgrade of a production cluster running application package X to platform version Y. Good system and staging tests therefore guard against breaking changes both in the application and in the Vespa platform. System and staging tests are mandatory; see below for how to write them.

When an application or platform change has been verified in system and staging tests, it is deployed to a production zone. This deployment job may also contain verification tests that must succeed before the change rolls on to more zones. Good production tests fail if a change deployed to production negatively impacts the observed behavior of the application, typically by asserting on application metrics after a delay. If the application is deployed in multiple production zones, this makes it possible to revert to the old version quickly, by shifting traffic to another zone.

The status of ongoing tests and deployments is shown under Deployment in the application view in the console. Examples of advanced deployment configuration which can be set in deployment.xml include (see the sketch after this list):

  • Deployment order and parallelism
  • Time windows with no deployments
  • Grace periods between deployments, and before their tests
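
A sketch combining these options (zone names, delays and block windows are illustrative):

<deployment version="1.0">
    <!-- No deployments on weekends (hours are UTC) -->
    <block-change days="sat,sun" hours="0-23" />
    <prod>
        <!-- Deploy to these two zones in parallel -->
        <parallel>
            <region active="true">aws-us-east-1c</region>
            <region active="true">aws-us-west-2a</region>
        </parallel>
        <!-- Grace period before the change rolls on to the last zone -->
        <delay hours="2" />
        <region active="true">aws-eu-west-1a</region>
    </prod>
</deployment>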

System tests

When a Vespa application is built with the fat-test-application profile (mvn package -Pfat-test-application), all Java JUnit 5 tests with the @SystemTest annotation, and all their dependencies, are stored in a separate test code artifact, which is submitted to Vespa Cloud together with the application package. During an automated system test, a fresh test deployment is created, and the system tests in the test artifact are run to verify that the test deployment behaves as expected. A minimal test:

import ai.vespa.hosted.cd.SystemTest;
import org.junit.jupiter.api.Test;

@SystemTest
public class MinimalSystemTest {
    @Test
    public void testSearchAndFeeding() throws Exception {
        // Test code and assertions here
    }
}

The system test framework in com.yahoo.vespa:tenant-cd contains tools for runtime-dependent authentication against the Vespa deployment to test, and for endpoint discovery. The default behavior of mvn package vespa:deploy is to deploy to the dev environment, and the default behavior of mvn test -Dtest.categories=system is to run system tests against this dev deployment. The tenant and application properties from the pom.xml, together with the instance property, which defaults to the current user's username, determine the deployment to create or test. Read more about this here. Tests can also be run from an IDE without additional setup. See album-recommendation-java for sample system tests.
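
As a sketch of what a fuller system test might look like, assuming the TestRuntime and Endpoint classes of tenant-cd (the query path and assertion are illustrative):

import ai.vespa.hosted.cd.Endpoint;
import ai.vespa.hosted.cd.SystemTest;
import ai.vespa.hosted.cd.TestRuntime;
import org.junit.jupiter.api.Test;

import java.net.http.HttpResponse;

import static org.junit.jupiter.api.Assertions.assertEquals;

@SystemTest
class SearchSystemTest {

    // Discover the endpoint of the "default" container cluster in the deployment under test
    private final Endpoint endpoint = TestRuntime.get().deploymentToTest().endpoint("default");

    @Test
    void queryReturns200() {
        // A real test would feed documents first, then assert on the result content
        HttpResponse<String> response = endpoint.send(endpoint.request("/search/?query=title:hello"));
        assertEquals(200, response.statusCode());
    }
}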

During automated tests, the deployment is instead done to the test environment, with the same application package and Vespa runtime combination as is to be deployed in production; and when the tests are run, the endpoints from the test deployment are used. The test deployment is empty when the test execution begins, and is torn down again when it ends, so documents must be fed as part of the system test. The size of each test cluster is by default reduced to 1 node.

It is also possible to use local endpoints in the system tests, e.g., a Docker container on the development machine: specify -Dvespa.test.config=/some/path/to/test/config/json, and put a JSON file there which lists the endpoints for each of the clusters defined in services.xml, like:

{
  "localEndpoints": {
    "default": "https://localhost:8080/",
    "container": "https://localhost:8081/"
  }
}

Staging tests

Just like tests with the @SystemTest annotation, tests with the @StagingTest and @StagingSetup annotations are also included in the test artifact. These are run in the automated staging test job, also against a fresh deployment. The goal of a staging test, however, is not to ensure that the new deployment satisfies its functional specifications, as in the system test; rather, it is to ensure that the upgrade of an existing production cluster is safe, and compatible with the behavior expected by existing clients.

import ai.vespa.hosted.cd.StagingSetup;
import org.junit.jupiter.api.Test;

import java.io.IOException;

@StagingSetup
class StagingSetupTest {

    @Test
    void feedAndSearch() throws IOException {
        // Feed the static staging test documents; staging clusters are always empty when setup is run.
        // Verify documents are searchable and rendered as expected, prior to upgrade.
    }

}

import ai.vespa.hosted.cd.StagingTest;
import org.junit.jupiter.api.Test;

@StagingTest
public class MinimalStagingTest {
    @Test
    public void testSearchAndFeeding() throws Exception {
        // Test code and assertions here
    }
}

A staging test may, for instance, test an upgrade from application package X to X+1, and from platform version Y to Y+1. The staging test then consists of the following steps:

  1. Deploy the initial pair X, Y to the staging environment.
  2. Populate the deployment with data, making it reasonably similar to a production deployment. This is done by the @StagingSetup-annotated code, which typically feeds a set of static documents.
  3. Upgrade the deployment to the target pair X+1, Y+1.
  4. Verify the deployment works as expected after the upgrade. This is done by the @StagingTest-annotated code.

Because the staging tests are there to verify continued service during an upgrade, it is important to hold off changes to the staging tests until the new application changes are completely rolled out, and all clients updated. With a significant change, the workflow is to
  1. update the application code and the system and production tests,
  2. deploy the change,
  3. update all clients, and, possibly, the documents of the application, and then
  4. update the staging tests to expect the new functionality, and, possibly, its setup phase to use the new documents.

Staging tests can also be run against a dev deployment, or against a local Vespa deployment, just like system tests: specify -Dtest.categories=staging-setup for the setup code, and -Dtest.categories=staging for the actual tests. To deploy to a specific platform version, use, e.g., mvn vespa:deploy -DvespaVersion=1.2.3.

The sizes of clusters in staging are by default reduced to 10% of the size specified in services.xml, or at least 2 nodes.

Production deployments

Production jobs run sequentially by default, but can be configured to run in parallel in deployment.xml. Inside each zone, Vespa itself orchestrates the deployment such that the application can continue to serve even while subsets of its nodes are down for upgrade. A production deployment job is not complete until the upgrade is complete on all nodes and the cluster has returned to a stable state. When the Vespa platform is upgraded, each node must restart with the new runtime; this is typically slower than an application change by the user, which often amounts only to a reconfiguration of smaller parts of the deployment.

Production tests

Finally, tests may also be annotated with the @ProductionTest annotation. These are run against production after deployment, and any failure will stop the roll-out. Make sure the tests do not modify production data in an unintended fashion.

Production tests must be specified with the <test> tag under <prod> in deployment.xml. It is also possible to add a <delay> between the deployment and the test tag, e.g., to allow time for gathering the higher-level metrics which the production test verifies.
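
A sketch of this in deployment.xml (the zone name and delay are illustrative):

<deployment version="1.0">
    <prod>
        <region active="true">aws-us-east-1c</region>
        <!-- Let metrics accumulate before the production test runs -->
        <delay hours="3" />
        <!-- Run the @ProductionTest tests against the deployment in this zone -->
        <test>aws-us-east-1c</test>
    </prod>
</deployment>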

To run production tests manually, use an IDE, or run all tests with mvn test -Dtest.categories=production. This, again, assumes there is a dev deployment to run the tests against. To run a production test against a production deployment, specify -Denvironment=prod -Dregion=<region name> to mvn test on the command line, or as a VM argument in your IDE. Be careful not to run system or staging tests against production deployments.

Deleting an application

  1. Remove all instances in the deployment spec, then run the CI job.
  2. Delete the application in the console.
  3. Delete the CI job that builds and pushes new artifacts.

Feature switches and bucket tests

With CD, it is not possible to hold a feature back until it is done, test it manually until convinced it works, and only then release it to production. What to do instead? The answer is feature switches: release new features to production as they are developed, but include logic which keeps them deactivated until they are ready, or until they have been verified in production with a subset of users.
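
A minimal sketch of a feature switch in a Searcher; the component, property and rank profile names are hypothetical, and the "new feature" is simply selecting an experimental rank profile:

import com.yahoo.search.Query;
import com.yahoo.search.Result;
import com.yahoo.search.Searcher;
import com.yahoo.search.searchchain.Execution;

// The new behavior ships disabled, and is only activated for queries where the
// switch is set, e.g., from a query profile assigned to a subset of users.
public class NewScoringSearcher extends Searcher {

    @Override
    public Result search(Query query, Execution execution) {
        if (query.properties().getBoolean("useNewScoring", false)) {
            query.getRanking().setProfile("new-scoring"); // hypothetical rank profile
        }
        return execution.search(query);
    }
}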

Bucket testing is the practice of systematically testing new features or behavior on a controlled subset of users. This is common practice when releasing new science models, as they are difficult to verify in tests, but it can also be used for other features.

To test new behavior in Vespa, use a combination of search chains and rank profiles, controlled by query profiles, where one query profile corresponds to one bucket. These features support inheritance, to make it easy to express variation without repetition.
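
For illustration, a query profile assigning one bucket to an experimental rank profile might look like this (the profile names are illustrative):

<!-- search/query-profiles/bucket-a.xml: queries in bucket A use the experimental ranking -->
<query-profile id="bucket-a" inherits="default">
    <field name="ranking.profile">experimental-ranking</field>
</query-profile>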

Sometimes a new feature requires incompatible changes to a data field. To CD such changes, it is necessary to create a new field containing the new version of the data. This costs extra resources, but less than the alternative: standing up a new system copy with the new data. New fields can be added and populated while the system is live.

The need for incompatible changes can be reduced by making the semantics of the fields more precise. E.g., if a field is defined as the "quality" of a document, where a higher number means higher quality, a new algorithm which produces a different range and distribution will typically be an incompatible change. However, if the field is defined more precisely as the average time spent on the document once it is clicked, then a new algorithm which produces better estimates of this value will not be an incompatible change. Using precise semantics also has the advantage of making it easier to judge whether the use of the data and its statistical properties are reasonable.

Integration testing

Another challenge with CD is integration testing across multiple services: another service may depend on this Vespa application for its own integration testing. There are two ways to provide this: either create an additional application instance for testing, or use test data in the production instance. Using test data in production requires some thought on separating this data from the real data in queries. A separate instance gives complete isolation, but with some additional overhead, and it may not produce quite as realistic testing of queries, as those will run only over the test data in the separate instance.
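
A sketch of declaring a separate instance for integration testing in deployment.xml, assuming the <instance> syntax (the instance ids and zone are illustrative):

<deployment version="1.0">
    <!-- The instance serving real traffic -->
    <instance id="default">
        <prod>
            <region active="true">aws-us-east-1c</region>
        </prod>
    </instance>
    <!-- A separate instance used only by downstream services for integration testing -->
    <instance id="integration">
        <prod>
            <region active="true">aws-us-east-1c</region>
        </prod>
    </instance>
</deployment>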

Change field type

Changing a field's type is a breaking change, and is not allowed. The CD way of changing a field's type is to do it in multiple steps: add a new field with the desired type, populate the new field, change all operations to use the new field, then remove the old field. See changing live search definitions.
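
For instance, to change a field from string to long, the intermediate state of the search definition might look like this (the schema and field names are illustrative):

search product {
    document product {
        # Old field, still serving existing readers
        field price type string {
            indexing: summary
        }
        # New field with the desired type; populate it, migrate all readers, then remove 'price'
        field price_long type long {
            indexing: attribute | summary
        }
    }
}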

Note that removing a field from the search definition does not drop the data immediately. This is to prevent accidental data loss from bad configuration: one can revert the change and get the data back. An implication is that you cannot remove a field and immediately add it again with another data type, as updates to documents with the new type will fail. The field data will automatically be removed by Vespa after sufficient time has passed.