Vespa Cloud and GDPR Compliance

Many applications running on Vespa Cloud handle data that would be classified as personal data per the European Union’s (EU) General Data Protection Regulation (GDPR).

When running an application on Vespa Cloud, the application owner has sole responsibility for ensuring that all data in the application is handled in compliance with GDPR requirements. The overall data retention policies for Vespa Cloud ensure that non-application data handled by Vespa is compliant.

GDPR Application Considerations

As an application owner, there are some steps you can take to help stay GDPR-compliant:

  • Limit use/storage of personal data to a minimum as required by your application.
  • If relevant, make sure your schema allows you to easily select the relevant data for deletion should you receive a Right To Be Forgotten request. See the Delete Where documentation for details on document selection when deleting documents.
  • Configure document expiry for your documents to ensure expired data is automatically removed.
  • Make sure that you delete your application when it is no longer actively maintained, to avoid any abandoned, unmanaged data.
  • Make sure that any data you have preserved or copied from your log archive is managed per GDPR requirements, as Vespa data retention mechanisms don’t apply to this data.
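
As an illustration of the Delete Where consideration above, the following Python sketch builds the URL for a selection-based delete of all documents belonging to one user. It assumes the /document/v1 API accepts `selection` and `cluster` query parameters on a DELETE request, as described in the Delete Where documentation. The endpoint, namespace, document type (`profile`), field (`user_id`) and cluster name are hypothetical placeholders, not values from this document. The sketch only constructs the URL; sending the request, and authenticating it, is left to your HTTP client of choice.

```python
from urllib.parse import quote, urlencode


def build_delete_where_url(endpoint, namespace, doctype, cluster, field, value):
    """Build a /document/v1 URL selecting all documents where field == value."""
    # Refuse values that would need escaping inside the selection string,
    # rather than guessing at the escaping rules of the selection language.
    if '"' in value or "\\" in value:
        raise ValueError("value must not contain double quotes or backslashes")
    # Selection expression, e.g.: profile.user_id == "user-1234"
    selection = f'{doctype}.{field} == "{value}"'
    path = f"/document/v1/{quote(namespace, safe='')}/{quote(doctype, safe='')}/docid"
    # Percent-encode the query so spaces and quotes survive transport.
    query = urlencode({"selection": selection, "cluster": cluster}, quote_via=quote)
    return f"{endpoint.rstrip('/')}{path}?{query}"


# Hypothetical values for illustration only.
url = build_delete_where_url(
    endpoint="https://example.invalid",
    namespace="mynamespace",
    doctype="profile",
    cluster="mycluster",
    field="user_id",
    value="user-1234",
)
print(url)  # issue an HTTP DELETE against this URL to remove the matching documents
```

A schema that stores the user identifier in a dedicated field, as assumed here, keeps the selection expression simple when a Right To Be Forgotten request arrives.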

Vespa Cloud Data Retention Policies

The management of data stored in an application running on Vespa Cloud is the responsibility of the application owner and, as such, Vespa Cloud does not have any retention policy for this data as long as it is stored by the application.

The following data retention policies apply to Vespa Cloud:

  • After a node previously allocated to an application has been deallocated (e.g. because the application owner deleted the application), all application data will be deleted within four hours.

  • All application log data will be deleted from Vespa servers after no more than 30 days (most often sooner), depending on log volume, allocated disk resources, etc. PLEASE NOTE: This is the theoretical maximum retention time - see the archive guide for how to ensure access to your application logs.

Vespa Cloud Data Details

Below, we dive into the details around how we handle data in regard to GDPR. For most application owners, following the guidelines above should be sufficient, but an understanding of the underlying details can help make informed decisions to ensure your application is compliant.

Data inputs:

  • Vespa Cloud is a content-agnostic service with dedicated instances per application. Applications will feed data (documents) into the system according to their application-specific schema. The data may or may not include user data depending on application schema and supported use-cases.

  • Application owners typically retrieve data from their Vespa application by sending HTTP requests containing a query for what data to retrieve. This data may or may not include user data depending on application schema and supported use-cases.

  • Requests to Vespa Cloud most often come via an application’s middle tier - there is no direct user (e.g. browser) access to the Vespa application.

  • Authentication to the Vespa Cloud console happens through the external Auth0 identity management service which, in turn, supports various identity providers such as Google and GitHub.

  • All user data collected by the console comes from either information entered directly in the Vespa Cloud user sign-up form or metadata associated with the identity used by the application owner to authenticate against the service, typically the e-mail address.

Collected metadata:

  • The user sign-up form (subject to change) collects the following information from the primary contact for a tenant:
    • E-mail address
    • Tenant name/ID (usually a variant of organization name)
    • Organization name
    • Contact person name
    • Google or GitHub account ID (e-mail address)
    • Use-case information
  • For all console users associated with a tenant we log the following identity information:
    • Name (if given by user)
    • E-mail address
    • Sign-up date
    • Last login
  • For all incoming requests - either document feed or query requests - we keep standard HTTP access and connection logs. As requests to hosted Vespa typically do not originate from the end user directly, but come via the application’s middle tier, whether actual user data is stored in the access logs depends on what data the application passes on in its requests to Vespa.

  • All Vespa Cloud access and connection logs retained by the Vespa team are subject to the Vespa Cloud data retention policies. A detailed overview of the data captured in these logs can be found in the Vespa documentation for the access log and connection log.

Purpose for processing data:

  • The sign-up form information is used for invoicing and as contact information for support and production issue requests.
  • The user ID (typically the e-mail address) is used for mapping the identity from a third-party identity provider (e.g. Google) to tenant and application privileges.
  • User data collected directly by applications is determined by each application’s specific use-cases and is not controlled by the hosted Vespa service.
  • Access logs are kept for operational reasons only: to report service health and usage back to the application owners, and to assist the Vespa team in supporting and operating the service.

Processing performed on data:

  • The data from the sign-up form is stored as entered by the user and used for invoicing and support lookups as appropriate.
  • Access and connection logs are parsed, and data is extracted from them, when needed for operational or support reasons.
  • Application-specific data is processed and stored on dedicated application instances and will be subject to each application’s configuration and business logic.
  • Access and connection logs are stored in three separate locations:
    • At each application’s dedicated Vespa HTTP container instance.
    • In each application’s dedicated Vespa Cloud log archive, as detailed in the Archive Guide.
    • In the Vespa operational team’s log processing and inspection service.

External parties/system receiving data:

  • The user ID (e-mail address) is stored in the Auth0 identity management system as the user signs in - it is not explicitly shared by Vespa, but by the user through logging in via this system.
  • All other user data is only used for invoicing or when the Vespa Support Team must reach out to the user during outages. Data is never shared with external parties except for these explicit purposes.