Vespa Cloud provides a set of machine-learned models that you can use in your applications. These models are frozen and will always be available on Vespa Cloud. You can also bring your own embedding model by deploying it in the Vespa application package.
To use a model provided by Vespa Cloud, set the model-id attribute wherever you specify a model config, for example on the transformer-model element when configuring the Huggingface embedder provided by Vespa.
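As a minimal sketch, assuming the embedder is declared in the container section of services.xml, the configuration could look like the following. The container id `default` and the component id `e5` are illustrative names chosen here, not requirements:

```xml
<container id="default" version="1.0">
    <!-- "e5" is an arbitrary component id chosen for this sketch -->
    <component id="e5" type="hugging-face-embedder">
        <!-- model-id selects a model provided by Vespa Cloud -->
        <transformer-model model-id="e5-small-v2"/>
    </component>
</container>
```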
With this, your application supports text embedding inference for both queries and documents. Nodes that have been provisioned with GPU acceleration will automatically use the GPU for embedding inference.
Models on the Vespa model hub are selected open-source embedding models with strong benchmark performance. See the Massive Text Embedding Benchmark (MTEB) Leaderboard for details. These embedding models are useful for retrieval (semantic search), re-ranking, clustering, classification, and more.
These models are available for the Huggingface Embedder, type="hugging-face-embedder".
All these models support mapping from both string and array<string> to tensor representations.
The output tensor cell-precision can be <float> or <bfloat16>.
model-id: e5-small-v2 | |
---|---|
The smallest and most cost-efficient model from the E5 family. | |
Tensor definition | tensor<float>(x[384]) or tensor<float>(p{},x[384]) |
distance-metric | angular |
License | MIT |
Source | https://huggingface.co/intfloat/e5-small-v2 |
Language | English |
Comment | See using E5 models |

model-id: e5-base-v2 | |
---|---|
The base model of the E5 family. | |
Tensor definition | tensor<float>(x[768]) or tensor<float>(p{},x[768]) |
distance-metric | angular |
License | MIT |
Source | https://huggingface.co/intfloat/e5-base-v2 |
Language | English |
Comment | See using E5 models |

model-id: e5-large-v2 | |
---|---|
The largest model of the E5 family. At the time of writing, this is the best performing embedding model on the MTEB benchmark. | |
Tensor definition | tensor<float>(x[1024]) or tensor<float>(p{},x[1024]) |
distance-metric | angular |
License | MIT |
Source | https://huggingface.co/intfloat/e5-large-v2 |
Language | English |
Comment | See using E5 models |

model-id: multilingual-e5-base | |
---|---|
The multilingual model of the E5 family. Use this model for multilingual queries and documents. | |
Tensor definition | tensor<float>(x[768]) or tensor<float>(p{},x[768]) |
distance-metric | angular |
License | MIT |
Source | https://huggingface.co/intfloat/multilingual-e5-base |
Language | Multilingual |
Comment | See using E5 models |
The E5 family uses a text prefix on the input to differentiate query-side and document-side embedding.
The query text should be prefixed with "query: ". For example, if the original user query is how to format e5 queries, the text to embed is "query: how to format e5 queries".
The same technique must also be applied for document-side embedding inference. The input text should be prefixed with "passage: ".
For example, an indexing expression can read a chunks field of type array<string> and prefix each item with "passage: ", followed by the concatenation of the title and the item chunk (_).
See execution value example.
These models are available for the Bert Embedder, type="bert-embedder".
Note that bert-embedder requires both transformer-model and tokenizer-vocab.
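A minimal sketch of a bert-embedder configuration in services.xml that sets both required elements using Vespa Cloud model ids. The container id `default` and the component id `bert` are illustrative, and pairing minilm-l6-v2 with the bert-base-uncased vocabulary is an assumption based on the model tables in this document:

```xml
<container id="default" version="1.0">
    <!-- "bert" is an arbitrary component id chosen for this sketch -->
    <component id="bert" type="bert-embedder">
        <!-- the ONNX transformer model, referenced by Vespa Cloud model-id -->
        <transformer-model model-id="minilm-l6-v2"/>
        <!-- the WordPiece vocabulary (vocab.txt), also referenced by model-id -->
        <tokenizer-vocab model-id="bert-base-uncased"/>
    </component>
</container>
```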
model-id: minilm-l6-v2 | |
---|---|
A small, fast sentence-transformer model. | |
Tensor definition | tensor<float>(x[384]) or tensor<float>(p{},x[384]) |
distance-metric | angular |
License | apache-2.0 |
Source | https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 |
Language | English |

model-id: mpnet-base-v2 | |
---|---|
A larger sentence-transformer model that performs better than minilm-l6-v2. | |
Tensor definition | tensor<float>(x[768]) or tensor<float>(p{},x[768]) |
distance-metric | angular |
License | apache-2.0 |
Source | https://huggingface.co/sentence-transformers/all-mpnet-base-v2 |
Language | English |
These are embedder implementations that tokenize text and map strings to vocabulary identifiers. They are most useful for creating the tensor inputs to re-ranking models that take both the query and document token identifiers as input. See the example in the transformer ranking sample app.
model-id: bert-base-uncased | |
---|---|
A vocabulary text (vocab.txt) file in the format expected by WordPiece: one text token per line. | |
License | apache-2.0 |
Source | https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 |

model-id: e5-base-v2-vocab | |
---|---|
A tokenizer.json configuration file in the format expected by the HF tokenizer. This tokenizer configuration can be used with e5-base-v2, e5-small-v2 and e5-large-v2. | |
License | MIT |
Source | https://huggingface.co/intfloat/e5-base-v2 |
Language | English |

model-id: multilingual-e5-base-vocab | |
---|---|
A tokenizer.json configuration file in the format expected by the HF tokenizer. This tokenizer configuration can be used with multilingual-e5-base. | |
License | MIT |
Source | https://huggingface.co/intfloat/multilingual-e5-base |
Language | Multilingual |
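As a hedged sketch, one of the tokenizer.json models above could be wired into a tokenizer component in services.xml roughly as follows. This assumes the `hugging-face-tokenizer` component type with a `model` child element; the container id `default` and the component id `tokenizer` are illustrative names:

```xml
<container id="default" version="1.0">
    <!-- "tokenizer" is an arbitrary component id chosen for this sketch -->
    <component id="tokenizer" type="hugging-face-tokenizer">
        <!-- tokenizer.json configuration, referenced by Vespa Cloud model-id -->
        <model model-id="e5-base-v2-vocab"/>
    </component>
</container>
```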
These are global significance models that can be added to the significance element in services.xml.
model-id: significance-en-wikipedia-v1 | |
---|---|
This significance model was generated from English Wikipedia dump data from 2024-08-01. Available in Vespa as of version 8.426.8. | |
License | Creative Commons Attribution-ShareAlike 3.0 Unported (CC BY-SA 3.0) License. |
Source | https://data.vespa-cloud.com/significance_models/significance-en-wikipedia-v1.json.zst |
Language | English |
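A minimal sketch of referencing this model from services.xml, assuming the significance element is placed under the search element of the container; the container id `default` is illustrative:

```xml
<container id="default" version="1.0">
    <search>
        <significance>
            <!-- global significance model, referenced by Vespa Cloud model-id -->
            <model model-id="significance-en-wikipedia-v1"/>
        </significance>
    </search>
</container>
```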
You can also specify both a model-id, which will be used on Vespa Cloud, and a url/path, which will be used on self-hosted deployments:
<transformer-model model-id="minilm-l6-v2" path="myAppPackageModels/myModel.onnx"/>
This can be useful, for example, to create an application package that uses models from Vespa Cloud for production and a scaled-down or dummy model for self-hosted development.
Specifying a model-id can be done for any config field of type model, whether the config is from Vespa or defined by you.