
Improve Retrieval Results with Metadata Filtering

Some use cases for Vellum Search require you to narrow down to a subset of documents before searching by keyword match or semantic similarity. For example, you might want to search across historical conversations for a specific user, or only across documents that have specific tags.

You can do this through metadata filtering.

Metadata filtering requires that you:

  1. Provide structured metadata for your documents either upon initial upload or later; and
  2. Provide filter criteria when performing a search.

Let’s see how to do each.

Specifying Metadata

You can specify metadata for documents through both the UI and API.

Through the UI

You can provide metadata upon initial upload.

Metadata Specification

You can also view metadata associated with a document and edit it after it’s been uploaded.

Viewing Metadata

Through the API

You can provide metadata as stringified JSON upon initial upload using the upload Documents API here.

You can also update a document’s metadata after the fact using the Document - Partial Update endpoint here.

Note that in this endpoint, you can simply provide a JSON object (rather than a stringified JSON object as is required during initial upload).
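
The difference between the two formats can be sketched in Python as follows. This is a minimal illustration, not the actual request code: the payload key "metadata" and the surrounding payload shapes are assumptions, so check the API reference for the exact field names. No request is sent.

```python
import json

# Hypothetical metadata used only for illustration.
metadata = {"tags": ["customer-facing", "bug"], "priority": "high"}

# Initial upload: the metadata is sent as a stringified JSON value.
# The key name "metadata" is an assumption about the payload shape.
upload_fields = {"metadata": json.dumps(metadata)}

# Partial update: the metadata is sent as a plain JSON object.
update_body = {"metadata": {**metadata, "priority": "low"}}

print(type(upload_fields["metadata"]).__name__)  # str
print(type(update_body["metadata"]).__name__)    # dict
```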

Filtering Against Metadata

You use the search endpoint to perform a search against an index (documented here). This endpoint exposes an options.filters.metadata field for filtering against your provided metadata prior to matching on keywords/semantic similarity.

The syntax of the metadata property supports complex boolean logic and was borrowed from React Query Builder. You can use their demo here to get a feel for the query syntax.

Note that values for fields must be JSON-deserializable. If you’re looking to filter against a string, then the value passed in should contain escaped double quotes.
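
One way to avoid hand-writing the escaped quotes is to JSON-encode each value before placing it in a rule. The sketch below assumes Python; the helper name metadata_rule is hypothetical and not part of any Vellum client.

```python
import json


def metadata_rule(field, operator, value):
    """Build one filter rule whose value is a JSON-encoded string.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return {"field": field, "operator": operator, "value": json.dumps(value)}


rule = metadata_rule("tags", "contains", "customer-facing")

# The encoded value carries its own double quotes.
print(rule["value"])     # "customer-facing"

# Serializing the rule for a request body escapes those quotes.
print(json.dumps(rule))
```

Because json.dumps handles the encoding, the same helper also works for numbers and booleans, which do not need quotes.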

Example

Suppose you have two documents with the following metadata:

// Document A
{
  "tags": [
    "customer-facing", "needs-triage", "bug"
  ],
  "priority": "high"
}
// Document B
{
  "tags": [
    "needs-triage", "bug"
  ],
  "priority": "low"
}

If you wanted to search only across documents that are marked as high-priority, customer-facing bugs, you would use the following query:

{
  ...,
  "options": {
    "filters": {
      "metadata": {
        "combinator": "AND",
        "rules": [
          {
            "field": "tags",
            "operator": "contains",
            "value": "\"customer-facing\""
          },
          {
            "field": "tags",
            "operator": "contains",
            "value": "\"bug\""
          },
          {
            "field": "priority",
            "operator": "=",
            "value": "\"high\""
          }
        ],
        "negated": false
      }
    }
  }
}
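
To see which of the two documents a filter for high-priority, customer-facing bugs would select, the rules can be evaluated locally. The Python sketch below is only an approximation for building intuition. The operator semantics are assumptions ("contains" as list membership, "=" as equality), and the real filtering happens server-side.

```python
import json

DOCUMENTS = {
    "Document A": {
        "tags": ["customer-facing", "needs-triage", "bug"],
        "priority": "high",
    },
    "Document B": {"tags": ["needs-triage", "bug"], "priority": "low"},
}

# String values are JSON-encoded, so each carries its own double quotes.
FILTER = {
    "combinator": "AND",
    "rules": [
        {"field": "tags", "operator": "contains", "value": json.dumps("customer-facing")},
        {"field": "tags", "operator": "contains", "value": json.dumps("bug")},
        {"field": "priority", "operator": "=", "value": json.dumps("high")},
    ],
    "negated": False,
}


def rule_matches(rule, metadata):
    """Evaluate one rule under the assumed operator semantics."""
    expected = json.loads(rule["value"])
    actual = metadata.get(rule["field"])
    if rule["operator"] == "contains":
        return actual is not None and expected in actual
    if rule["operator"] == "=":
        return actual == expected
    raise ValueError(f"operator not covered by this sketch: {rule['operator']}")


def group_matches(group, metadata):
    """Evaluate a rule group, recursing into nested groups."""
    results = [
        group_matches(rule, metadata) if "rules" in rule else rule_matches(rule, metadata)
        for rule in group["rules"]
    ]
    combined = all(results) if group["combinator"].upper() == "AND" else any(results)
    return not combined if group.get("negated") else combined


matching = [name for name, md in DOCUMENTS.items() if group_matches(FILTER, md)]
print(matching)  # ['Document A']
```

Document B is excluded because it lacks the "customer-facing" tag and its priority is "low".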