Improve Retrieval Results with Metadata Filtering
Some use cases of Vellum Search require you to narrow in on a subset of documents before searching by keyword match or semantic similarity. For example, you might want to search across historical conversations for a specific user, or only across documents that have specific tags.
You can do this through metadata filtering.
Metadata filtering requires that you:
- Provide structured metadata for your documents either upon initial upload or later; and
- Provide filter criteria when performing a search.
Let’s see how to do each.
Specifying Metadata
You can specify metadata for documents through both the UI and API.
Through the UI
You can provide metadata upon initial upload.
![Metadata Specification](https://storage.googleapis.com/vellum-public/help-docs/document_metadata_specification.png)
You can also view metadata associated with a document and edit it after it’s been uploaded.
![Viewing Metadata](https://storage.googleapis.com/vellum-public/help-docs/edit_document_metadata.png)
Through the API
You can provide metadata as stringified JSON upon initial upload using the upload Documents API here.
You can also update a document’s metadata after the fact using the Document - Partial Update endpoint here.
Note that in this endpoint, you can simply provide a JSON object (rather than a stringified JSON object as is required during initial upload).
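To make the difference between the two formats concrete, here is a minimal sketch in Python. The metadata field names (`user_id`, `tags`) and the `metadata` key in the update body are assumptions chosen for illustration; check the API reference for the exact request shapes.

```python
import json

# Hypothetical metadata; the field names are illustrative only.
metadata = {"user_id": "user_123", "tags": ["billing", "bug"]}

# Initial upload: metadata is provided as stringified JSON.
upload_metadata = json.dumps(metadata)

# Partial update: metadata is provided as a plain JSON object.
# The "metadata" key used here is an assumption about the body shape.
update_body = {"metadata": metadata}

print(type(upload_metadata).__name__)           # str
print(json.loads(upload_metadata) == metadata)  # True
```

Round-tripping the string through `json.loads` is a quick way to confirm that what you send on upload describes the same object you would send on update.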
Filtering Against Metadata
You use the search endpoint to perform a search against an index (documented here). This endpoint exposes an `options.filters.metadata` field for filtering against your provided metadata prior to matching on keywords or semantic similarity.
The syntax of the `metadata` property supports complex boolean logic and was borrowed from React Query Builder. You can use their demo here to get a feel for the query syntax.
Note that values for fields must be JSON-deserializable. If you’re looking to filter against a string, then the value passed in should contain escaped double quotes.
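One way to get the quoting right is to JSON-encode every value before placing it in a rule. The sketch below assumes a rule shape of `field`, `operator`, and `value`, following React Query Builder's conventions; `metadata_rule` is a hypothetical helper, not part of any Vellum SDK.

```python
import json

def metadata_rule(field, operator, value):
    """Build one rule whose value is JSON-encoded so it can be deserialized."""
    return {"field": field, "operator": operator, "value": json.dumps(value)}

# A string value ends up wrapped in double quotes inside the value string.
rule = metadata_rule("priority", "=", "high")
print(rule["value"])     # "high"

# When the whole rule is serialized, those inner quotes appear escaped.
print(json.dumps(rule))
```

Numbers and booleans need no extra quoting under this approach: `json.dumps(3)` yields `3` and `json.dumps(True)` yields `true`, both of which deserialize cleanly.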
Example
Suppose you have two documents with the following metadata:
If you wanted to search across all documents that are marked as high-priority, customer-facing bugs, you would use the following query:
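As a rough sketch of what such a setup could look like, the Python below uses two hypothetical documents and a filter in the React Query Builder style. The metadata fields (`priority`, `type`, `customer_facing`), the `query` and `index_name` values, and the exact body shape are all assumptions for illustration. The `matches` function is only a local stand-in that shows how the boolean logic reads; it is not how Vellum evaluates filters.

```python
import json

# Hypothetical metadata for two documents (illustrative values only).
doc_a = {"priority": "high", "type": "bug", "customer_facing": True}
doc_b = {"priority": "low", "type": "bug", "customer_facing": False}

# Filter group: every rule must hold. Values are JSON-encoded strings.
metadata_filter = {
    "combinator": "AND",
    "negated": False,
    "rules": [
        {"field": "priority", "operator": "=", "value": json.dumps("high")},
        {"field": "type", "operator": "=", "value": json.dumps("bug")},
        {"field": "customer_facing", "operator": "=", "value": json.dumps(True)},
    ],
}

# Assumed request body, with the filter under options.filters.metadata.
search_body = {
    "query": "login fails",
    "index_name": "my-index",
    "options": {"filters": {"metadata": metadata_filter}},
}

def matches(doc, group):
    """Local illustration of AND/OR groups; supports '=' rules only."""
    results = []
    for rule in group["rules"]:
        if "rules" in rule:  # nested group
            results.append(matches(doc, rule))
        else:
            results.append(doc.get(rule["field"]) == json.loads(rule["value"]))
    combined = all(results) if group["combinator"].upper() == "AND" else any(results)
    return not combined if group.get("negated") else combined

print(matches(doc_a, metadata_filter))  # True
print(matches(doc_b, metadata_filter))  # False
```

Under these assumptions only the first document passes the filter, so only it would go on to keyword and semantic matching.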