Metrics

Evaluate your LLM Workflows with Dozens of Premade Vellum Metrics

Metrics

Vellum comes with a set of Metrics that you can use right away within your Test Suites. We are continually adding new Metrics based on the needs of Vellum users.

Here are the default Metrics currently available within Vellum:

Exact Match

Check that the output is exactly equal to the target.

Returns a score of 1 if the output is an exact match, and 0 otherwise.
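As a rough illustration of the scoring rule above, a minimal Python sketch might look like the following. The function name `exact_match` is hypothetical, and the sketch assumes a strict string comparison with no whitespace or case normalization; whether Vellum normalizes values before comparing is not stated here.

```python
def exact_match(output: str, target: str) -> int:
    """Return 1 if output is identical to target, else 0.

    Assumption: strict equality, with no trimming or case folding.
    """
    return 1 if output == target else 0


print(exact_match("Paris", "Paris"))   # 1
print(exact_match("Paris ", "Paris"))  # 0 (trailing space breaks the match)
```

Under this strict reading, even a trailing space or a difference in capitalization yields a score of 0.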

Regex Match

Check that the specified regular expression can be found in the output.

Returns a score of 1 if the regular expression matches, and 0 otherwise.

Note that unless the regular expression is explicitly anchored, it can match anywhere in the output.
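The effect of anchoring can be sketched with Python's `re` module. The function name `regex_match` is hypothetical, and the sketch assumes Python regular expression semantics; the regex flavor Vellum uses is not specified here.

```python
import re


def regex_match(output: str, pattern: str) -> int:
    """Return 1 if pattern is found anywhere in output, else 0.

    re.search scans the whole string, so an unanchored pattern
    can match at any position.
    """
    return 1 if re.search(pattern, output) else 0


text = "Order #12345 confirmed"
print(regex_match(text, r"\d{5}"))       # 1 (found inside the string)
print(regex_match(text, r"^\d{5}$"))     # 0 (anchors require the whole string to be 5 digits)
print(regex_match("12345", r"^\d{5}$"))  # 1
```

Adding `^` and `$` turns a "contains" check into a "whole output matches" check.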

Semantic Similarity

Check that the output is semantically similar to the target.

Returns a score between 0 and 1, where 1 is a perfect match.

Uses a cross encoder to compute the similarity.

JSON Validity

Check that the output is valid JSON.

Returns a score of 1 if the output is valid JSON, and 0 otherwise.
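A check like this can be approximated with Python's standard `json` module. The function name `json_validity` is hypothetical, and the sketch assumes that any value `json.loads` accepts counts as valid, including top-level scalars such as `42`; Vellum's parser may be stricter or more lenient.

```python
import json


def json_validity(output: str) -> int:
    """Return 1 if output parses as JSON, else 0.

    Assumption: validity is defined by Python's json.loads.
    """
    try:
        json.loads(output)
    except (json.JSONDecodeError, TypeError):
        return 0
    return 1


print(json_validity('{"name": "Ada", "age": 36}'))  # 1
print(json_validity("{'name': 'Ada'}"))             # 0 (single quotes are not valid JSON)
print(json_validity('{"name": "Ada",}'))            # 0 (trailing comma)
```

Single-quoted strings and trailing commas are common reasons for model output to fail this kind of check.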


The Metrics below are Ragas Metrics, designed to evaluate Retrieval Augmented Generation (RAG) systems. For tips on evaluating your RAG pipeline in Vellum, check out this help center article.

Ragas - Faithfulness

Faithfulness measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context. The score is scaled to the (0, 1) range; higher is better.

For details, see: https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/faithfulness/
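As described in the Ragas documentation linked above, the score is the fraction of claims in the answer that can be inferred from the retrieved context. The sketch below shows only that final ratio. The function name `faithfulness_score` and its boolean input are hypothetical; in Ragas, extracting claims and judging each one is done by an LLM and is not shown here.

```python
def faithfulness_score(claim_supported: list[bool]) -> float:
    """Return the fraction of answer claims supported by the context.

    claim_supported holds one (hypothetical) verdict per claim:
    True if the claim can be inferred from the retrieved context.
    """
    if not claim_supported:
        raise ValueError("at least one claim is required")
    return sum(claim_supported) / len(claim_supported)


# Four claims in the answer, three of them backed by the context.
print(faithfulness_score([True, True, False, True]))  # 0.75
```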

Ragas - Answer Relevance

The Answer Relevancy Metric assesses how pertinent the generated answer is to the given prompt. Answers that are incomplete or contain redundant information receive lower scores; higher scores indicate better relevancy.

For details, see: https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/answer_relevance/

Ragas - Context Relevancy

This Metric gauges the relevancy of the retrieved context, calculated from both the question and the contexts. Values fall within the (0, 1) range, with higher values indicating better relevancy.

For details, see: https://docs.ragas.io/en/stable/concepts/metrics/available_metrics/context_recall/