Evaluate your LLM Workflows with Dozens of Premade Vellum Metrics
Metrics
Vellum comes with a set of Metrics that you can use right away within your Test Suites. We are continually adding new Metrics based on the needs of Vellum users.
Here are the default Metrics currently available within Vellum:
Exact Match
Check that the output is exactly equal to the target.
Returns a score of 1 if the output is an exact match, and 0 otherwise.
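As a rough illustration of this scoring rule (not Vellum's actual implementation), an exact-match check can be sketched in Python. The helper name `exact_match_score` is hypothetical. This page does not say whether whitespace or case is normalized before comparison, so the sketch assumes the strings are compared as-is.

```python
def exact_match_score(output: str, target: str) -> int:
    """Hypothetical helper: 1 if output equals target exactly, else 0."""
    # Assumption: no trimming or case folding is applied before comparing.
    return 1 if output == target else 0


print(exact_match_score("Paris", "Paris"))   # identical strings
print(exact_match_score("paris", "Paris"))   # differs only in case
```

Under this assumption, even a trailing space or a difference in capitalization yields a score of 0.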
Regex Match
Check that the specified regular expression can be found in the output.
Returns a score of 1 if the regular expression matches, and 0 otherwise.
Note that unless the regular expression is explicitly anchored, it can match anywhere in the output.
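The anchoring note can be illustrated with a short Python sketch. The helper name `regex_match_score` is hypothetical, and Python's `re` module stands in for whatever regex engine Vellum uses, so treat the exact syntax as an assumption.

```python
import re


def regex_match_score(output: str, pattern: str) -> int:
    """Hypothetical helper: 1 if pattern is found anywhere in output, else 0."""
    # re.search scans the whole string, so an unanchored pattern
    # can match at any position in the output.
    return 1 if re.search(pattern, output) else 0


text = "Order #123 shipped"
print(regex_match_score(text, r"\d+"))    # unanchored: digits found mid-string
print(regex_match_score(text, r"^\d+$"))  # anchored: whole output must be digits
```

Here the unanchored pattern scores 1 because digits appear somewhere in the output, while the anchored pattern scores 0 because the output as a whole is not a run of digits.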
Semantic Similarity
Check that the output is semantically similar to the target.
Returns a score between 0 and 1, where 1 is a perfect match.
Uses a cross-encoder model to compute the similarity.
JSON Validity
Check that the output is valid JSON.
Returns a score of 1 if the output is valid JSON, and 0 otherwise.
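A minimal sketch of such a check in Python follows. The helper name `json_validity_score` is hypothetical, and the sketch assumes that "valid JSON" means "parses with a standard JSON parser". Python's `json` module is slightly lenient (for example, it accepts `NaN`), so a stricter validator may score some outputs differently.

```python
import json


def json_validity_score(output: str) -> int:
    """Hypothetical helper: 1 if output parses as JSON, else 0."""
    try:
        json.loads(output)
    except (json.JSONDecodeError, TypeError):
        # JSONDecodeError: malformed text; TypeError: non-string input.
        return 0
    return 1


print(json_validity_score('{"name": "Ada", "age": 36}'))  # well-formed object
print(json_validity_score("{'name': 'Ada'}"))             # single quotes are not valid JSON
```

A common failure mode this catches is model output that wraps JSON in extra prose or uses single-quoted keys, both of which fail to parse.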
The Metrics below are Ragas Metrics designed to evaluate your Retrieval-Augmented Generation (RAG) systems. For tips on evaluating your RAG pipeline in Vellum, check out this help center article.
Ragas - Faithfulness
Faithfulness measures the factual consistency of the generated answer against the given context. It is calculated from the answer and the retrieved context. The score is scaled to the (0, 1) range; higher is better.
For details, see: https://docs.ragas.io/en/latest/concepts/metrics/faithfulness.html
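According to the Ragas documentation, the faithfulness score is the fraction of claims in the generated answer that can be inferred from the retrieved context. In Ragas, an LLM extracts and verifies the claims. The sketch below shows only the final ratio, using a hypothetical `faithfulness_score` helper that takes the per-claim verdicts as input.

```python
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Hypothetical helper: fraction of answer claims supported by the context."""
    if not claim_supported:
        # Assumption: an answer with no extractable claims is treated as an error here.
        raise ValueError("at least one claim is required")
    # True counts as 1 and False as 0, so sum() is the number of supported claims.
    return sum(claim_supported) / len(claim_supported)


# Four claims extracted from the answer; three are supported by the context.
print(faithfulness_score([True, True, False, True]))
```

With three of four claims supported, the ratio is 0.75; an answer whose every claim is grounded in the context scores 1.0.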
Ragas - Answer Relevance
This Metric, called Answer Relevancy in Ragas, assesses how pertinent the generated answer is to the given prompt. Answers that are incomplete or contain redundant information receive lower scores; higher scores indicate better relevancy.
For details, see: https://docs.ragas.io/en/latest/concepts/metrics/answer_relevance.html
Ragas - Context Relevancy
This Metric gauges the relevancy of the retrieved context, calculated from both the question and the retrieved contexts. Values fall within the range (0, 1), with higher values indicating better relevancy.
For details, see: https://docs.ragas.io/en/v0.1.5/concepts/metrics/context_relevancy.html