Changelog | May 2024

Guardrail Workflow Nodes

May 23rd, 2024

You can now use Evaluation Metrics inside of Workflows with the new Guardrail Node! Guardrail Nodes let you run pre-defined evaluation criteria at runtime as part of a Workflow execution so that you can drive downstream behavior based on that metric’s score.

For example, if building a RAG application, you might determine whether the generated response passes some threshold for Ragas Faithfulness and if not, loop around to try again.
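
As a rough illustration, the Python sketch below shows the control flow a Guardrail Node enables. It is not Vellum’s API; generate_answer and faithfulness_score are hypothetical stand-ins for a Prompt Node and the Ragas Faithfulness metric evaluated by the Guardrail Node.

```python
# Illustrative sketch only -- not Vellum's API. The two helpers below are
# hypothetical stand-ins for a Prompt Node and a Guardrail Node scoring
# Ragas Faithfulness inside a Workflow.

def generate_answer(question: str, context: str) -> str:
    """Placeholder for a Prompt Node call (e.g. an LLM completion)."""
    return f"Answer to {question!r} grounded in the provided context."

def faithfulness_score(answer: str, context: str) -> float:
    """Placeholder for the Guardrail Node's Faithfulness score (0-1)."""
    return 0.9

FAITHFULNESS_THRESHOLD = 0.8
MAX_ATTEMPTS = 3

def answer_with_guardrail(question: str, context: str) -> str:
    answer = ""
    for _ in range(MAX_ATTEMPTS):
        answer = generate_answer(question, context)   # Prompt Node
        score = faithfulness_score(answer, context)   # Guardrail Node
        if score >= FAITHFULNESS_THRESHOLD:           # conditional edge: pass
            return answer
    return answer                                     # retries exhausted

print(answer_with_guardrail("What is our refund policy?", "Refunds within 30 days."))
```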

Guardrail Nodes

Chat Mode Revamp

May 22nd, 2024

Chat Mode in Prompt Sandboxes has received a major facelift! The left side of the new interface will be familiar to anyone using the Prompt Editor, while the rest of the interface retains its functionality with a fresh new look. We’ve also fixed some UX wonkiness and minor bugs during the restyling process.

Chat Mode Styling Update

Double-Click to Resize Rows & Columns in Prompt Sandboxes

May 22nd, 2024

You can now double-click on resizable row and column edges in both Comparison and Chat modes to auto-expand that row/column to its maximum size. If already at maximum size, double-clicking will reset them to their default size. Additionally, in Comparison mode, double-clicking on cell corners will auto-resize both dimensions simultaneously.

Improved Image Support in Chat History Fields

May 22nd, 2024

We’ve made several changes to enhance the UX of working with images. Chat History messages now include an explicit content-type selector, making it easier to work with image content using supported models. You can now add publicly-hosted images in multiple ways: by pasting an image URL, pasting a copied image, or dragging and dropping an image from another window.

Additionally, we’ve added limited support for embedded images. You can embed an image directly into the prompt by copy/pasting or dragging/dropping an image file from your computer’s file browser. This method has a 1MB size limit and is an interim solution as we continue to explore image upload and hosting options.
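
If you construct Chat History programmatically, an image becomes one more typed content block within a message. The sketch below only illustrates that idea; the field names (role, type, value, src) are assumptions and may not match the exact API schema.

```python
# Rough sketch of a Chat History message carrying a publicly hosted image.
# Field names here are assumptions and may differ from Vellum's actual schema.
chat_history = [
    {
        "role": "USER",
        "content": {
            "type": "ARRAY",
            "value": [
                {"type": "STRING", "value": "What is shown in this picture?"},
                {"type": "IMAGE", "value": {"src": "https://example.com/diagram.png"}},
            ],
        },
    }
]
```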

Gemini 1.5 Flash

May 20th, 2024

Google’s Gemini 1.5 Flash model is now available in Vellum. You can add it to your workspace from the models page.

Llama 3 Models on Bedrock

May 14th, 2024

We now support both of the Llama 3 models on AWS Bedrock. You can add them to your workspace from the models page.

GPT-4o Models

May 13th, 2024

OpenAI’s newest GPT-4o models, gpt-4o and gpt-4o-2024-05-13, are now available in Vellum and have been added to all workspaces!

GPT 4o

Organization and Workspace Names in Side Nav

May 13th, 2024

You can now view the active Organization’s name and the active Workspace’s name in the left sidebar navigation.

Workspace and Org Name Nav

Run All Button on Evaluation Reports

May 10th, 2024

There’s now a “Run All” button on Evaluation Reports that runs the Test Suite against every variant with a single click, instead of requiring you to run each variant individually.

Prompt Node Monitoring

May 9th, 2024

Vellum now captures monitoring data for deployed Prompt Nodes. Whenever a deployed Workflow invokes a Prompt Node, the node will show a link displaying the Prompt Deployment’s label:

Prompt Node Monitoring

Clicking on the link will take you to the Prompt’s executions page, where you can then see all metadata captured for the execution, including the raw request data sent to the model:

Prompt Node Execution

Groq Support

May 9th, 2024

Vellum now has a native integration with Groq, the LPU Inference Engine. All public models on Groq are now available to add to your workspace. Be sure to add your API key as a Secret named GROQ_API_KEY on the API Keys page.

Groq is an LLM hosting provider that offers incredible inference speed for open source LLMs, including the recently released (and very hyped!) Llama 3 model.

Groq Support

Function Calling in Prompt Evaluation

May 8th, 2024

Prompts that output function calls can now be evaluated via Test Suites. This allows you to define Test Cases consisting of the inputs to the prompt and the expected function call, then assert that there’s a match. For more, check out our docs.
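
As a rough illustration, a Test Case for a function-calling Prompt pairs the Prompt’s inputs with the call you expect back and checks for a match. The field names below are assumptions, not the exact schema; see the docs for the real format.

```python
# Illustrative Test Case for a function-calling Prompt. Field names are
# assumptions, not Vellum's exact schema -- see the docs for the real format.
test_case = {
    "inputs": {"user_message": "Book me a table for two at 7pm tonight"},
    "expected_function_call": {
        "name": "book_reservation",
        "arguments": {"party_size": 2, "time": "19:00"},
    },
}

def function_call_matches(actual: dict, expected: dict) -> bool:
    """A strict-match check: the function name and arguments must be identical."""
    return (
        actual.get("name") == expected["name"]
        and actual.get("arguments") == expected["arguments"]
    )
```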

Function Call Prompts

Out-of-the-Box Ragas Metrics

May 7th, 2024

Test-driven development for your RAG-based LLM pipelines is now easier than ever within Vellum!

Three new Ragas metrics are now available out of the box in Vellum: Context Relevancy, Answer Relevance, and Faithfulness. These can be used within Workflow Evaluations to measure the quality of a RAG system.

For more info, check out our new help center article on Evaluating RAG Pipelines.
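
For reference, these metrics mirror the ones in the open-source ragas library. A minimal sketch of computing them with ragas directly (import paths and column names vary by version) looks roughly like this:

```python
# Minimal sketch using the open-source ragas library (~0.1.x); the exact
# imports and dataset columns vary by version. Requires an OpenAI API key,
# since Ragas metrics are themselves LLM-scored.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["What is our refund window?"],
    "contexts": [["Customers may request a refund within 30 days of purchase."]],
    "answer": ["Refunds are available within 30 days of purchase."],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.97, ...}
```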

Ragas metrics

Subworkflow Node Streaming

May 7th, 2024

Subworkflow Nodes can now stream their output(s) to parent workflows.

This allows you to compose workflows using modular subworkflows without sacrificing the ability to deliver incremental results to your end user.

Note that only nodes immediately prior to Final Output Nodes can have their output(s) streamed.
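
Conceptually, this is similar to composing generators: the subworkflow yields chunks as they are produced and the parent forwards them immediately. The Python sketch below is only an analogy, not Vellum’s execution model.

```python
# Conceptual analogy only -- not Vellum's execution model. The subworkflow
# yields partial output as it becomes available, and the parent workflow
# forwards each chunk to the caller right away.
from typing import Iterator

def subworkflow(prompt: str) -> Iterator[str]:
    """Stands in for a Subworkflow Node that streams its output."""
    for token in ["Streaming ", "partial ", "results..."]:
        yield token

def parent_workflow(prompt: str) -> Iterator[str]:
    """Stands in for the parent Workflow; it re-emits chunks as they arrive."""
    yield from subworkflow(prompt)  # incremental results reach the end user

for chunk in parent_workflow("hello"):
    print(chunk, end="", flush=True)
print()
```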

Subworkflow Streaming

Default Test Case Concurrency in Evaluations

May 4th, 2024

You can now configure how many Test Cases should be run in parallel during an Evaluation. You might lower this value if you’re running into rate limits from the LLM provider, or might increase this value if your rate limits are high.
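
As an analogy (not Vellum’s implementation), this setting behaves like the worker cap on a thread pool: lower it to throttle parallel Test Case runs, raise it to run more at once.

```python
# Analogy only -- not Vellum's implementation. Capping max_workers limits how
# many Test Cases run in parallel, which helps stay under provider rate limits.
from concurrent.futures import ThreadPoolExecutor

def run_test_case(case_id: int) -> str:
    """Placeholder for executing a single Test Case against a Prompt."""
    return f"test case {case_id}: passed"

CONCURRENCY = 4  # lower this if you hit LLM provider rate limits

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for result in pool.map(run_test_case, range(10)):
        print(result)
```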

Test Case Concurrency