Changelog | April, 2024

Support for Gemini 1.5 Pro

April 30th, 2024

Gemini 1.5 Pro is now available in Vellum.

You can add it to your workspace through the models page.

Improved Monitoring on Workflow Deployments

April 30th, 2024

We’ve added new functionality to the monitoring tab on workflow deployments. It’s now possible to see a breakdown of executions by the release tag used, and further filter down based on a specific release tag.

Release Tag Monitoring

Reusable Evaluation Metrics

April 30th, 2024

Introducing Reusable Evaluation Metrics!

Evaluation Metrics can now be shared across your Test Suites, making it easier to consistently test and evaluate the quality of your Prompts and Workflows. Define a suite of custom metrics tailored to your business logic and use case to save time and ensure standardized evaluation criteria.

Metric Definition

Prompt Blocks

April 30th, 2024

Prompts can now be broken down into multiple sections and organized using “blocks.” Prompt blocks can be reordered and toggled on or off.

Splitting your Prompt into multiple blocks can make it easier to navigate complex Prompts and help you focus on iterating on specific sections. Check out the demo below to see how it works!

Filtering Executions on Release Tags

April 29th, 2024

It’s now possible to filter workflow deployment executions by the release tag used when executing the workflow.

This can be very useful for monitoring differences between releases of a deployment. Are you still using an older release in production? Are executions of your new release behaving as expected?
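As a rough illustration of the kind of comparison this enables, the sketch below groups a handful of execution records by release tag and counts failures per tag. The record shape (`release_tag`, `state`) and the sample values are hypothetical, chosen only for illustration; they are not Vellum's actual export format.

```python
from collections import Counter

# Hypothetical execution records; field names and values are assumptions.
executions = [
    {"release_tag": "LATEST", "state": "FULFILLED"},
    {"release_tag": "LATEST", "state": "REJECTED"},
    {"release_tag": "production", "state": "FULFILLED"},
    {"release_tag": "production", "state": "FULFILLED"},
]


def breakdown_by_release_tag(records):
    """Return {tag: (total_executions, failed_executions)}."""
    totals = Counter(r["release_tag"] for r in records)
    failures = Counter(
        r["release_tag"] for r in records if r["state"] == "REJECTED"
    )
    # Counter returns 0 for tags that never failed.
    return {tag: (totals[tag], failures[tag]) for tag in totals}


summary = breakdown_by_release_tag(executions)
```

A per-tag summary like this makes it easy to spot a release that is still receiving traffic or one that fails more often than its predecessor.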

Execution Release Tags

Faster Queries on Workflow Deployment Executions

April 26th, 2024

The executions tab of the workflow deployments page now fetches historical executions much faster. This tab is a great way to see how your customers are actually using your deployments.

In our tests on deployments with over 200k executions, data now loads in under 4 seconds instead of the previous 15+ seconds, roughly a 4x speed improvement.

Support for Evaluating External Functions

April 25th, 2024

Vellum’s Evaluation framework can now be used to test arbitrary functions defined in your codebase – not just Prompts and Workflows managed by Vellum.

For example, you might test a prompt chain that lives in your codebase and that’s defined using another third party library. This can be particularly useful if you want to incrementally migrate to Vellum Prompts/Workflows, but ensure that the outputs remain consistent.

For a detailed example of how to use Vellum’s evaluation framework to test external functions, see the Python example here.
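The general idea can be sketched without any Vellum-specific calls: wrap the function from your codebase, run it over a set of test cases, and score each output with a metric. Everything below (`external_function`, `exact_match`, and the test case shape) is a hypothetical stand-in rather than the Vellum SDK's API; the linked Python example shows the real integration.

```python
from typing import Callable, Dict, List


def external_function(question: str) -> str:
    """Stand-in for a prompt chain defined elsewhere in your codebase."""
    canned = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return canned.get(question, "unknown")


def exact_match(actual: str, expected: str) -> float:
    """Illustrative metric: 1.0 on an exact match, 0.0 otherwise."""
    return 1.0 if actual.strip() == expected.strip() else 0.0


# Hypothetical test cases: an input plus the output we expect.
test_cases: List[Dict[str, str]] = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
    {"input": "Capital of Spain?", "expected": "Madrid"},
]


def evaluate(
    fn: Callable[[str], str],
    cases: List[Dict[str, str]],
    metric: Callable[[str, str], float],
) -> List[float]:
    """Run fn on every case and score each output with the metric."""
    return [metric(fn(case["input"]), case["expected"]) for case in cases]


scores = evaluate(external_function, test_cases, exact_match)
mean_score = sum(scores) / len(scores)
```

Because the function under test is just a callable, the same test cases and metric can later be pointed at a Vellum-managed Prompt or Workflow to check that outputs stay consistent during a migration.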

Fireworks Finetuned Models

April 24th, 2024

Vellum now supports models that you’ve fine-tuned on Fireworks AI. You can add your fine-tuned Fireworks model by navigating to the Models page and clicking on the featured model template at the top.

Fireworks Model Template

Note that only the Mistral family of models is currently supported. If there are other base models that you would like to see supported, please reach out to us!

Updated Prompt UI

April 23rd, 2024

We’ve updated the prompt editing UI throughout Vellum. You’ll see the new look in the Prompt Editor, Comparison Mode, Chat Mode, Prompt Nodes in Workflows, and Deployment Overviews. This is the first in a series of exciting improvements to the prompt editing experience that will be rolling out over the coming weeks and months.

New Prompt Block UI

New Upsert Prompt Sandbox Scenario API

April 23rd, 2024

The API for upserting a Prompt Sandbox Scenario now requests and responds with schemas that are more consistent with other Vellum APIs, using discriminated unions for improved type safety. This API is available on version 0.4.0 of our SDKs.

You can find the API documentation for it here.
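For readers unfamiliar with the term, a discriminated union is a set of types that share a literal tag field, so code can tell the variants apart by checking that one field. The sketch below shows the pattern in plain Python; the class names, fields, and tag values are hypothetical and are not taken from the SDK's actual schema.

```python
from dataclasses import dataclass
from typing import List, Literal, Union


@dataclass
class StringInput:
    key: str
    value: str
    # The literal `type` field is the discriminant for this variant.
    type: Literal["STRING"] = "STRING"


@dataclass
class ChatHistoryInput:
    key: str
    value: List[str]
    type: Literal["CHAT_HISTORY"] = "CHAT_HISTORY"


# The union of all variants; each one carries a distinct `type` tag.
ScenarioInput = Union[StringInput, ChatHistoryInput]


def describe(inp: ScenarioInput) -> str:
    """Branch on the tag to decide how to interpret `value`."""
    if inp.type == "STRING":
        return f"{inp.key}: string of length {len(inp.value)}"
    return f"{inp.key}: chat history with {len(inp.value)} messages"
```

The benefit is that a request or response payload can be validated and handled per variant, rather than guessing the shape of `value` from its contents.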

Function Call Input in Test Cases

April 23rd, 2024

Workflows support Function Call values as a valid output type. Because these function calls often come from models, it is valuable to have evaluations on these Workflows that confirm the function call output is what you expect. Test Suites in Vellum now support specifying Function Call values as test case inputs and evaluation values.

Test Case Function Call

Support for Additional Models

April 19th, 2024

The following models are now available in Vellum:

  • Llama-3-70B-Instruct
  • Llama-3-8B-Instruct
  • Mixtral-8x22B-Instruct-v0.1

They can be added to your workspace through the models page.

Claude 3 Opus Prompt Generators

April 18th, 2024

If you’ve been using GPT models, you’ve likely relied on prompt engineering tips that worked well for those models. But when you apply the same prompts to Claude 3 Opus, you might notice they don’t perform as expected.

This happens because Claude 3 Opus is trained using different methods and data, so the way you prompt it differs from how you would prompt GPT-4. We have some helpful tips in our guide, but as of today, you can convert your prompts even faster…

GPT-4 to Claude 3 Opus Prompts

We’ve released a free tool that lets you paste your GPT-4 prompt and get an adapted Claude 3 Opus prompt with suggestions for dynamic variables. You can try the tool here.

GPT-4 to Claude 3 Opus

Claude 3 Opus Prompt Generator

If you don’t have a working GPT-4 prompt but need to create a prompt for Claude 3 Opus from scratch, you can use our second new free tool – “Claude Prompt Generator.”

This generator lets you input your 'prompt objective' and creates a suitable prompt for Claude 3 Opus, with suggestions for dynamic variables that you should include. You can try the tool here.

Max Tokens Warning

April 10th, 2024

When iterating on a Prompt in Vellum’s Prompt Sandbox, you may find that its output stops mid-sentence. This is often because the “Max Tokens” parameter is set too low, or the prompt itself is too long. To help you identify when this is the case, we’ve added a warning that will appear when this max is hit.

Max Tokens Warning

GPT-4 Turbo 04/09/2024 Model

April 9th, 2024

OpenAI’s newest GPT-4 Turbo model, gpt-4-turbo-2024-04-09, is now available in Vellum!

Usage Tracking in Prompt Sandbox and Prompt API

April 9th, 2024

We have added the ability for you to track model host usage from the execute-prompt API. This API update is available on version 0.3.21 of our SDKs.

You can also now view model host usage in the Prompt Sandbox by enabling the “Track Usage” toggle in your Prompt Sandbox’s settings.

Usage Tracking Sandbox

New API for Listing a Test Suite’s Test Cases

April 8th, 2024

We have a new API available in beta for listing the Test Cases belonging to a Test Suite at GET /v1/test-suites/{id}/test-cases.

This API is available on version 0.3.20 of our SDKs.
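If you want to call the endpoint directly rather than through an SDK, a request might be built along these lines. This is a minimal sketch: the base URL and the API key header name are assumptions, so check the API documentation for the actual values. Only the path comes from the entry above.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.vellum.ai"  # assumption: not stated in this changelog
API_KEY_HEADER = "X_API_KEY"        # assumption: not stated in this changelog


def build_list_test_cases_request(
    test_suite_id: str, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a GET request for a Test Suite's Test Cases."""
    # Escape the ID so it is safe to embed in the URL path.
    suite_id = quote(test_suite_id, safe="")
    url = f"{BASE_URL}/v1/test-suites/{suite_id}/test-cases"
    return urllib.request.Request(
        url, headers={API_KEY_HEADER: api_key}, method="GET"
    )


request = build_list_test_cases_request("my-test-suite-id", "my-api-key")
# To send it: urllib.request.urlopen(request)
```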

Prompt Editor

April 5th, 2024

Prompt Sandboxes have an entirely new view mode: Prompt Editor. It’s a dedicated space for iterating on a single Variant and Scenario. All of the features you need to work quickly are easily accessible, and collapsible sections make it simple to free up screen space. More improvements are coming down the pike for Prompt Editor, and many of them will make their way into Comparison and Chat Modes as well.

Copy and Paste Logit Bias

April 5th, 2024

You can now copy logit bias parameters from one Prompt Variant and paste them into another Prompt. This works in both Prompt Sandboxes and Prompt Nodes within Workflows.

Logit Bias Copy

Test Suite Improvements

April 4th, 2024

We’ve made some changes to our Test Suite UX. Here’s what’s new:

  • Simplified Creation Process: We’ve broken down the test suite creation into clear, manageable steps, ensuring a more guided and less overwhelming setup.
  • In-Context Editing: You can now edit test suites directly from the Prompt or Workflow evaluations page via a new, sleek modal.
  • Enhanced Error Messaging: We’ve revamped our error messages to be clearer and more actionable. You’ll now receive specific feedback that pinpoints exactly where things went wrong.

Test Suite Improvements

New APIs for Accessing Test Suite Runs

April 3rd, 2024

We have two new APIs available in beta for accessing your Test Suite Runs:

  • A Retrieve endpoint to fetch metadata about the test suite run, such as its current state, at GET /v1/test_suite_runs/{id}
  • A List executions endpoint to fetch the results of the test suite run at GET /v1/test_suite_runs/{id}/executions

These APIs are available on version 0.3.15 of our SDKs.
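The two endpoints share a common prefix, so a small helper can derive both URLs from a run ID. In this sketch the base URL is an assumption and the helper name is hypothetical; only the two paths come from the list above.

```python
from typing import Dict
from urllib.parse import quote

BASE_URL = "https://api.vellum.ai"  # assumption: not stated in this changelog


def suite_run_urls(run_id: str) -> Dict[str, str]:
    """Return the Retrieve and List executions URLs for a Test Suite Run."""
    # Escape the ID so it is safe to embed in the URL path.
    base = f"{BASE_URL}/v1/test_suite_runs/{quote(run_id, safe='')}"
    return {"retrieve": base, "executions": f"{base}/executions"}


urls = suite_run_urls("my-run-id")
```

A typical flow would fetch the `retrieve` URL to check the run's state, then fetch the `executions` URL for the results.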