Changelog | October 2024

Online Evaluations for Workflow and Prompt Deployments

October 3rd, 2024

We’re excited to announce the launch of Online Evaluations for Workflow and Prompt Deployments! This new feature lets you configure Metrics for your Deployments that are evaluated in real time as they’re executed. Key highlights include:

  • Continuous Assessment: Automatically evaluate the quality of your deployed LLM applications as they handle live requests.
  • Flexible Configuration: Set up multiple Metrics to assess different aspects of your Deployment’s performance.
  • Easy Access to Results: View evaluation results directly in the execution details of your Deployments.

To get started, configure Metrics for your Workflow or Prompt Deployment in the new “Metrics” tab.

Configure Metrics for use in Online Evals

Once configured, every execution of your Deployment will be evaluated against these Metrics. You can then view the results alongside the execution details.

See results of Metrics alongside Execution details

For more details on how to get started with Online Evaluations, check out our help documentation.
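For example, once Metrics are configured, you execute the Deployment exactly as you normally would and the evaluations run in the background. The sketch below is a minimal illustration using Vellum’s Python SDK; the client and input class names (`Vellum`, `PromptRequestStringInput`) and the deployment name are assumptions for illustration and may differ from the SDK version you use.

```python
# Minimal sketch: executing a Prompt Deployment that has Online Evaluation
# Metrics configured. Class and field names are assumptions for illustration.
from vellum import Vellum
from vellum.types import PromptRequestStringInput

client = Vellum(api_key="YOUR_VELLUM_API_KEY")

# Execute the Deployment as usual; any Metrics configured in the "Metrics"
# tab are evaluated automatically for this execution.
result = client.execute_prompt(
    prompt_deployment_name="my-prompt-deployment",  # hypothetical deployment name
    inputs=[
        PromptRequestStringInput(name="query", value="Summarize our refund policy."),
    ],
)

print(result.outputs)
# The Metric results for this execution then appear alongside its
# execution details in the Vellum UI.
```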

OpenRouter Model Hosting + WizardLM-2 8x22B

October 2nd, 2024

We’ve added OpenRouter as a new model host in Vellum! OpenRouter provides access to a wide range of AI models through a single API, expanding the range of models available to our users.

As part of our new OpenRouter integration, we’re pleased to introduce the WizardLM-2 8x22B model to our platform. WizardLM-2 8x22B is known for its strong performance across various natural language processing tasks and is now available for use in your Vellum projects.

OpenRouter Model Host
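For context on what “a single API” looks like in practice: OpenRouter exposes an OpenAI-compatible endpoint, so a model like WizardLM-2 8x22B can be called with the standard OpenAI client. The sketch below is illustrative only and is not Vellum’s integration code; the base URL and the `microsoft/wizardlm-2-8x22b` model slug are assumptions based on OpenRouter’s public conventions.

```python
# Minimal sketch of OpenRouter's single-API access (OpenAI-compatible).
# Base URL and model slug are assumptions drawn from OpenRouter's public docs.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="microsoft/wizardlm-2-8x22b",  # WizardLM-2 8x22B via OpenRouter
    messages=[{"role": "user", "content": "Explain prompt caching in one sentence."}],
)
print(response.choices[0].message.content)
```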

Prompt Caching Support for OpenAI

October 2nd, 2024

Today OpenAI introduced Prompt Caching for its GPT-4o and o1 models. Subsequent invocations that reuse the same prompt prefix are served with lower latency and at up to 50% reduced input token cost.

In support of this, we’ve begun capturing cached token counts in Vellum’s monitoring layer. With this update, you’ll now see the number of Prompt Cache Tokens used by a Prompt Deployment’s executions when it’s backed by an OpenAI model. You can use this new monitoring data to analyze your cache hit rate with OpenAI and optimize your LLM spend.
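If you call OpenAI directly, the same cached-token counts Vellum now surfaces are reported in the API response’s usage object. Below is a minimal sketch assuming OpenAI’s `usage.prompt_tokens_details.cached_tokens` field; treat the exact field names as an assumption to verify against OpenAI’s documentation.

```python
# Minimal sketch: reading cached prompt token counts from an OpenAI response.
# Field names assume OpenAI's usage reporting for Prompt Caching.
from openai import OpenAI

client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# Caching typically applies to the shared prefix of prompts above ~1,024 tokens,
# so a long, stable system message is a good candidate for cache hits.
SHARED_SYSTEM_PROMPT = "Your long, stable system instructions go here."

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SHARED_SYSTEM_PROMPT},
        {"role": "user", "content": "What changed in the October release?"},
    ],
)

usage = response.usage
cached = usage.prompt_tokens_details.cached_tokens
print(f"{cached} of {usage.prompt_tokens} prompt tokens were served from the cache")
```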