Changelog | May 2024

Guardrail Workflow Nodes

May 23rd, 2024

You can now use Evaluation Metrics inside of Workflows with the new Guardrail Node! Guardrail Nodes let you run pre-defined evaluation criteria at runtime as part of a Workflow execution so that you can drive downstream behavior based on that metric’s score.

For example, if building a RAG application, you might determine whether the generated response passes some threshold for Ragas Faithfulness and if not, loop around to try again.
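
As a rough illustration, the Python sketch below shows the control flow a Guardrail Node enables. It is not Vellum’s API; generate_answer and faithfulness_score are hypothetical stand-ins for a Prompt Node and the Ragas Faithfulness metric evaluated by the Guardrail Node.

```python
# Illustrative sketch only -- not Vellum's API. The two helpers below are
# hypothetical stand-ins for a Prompt Node and a Guardrail Node scoring
# Ragas Faithfulness inside a Workflow.

def generate_answer(question: str, context: str) -> str:
    """Placeholder for a Prompt Node call (e.g. an LLM completion)."""
    return f"Answer to {question!r} grounded in the provided context."

def faithfulness_score(answer: str, context: str) -> float:
    """Placeholder for the Guardrail Node's Faithfulness score (0-1)."""
    return 0.9

FAITHFULNESS_THRESHOLD = 0.8
MAX_ATTEMPTS = 3

def answer_with_guardrail(question: str, context: str) -> str:
    answer = ""
    for _ in range(MAX_ATTEMPTS):
        answer = generate_answer(question, context)   # Prompt Node
        score = faithfulness_score(answer, context)   # Guardrail Node
        if score >= FAITHFULNESS_THRESHOLD:           # conditional edge: pass
            return answer
    return answer                                     # retries exhausted

print(answer_with_guardrail("What is our refund policy?", "Refunds within 30 days."))
```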

Guardrail Nodes

Chat Mode Revamp

May 22nd, 2024

Chat Mode in Prompt Sandboxes has received a major facelift! The left side of the new interface will be familiar to anyone using the Prompt Editor, while the rest of the interface retains its functionality with a fresh new look. We’ve also fixed some UX wonkiness and minor bugs during the restyling process.

Chat Mode Styling Update

Double-Click to Resize Rows & Columns in Prompt Sandboxes

May 22nd, 2024

You can now double-click on resizable row and column edges in both Comparison and Chat modes to auto-expand that row/column to its maximum size. If already at maximum size, double-clicking will reset them to their default size. Additionally, in Comparison mode, double-clicking on cell corners will auto-resize both dimensions simultaneously.

Improved Image Support in Chat History Fields

May 22nd, 2024

We’ve made several changes to enhance the UX of working with images. Chat History messages now include an explicit content-type selector, making it easier to work with image content using supported models. You can now add publicly-hosted images in multiple ways: by pasting an image URL, pasting a copied image, or dragging and dropping an image from another window.

Additionally, we’ve added limited support for embedded images. You can embed an image directly into the prompt by copy/pasting or dragging/dropping an image file from your computer’s file browser. This method has a 1MB size limit and is an interim solution as we continue to explore image upload and hosting options.
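
If you construct Chat History programmatically, an image becomes one more typed content block within a message. The sketch below only illustrates that idea; the field names (role, type, value, src) are assumptions and may not match the exact API schema.

```python
# Rough sketch of a Chat History message carrying a publicly hosted image.
# Field names here are assumptions and may differ from Vellum's actual schema.
chat_history = [
    {
        "role": "USER",
        "content": {
            "type": "ARRAY",
            "value": [
                {"type": "STRING", "value": "What is shown in this picture?"},
                {"type": "IMAGE", "value": {"src": "https://example.com/diagram.png"}},
            ],
        },
    }
]
```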

Gemini 1.5 Flash

May 20th, 2024

Google’s Gemini 1.5 Flash model is now available in Vellum. You can add it to your workspace from the models page.

Llama 3 Models on Bedrock

May 14th, 2024

We now support both of the Llama 3 models on AWS Bedrock. You can add them to your workspace from the models page.

GPT-4o Models

May 13th, 2024

OpenAI’s newest GPT-4o models, gpt-4o and gpt-4o-2024-05-13, are now available in Vellum and have been added to all workspaces!

GPT 4o

Organization and Workspace Names in Side Nav

May 13th, 2024

You can now view the active Organization’s name and the active Workspace’s name in the left sidebar navigation.

Workspace and Org Name Nav

Run All Button on Evaluation Reports

May 10th, 2024

There’s now a “Run All” button on Evaluation Reports that runs the Test Suite against every variant with a single click, instead of requiring you to run each variant individually.

Prompt Node Monitoring

May 9th, 2024

Vellum now captures monitoring data for deployed Prompt Nodes. Whenever a deployed Workflow invokes a Prompt Node, the node will show a link displaying the Prompt Deployment’s label:

Prompt Node Monitoring

Clicking on the link will take you to the Prompt’s executions page, where you can then see all metadata captured for the execution, including the raw request data sent to the model:

Prompt Node Execution

Groq Support

May 9th, 2024

Vellum now has a native integration with Groq, the LPU Inference Engine. All public models on Groq are now available to add to your workspace. Be sure to add your API key as a Secret named GROQ_API_KEY on the API Keys page.

Groq is an LLM hosting provider that offers incredible inference speed for open source LLMs, including the recently released (and very hyped!) Llama 3 model.

Groq Support

Function Calling in Prompt Evaluation

May 8th, 2024

Prompts that output function calls can now be evaluated via Test Suites. This allows you to define Test Cases consisting of the inputs to the prompt and the expected function call, then assert that there’s a match. For more, check out our docs.
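
As a rough illustration, a Test Case for a function-calling Prompt pairs the Prompt’s inputs with the call you expect back and checks for a match. The field names below are assumptions, not the exact schema; see the docs for the real format.

```python
# Illustrative Test Case for a function-calling Prompt. Field names are
# assumptions, not Vellum's exact schema -- see the docs for the real format.
test_case = {
    "inputs": {"user_message": "Book me a table for two at 7pm tonight"},
    "expected_function_call": {
        "name": "book_reservation",
        "arguments": {"party_size": 2, "time": "19:00"},
    },
}

def function_call_matches(actual: dict, expected: dict) -> bool:
    """A strict-match check: the function name and arguments must be identical."""
    return (
        actual.get("name") == expected["name"]
        and actual.get("arguments") == expected["arguments"]
    )
```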

Function Call Prompts

Out-of-the-Box Ragas Metrics

May 7th, 2024

Test-driven development for your RAG-based LLM pipelines is now easier than ever within Vellum!

Three new Ragas metrics are now available out of the box in Vellum: Context Relevancy, Answer Relevance, and Faithfulness. These can be used within Workflow Evaluations to measure the quality of a RAG system.

For more info, check out our new help center article on Evaluating RAG Pipelines.
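
For reference, these metrics mirror the ones in the open-source ragas library. A minimal sketch of computing them with ragas directly (import paths and column names vary by version) looks roughly like this:

```python
# Minimal sketch using the open-source ragas library (~0.1.x); the exact
# imports and dataset columns vary by version. Requires an OpenAI API key,
# since Ragas metrics are themselves LLM-scored.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, context_relevancy, faithfulness

data = Dataset.from_dict({
    "question": ["What is our refund window?"],
    "contexts": [["Customers may request a refund within 30 days of purchase."]],
    "answer": ["Refunds are available within 30 days of purchase."],
})

result = evaluate(data, metrics=[faithfulness, answer_relevancy, context_relevancy])
print(result)  # e.g. {'faithfulness': 1.0, 'answer_relevancy': 0.97, ...}
```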

Ragas metrics

Subworkflow Node Streaming

May 7th, 2024

Subworkflow Nodes can now stream their output(s) to parent workflows.

This allows you to compose workflows using modular subworkflows without sacrificing the ability to deliver incremental results to your end user.

Note that only nodes immediately prior to Final Output Nodes can have their output(s) streamed.
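
Conceptually, this is similar to composing generators: the subworkflow yields chunks as they are produced and the parent forwards them immediately. The Python sketch below is only an analogy, not Vellum’s execution model.

```python
# Conceptual analogy only -- not Vellum's execution model. The subworkflow
# yields partial output as it becomes available, and the parent workflow
# forwards each chunk to the caller right away.
from typing import Iterator

def subworkflow(prompt: str) -> Iterator[str]:
    """Stands in for a Subworkflow Node that streams its output."""
    for token in ["Streaming ", "partial ", "results..."]:
        yield token

def parent_workflow(prompt: str) -> Iterator[str]:
    """Stands in for the parent Workflow; it re-emits chunks as they arrive."""
    yield from subworkflow(prompt)  # incremental results reach the end user

for chunk in parent_workflow("hello"):
    print(chunk, end="", flush=True)
print()
```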

Subworkflow Streaming

Default Test Case Concurrency in Evaluations

May 4th, 2024

You can now configure how many Test Cases should be run in parallel during an Evaluation. You might lower this value if you’re running into rate limits from the LLM provider, or might increase this value if your rate limits are high.
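
As an analogy (not Vellum’s implementation), this setting behaves like the worker cap on a thread pool: lower it to throttle parallel Test Case runs, raise it to run more at once.

```python
# Analogy only -- not Vellum's implementation. Capping max_workers limits how
# many Test Cases run in parallel, which helps stay under provider rate limits.
from concurrent.futures import ThreadPoolExecutor

def run_test_case(case_id: int) -> str:
    """Placeholder for executing a single Test Case against a Prompt."""
    return f"test case {case_id}: passed"

CONCURRENCY = 4  # lower this if you hit LLM provider rate limits

with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
    for result in pool.map(run_test_case, range(10)):
        print(result)
```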

Test Case Concurrency