Llama 3 Models on Bedrock
May 14th, 2024
We now support both Llama 3 models (8B and 70B) on AWS Bedrock. You can add them to your workspace from the models page.
GPT-4o Models
May 13th, 2024
OpenAI’s newest GPT-4o models, gpt-4o and gpt-4o-2024-05-13, are now available in Vellum and have been added to all workspaces!
Organization and Workspace Names in Side Nav
May 13th, 2024
You can now view the active Organization’s name and the active Workspace’s name in the left sidebar navigation.
Run All Button on Evaluation Reports
May 10th, 2024
There’s now a “Run All” button on evaluation reports that runs a test suite for all variants. Instead of running each variant individually, you can now run them all with one click.
Prompt Node Monitoring
May 9th, 2024
Vellum now captures monitoring data for deployed Prompt Nodes. Whenever a deployed Workflow invokes a Prompt Node, that node now shows a link with the Prompt Deployment’s label:
Clicking on the link will take you to the Prompt’s executions page, where you can then see all metadata captured for the execution, including the raw request data sent to the model:
Groq Support
May 9th, 2024
Vellum now has a native integration with Groq and its LPU Inference Engine. All public models on Groq are now available to add to your workspace. Be sure to add your API key as a Secret named GROQ_API_KEY on the API Keys page.
Groq is an LLM hosting provider that offers incredible inference speed for open-source LLMs, including the recently released (and very hyped!) Llama 3 models.
Function Calling in Prompt Evaluation
May 8th, 2024
Prompts that output function calls can now be evaluated via Test Suites. You can define Test Cases consisting of a Prompt’s inputs and its expected function call, then assert that the actual output matches. For more, check out our docs.
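To illustrate the kind of match being asserted (the `{name, arguments}` shape mirrors common LLM function-calling payloads; this is a conceptual sketch, not Vellum’s internal evaluation code):

```python
# Conceptual sketch of an exact-match check between an expected function call
# and the one a prompt actually produced. Names and shapes are illustrative.
def function_calls_match(expected: dict, actual: dict) -> bool:
    """True when both the function name and its arguments are identical.

    Dict equality in Python ignores key order, so arguments serialized in a
    different order still count as a match.
    """
    return (
        expected["name"] == actual["name"]
        and expected["arguments"] == actual["arguments"]
    )

expected = {"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}
actual = {"name": "get_weather", "arguments": {"unit": "celsius", "city": "Paris"}}
print(function_calls_match(expected, actual))  # True: same name, same arguments
```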
Out-of-Box Ragas Metrics
May 7th, 2024
Test-driven development for your RAG-based LLM pipelines is now easier than ever within Vellum!
Three new Ragas metrics – Context Relevancy, Answer Relevance, and Faithfulness – are now available out of the box in Vellum. These can be used within Workflow Evaluations to measure the quality of a RAG system.
For more info, check out our new help center article on Evaluating RAG Pipelines.
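Conceptually, Faithfulness measures the share of statements in a generated answer that are supported by the retrieved context. Ragas computes this with an LLM judge; the word-overlap version below is only a self-contained toy illustration of the idea:

```python
# Toy illustration of the idea behind the Faithfulness metric: what fraction
# of the answer's sentences are supported by the retrieved context? Ragas
# itself uses an LLM to extract and verify claims; this sketch uses naive
# word overlap so it can run standalone.
def toy_faithfulness(answer_sentences: list[str], context: str) -> float:
    context_words = set(context.lower().split())
    supported = sum(
        1
        for sentence in answer_sentences
        # Count a sentence "supported" if at least half its words appear
        # in the context (a crude stand-in for claim verification).
        if len(set(sentence.lower().split()) & context_words)
        / max(len(sentence.split()), 1)
        >= 0.5
    )
    return supported / len(answer_sentences)

context = "The Eiffel Tower is in Paris and was completed in 1889."
answer = ["The Eiffel Tower is in Paris.", "It is made entirely of gold."]
print(toy_faithfulness(answer, context))  # 0.5: one of two sentences supported
```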
Subworkflow Node Streaming
May 7th, 2024
Subworkflow Nodes can now stream their output(s) to parent workflows.
This allows you to compose workflows from modular subworkflows without sacrificing the ability to deliver incremental results to your end user.
Note that only nodes immediately prior to Final Output Nodes can have their output(s) streamed.
Default Test Case Concurrency in Evaluations
May 4th, 2024
You can now configure how many Test Cases are run in parallel during an Evaluation. Lower this value if you’re running into rate limits from your LLM provider, or raise it if your rate limits allow.
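The trade-off can be sketched with a generic bounded-concurrency runner (nothing here is Vellum’s API; it just shows how a concurrency cap bounds simultaneous in-flight requests):

```python
# Generic sketch of running test cases with a configurable concurrency cap,
# the same knob this setting exposes: fewer workers to stay under provider
# rate limits, more workers when your limits allow it.
from concurrent.futures import ThreadPoolExecutor

def run_test_case(case_id: int) -> str:
    # Stand-in for invoking the LLM; a real runner would call the provider here.
    return f"result-{case_id}"

def run_evaluation(case_ids: list[int], concurrency: int = 4) -> list[str]:
    # max_workers bounds how many test cases are in flight at once;
    # pool.map preserves the input order of results.
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(run_test_case, case_ids))

print(run_evaluation(list(range(4)), concurrency=2))
# ['result-0', 'result-1', 'result-2', 'result-3']
```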