Changelog | October 2024
New Models Added to OpenRouter Integration
October 11th, 2024
We’ve added 8 new models to Vellum via our OpenRouter integration:
- Magnum v2 72B - A powerful model designed to achieve prose quality similar to Claude 3 models.
- Nous: Hermes 3 405B Instruct - A frontier-level, full-parameter finetune of the Llama-3.1 405B foundation model.
- NousResearch: Hermes 2 Pro - Llama-3 8B - An upgraded version of Nous Hermes 2 with improved capabilities.
- Nous: Hermes 3 405B Instruct (extended) - An extended context version of Hermes 3 405B Instruct.
- Goliath 120B - A large LLM created by combining two fine-tuned Llama 70B models.
- Dolphin 2.9.2 Mixtral 8x22B - An uncensored model designed for instruction following, conversation, and coding.
- Anthropic: Claude 3.5 Sonnet (self-moderated) - A faster, self-moderated endpoint of Claude 3.5 Sonnet.
- Liquid: LFM 40B MoE - A 40.3B Mixture of Experts (MoE) model for general-purpose AI tasks.
These new models offer a wide range of capabilities, from improved prose quality and instruction following to extended context lengths and specialized tasks like coding. Users can now leverage these models in their Vellum projects, expanding the possibilities for AI-powered applications.
Workflow Edge Type Improvements
October 10th, 2024
In the past, it could be quite difficult to achieve a perfectly straight line between two Nodes in a Workflow with the “smooth-step” edge type, but those days are behind us, friends.
You’ll now see that your edges will automagically snap into straight-line connectors whenever they’re close-to-horizontal.
AutoLayout and AutoConnect for Workflows
October 10th, 2024
Two exciting new features have been added to Workflows — AutoLayout and AutoConnect.
AutoLayout instantly organizes your Workflow’s Nodes algorithmically, making it easier than ever to tame even the most unruly of Workflows.
AutoConnect will automatically connect any unconnected Nodes in your Workflow by creating edges from left to right (more or less).
Both of these features are accessible via new buttons in the bottom left toolbar in your Workflow Sandboxes.
In the event that you only want to use AutoConnect or AutoLayout on a specific subset of Nodes, simply drag to select and you’ll see a new temporary toolbar that allows you to do just that.
Reorder Entities in Evaluation Reports
October 9th, 2024
You can now reorder entities in the Evaluation Report table. Simply select the “Reorder” option in the entity column’s menu to adjust the order to your preference.
Online Evaluations for Workflow and Prompt Deployments
October 3rd, 2024
We’re excited to announce the launch of Online Evaluations for Workflow and Prompt Deployments! This new feature allows you to configure Metrics for your Deployments to be evaluated in real-time as they’re executed. Key highlights include:
- Continuous Assessment: Automatically evaluate the quality of your deployed LLM applications as they handle live requests.
- Flexible Configuration: Set up multiple Metrics to assess different aspects of your Deployment’s performance.
- Easy Access to Results: View evaluation results directly in the execution details of your Deployments.
To set it up, configure Metrics for your Workflow or Prompt Deployment in the new “Metrics” tab.
Once configured, every execution of your Deployment will be evaluated against these Metrics. You can then view the results alongside the execution details.
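Executions triggered through the API are evaluated just like ones from the UI. Here’s a minimal sketch of invoking a Prompt Deployment with the Vellum Python SDK so that its configured Metrics run against the resulting execution. The client class, method, and input type below are assumptions about the SDK’s shape, and the deployment name is hypothetical; consult the SDK reference for exact signatures.

```python
# A hedged sketch, not official usage: assumes the Vellum Python SDK exposes
# a `Vellum` client with an `execute_prompt` method; names are illustrative.
from vellum.client import Vellum
from vellum.types import PromptRequestStringInput  # assumed input type

client = Vellum(api_key="YOUR_VELLUM_API_KEY")

# Execute a Deployment that has Metrics configured in its "Metrics" tab.
# Online Evaluation then runs server-side against this execution.
result = client.execute_prompt(
    prompt_deployment_name="my-prompt-deployment",  # hypothetical Deployment name
    inputs=[
        PromptRequestStringInput(name="question", value="How do refunds work?"),
    ],
)

# Metric results appear alongside this execution's details in the Vellum app.
print(result.outputs)
```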
For more details on how to get started with Online Evaluations, check out our help documentation.
OpenRouter Model Hosting + WizardLM-2 8x22B
October 2nd, 2024
We’ve added OpenRouter as a new model host in Vellum! OpenRouter provides access to a wide range of AI models through a single API, expanding the options of models available to our users.
As part of our new OpenRouter integration, we’re pleased to introduce the WizardLM-2 8x22B model to our platform. WizardLM-2 8x22B is known for its strong performance across various natural language processing tasks and is now available for use in your Vellum projects.
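For the curious: OpenRouter exposes an OpenAI-compatible API, so outside of Vellum you can reach the same model by pointing the standard `openai` Python client at OpenRouter’s base URL. A minimal sketch; the model slug is our best guess at OpenRouter’s identifier, so verify it against their model list.

```python
# Calling WizardLM-2 8x22B through OpenRouter's OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="microsoft/wizardlm-2-8x22b",  # assumed OpenRouter slug for this model
    messages=[{"role": "user", "content": "Give me one tip for writing good prompts."}],
)
print(response.choices[0].message.content)
```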
Prompt Caching Support for OpenAI
October 2nd, 2024
Today OpenAI introduced Prompt Caching for its GPT-4o and o1 models. Subsequent invocations that reuse the same prompt prefix are served with lower latency and at up to 50% lower input costs.
To support this, we’ve begun capturing cache tokens in Vellum’s monitoring layer. With this update, you’ll now see the number of Prompt Cache Tokens used by a Prompt Deployment’s executions whenever it’s backed by an OpenAI model. You can use this monitoring data to analyze your cache hit rate with OpenAI and optimize your LLM spend.
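If you’d like to sanity-check these numbers against the source, the cached-token counts Vellum captures come straight from OpenAI’s API responses. A minimal sketch using the `openai` Python SDK (1.x), where chat completion usage reports `prompt_tokens_details.cached_tokens`; caching applies to prompts of roughly 1,024 tokens or more with a matching prefix.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A long, stable system prompt forms a cacheable prefix (caching requires
# prompts of ~1,024+ tokens; only the shared prefix is cached).
system_prompt = "You are a support assistant for Acme Corp. " * 200

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": system_prompt},  # stable, cacheable prefix
        {"role": "user", "content": "How do I reset my password?"},  # varies per request
    ],
)

usage = response.usage
details = usage.prompt_tokens_details
# The first call warms the cache and reports 0 cached tokens; repeat calls
# within the cache window report how many prefix tokens were served from cache.
cached = details.cached_tokens if details else 0
hit_rate = cached / usage.prompt_tokens if usage.prompt_tokens else 0.0
print(f"prompt tokens: {usage.prompt_tokens}, cached: {cached}, hit rate: {hit_rate:.0%}")
```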