Enhance AI Model Accuracy with Vellum’s Observability Tools

After using a Prompt in production with your application, you’ll likely want to know what the requests contained and whether the model provided reasonable responses. A big benefit of using Vellum’s proxy layer via Prompt Deployments is that we automatically keep track of every request and the details you need to debug issues.

Prompt Deployment Completions

You can go to the “Completions” tab of any Prompt Deployment to see the requests that were made. Columns can be hidden, shown, filtered, and sorted.

[Image: Completions]

As you apply filters and sorting, the page’s URL is updated. You can bookmark this link or share it with others to return to the same view later.

[Image: Completion Columns]

Capturing End-User Feedback

Vellum has the concept of “Completion Actuals” where you can say, for a given request, what the output should have been and what its quality was. This is particularly useful for monitoring quality and, later, for use as training data to fine-tune your own custom model.

Capturing Actuals works best if your end users have some mechanism (usually via a UI) to provide feedback on the output of the model.

For example, if you’re creating an AI Recruiting Email Generator that recruiters use to generate rough drafts, you might:

  1. Infer that if they hit “Send” without making edits, the quality was great (a 1.0 in Vellum)
  2. Infer that if they hit “Discard” then the quality was bad (a 0.0 in Vellum)
  3. Or you might have a 5-star “Rating” system that they can use to explicitly provide feedback on the quality of the output.
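
The three signals above can be reduced to a single quality score before anything is sent to Vellum. The following is a minimal sketch, not Vellum code: the function name, the action strings, and the linear mapping from stars to a 0.0–1.0 score (1 star becomes 0.0, 5 stars becomes 1.0) are all assumptions made for illustration.

```python
from typing import Optional


def quality_from_feedback(action: str, stars: Optional[int] = None) -> float:
    """Map a hypothetical UI event to a quality score between 0.0 and 1.0."""
    if action == "send_unedited":
        return 1.0  # sent without edits: treat the draft as great
    if action == "discard":
        return 0.0  # discarded: treat the draft as bad
    if action == "rating":
        if stars is None or not 1 <= stars <= 5:
            raise ValueError("a rating needs a star count from 1 to 5")
        # Assumed linear scale: 1 star -> 0.0, 5 stars -> 1.0
        return (stars - 1) / 4
    raise ValueError(f"unknown action: {action!r}")
```

Whether a 3-star rating should count as 0.5 or something lower is a product decision; the point is only that every feedback path ends in one comparable number.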

In all cases, you could integrate with Vellum’s Completion Actuals API to capture this feedback. You can find a code snippet for this in a Prompt Deployment’s “Overview” tab. It’ll look like this:

[Image: Deployment Actuals]

Note that you can reference a previously made Completion either by the ID that Vellum generates and returns, or by a UUID that you track and provide via the “external_id” property.
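
To make the two ways of referencing a Completion concrete, here is a rough sketch that builds a feedback record keyed by exactly one of the two identifiers. Only “external_id” comes from the text above; the helper name and the "id", "quality", "text", and "actuals" keys are assumptions, so treat the snippet in your Deployment’s “Overview” tab as the source of truth for the real request shape.

```python
import uuid
from typing import Any, Dict, Optional


def build_actual(
    quality: float,
    completion_id: Optional[str] = None,
    external_id: Optional[str] = None,
    text: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one feedback record (hypothetical shape) for a past Completion."""
    # Exactly one identifier must be given: the Vellum-generated ID
    # or the UUID your application tracked itself.
    if (completion_id is None) == (external_id is None):
        raise ValueError("provide exactly one of completion_id or external_id")
    if not 0.0 <= quality <= 1.0:
        raise ValueError("quality must be between 0.0 and 1.0")

    actual: Dict[str, Any] = {"quality": quality}
    if completion_id is not None:
        actual["id"] = completion_id  # assumed key name
    else:
        actual["external_id"] = external_id
    if text is not None:
        actual["text"] = text  # what the output should have been (assumed key)
    return actual


# Example: your app generated this UUID when it made the original request.
my_request_id = str(uuid.uuid4())
payload = {"actuals": [build_actual(quality=1.0, external_id=my_request_id)]}
```

Tracking your own UUID is convenient when the feedback arrives in a different part of your system from the one that made the original request, since you do not need to store the ID Vellum returned.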
