LLM Output Evaluation with Workflow Metrics

The Workflow evaluation metric lets you evaluate outputs with a Workflow. This enables LLM-based evaluation of outputs that may be hard to score with traditional methods.

Setting up an Evaluator Workflow

  1. Create a new Workflow Sandbox.
  2. Add one input variable for each Test Suite variable you want to pass to the Workflow. You'll map these to the Test Suite variables when you set up the metric later, so you can name them anything you want. Examples of variables you may want to include: the output to be evaluated, the desired output, and the inputs to the evaluated prompt.
  3. Create a Final Output, set the name to score, and set the output type to Number.
  4. [Optional] Create a Final Output, set the name to normalized_score, and set the output type to Number. If included, normalized_score must be in the interval [0, 1]. This value is used for display purposes, such as color-coding outputs.
  5. Build the Workflow logic that computes score (and normalized_score, if you added it) from the input variables.
  6. Deploy your Workflow using the Deploy button in the top right corner of the Workflow Sandbox.
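The output contract from steps 3 and 4 (a required numeric score, plus an optional normalized_score in [0, 1]) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Vellum's SDK: the helper names `normalize`, `build_outputs`, and `validate_outputs` are hypothetical, and min-max scaling with clamping is just one reasonable way to derive normalized_score from a raw score with a known range.

```python
from typing import Optional, TypedDict


class EvaluatorOutputs(TypedDict, total=False):
    """Hypothetical local model of the evaluator Workflow's Final Outputs."""
    score: float             # required Final Output (Number)
    normalized_score: float  # optional Final Output (Number), must lie in [0, 1]


def normalize(score: float, min_score: float, max_score: float) -> float:
    """Min-max scale a raw score into [0, 1], clamping out-of-range values."""
    if max_score <= min_score:
        raise ValueError("max_score must be greater than min_score")
    scaled = (score - min_score) / (max_score - min_score)
    return max(0.0, min(1.0, scaled))


def build_outputs(score: float,
                  min_score: Optional[float] = None,
                  max_score: Optional[float] = None) -> EvaluatorOutputs:
    """Assemble the Final Outputs; include normalized_score only if a range is known."""
    outputs: EvaluatorOutputs = {"score": float(score)}
    if min_score is not None and max_score is not None:
        outputs["normalized_score"] = normalize(score, min_score, max_score)
    return outputs


def validate_outputs(outputs: EvaluatorOutputs) -> None:
    """Raise ValueError if the outputs break the score / normalized_score contract."""
    if not isinstance(outputs.get("score"), (int, float)):
        raise ValueError("score is required and must be a number")
    ns = outputs.get("normalized_score")
    if ns is not None and not 0.0 <= ns <= 1.0:
        raise ValueError("normalized_score must be in the interval [0, 1]")
```

For example, a raw rating of 4 on a 1 to 5 scale gives `build_outputs(4, 1, 5) == {"score": 4.0, "normalized_score": 0.75}`. Clamping keeps normalized_score valid even if the raw score drifts outside the expected range.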

Example Evaluator Workflow

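One common shape for evaluator logic is an LLM judge: a prompt that compares the output being evaluated with the desired output, followed by a step that parses the model's reply into a number. The sketch below shows that logic in plain Python under explicit assumptions: the prompt wording, the 1 to 5 scale, the "Score: <integer>" reply format, and the names `JUDGE_TEMPLATE`, `parse_judge_reply`, and `evaluate` are all hypothetical, and the model call is passed in as a `call_llm` function rather than using any particular provider's API.

```python
import re
from typing import Callable, Dict

# Hypothetical judge prompt; {target} and {output} correspond to Workflow input variables.
JUDGE_TEMPLATE = (
    "You are grading a model's answer against a reference answer.\n"
    "Reference answer:\n{target}\n\n"
    "Model answer:\n{output}\n\n"
    "Rate how well the model answer matches the reference on a scale of 1 to 5. "
    "Reply with a line of the form 'Score: <integer>'."
)

# Matches a single digit 1-5 after "Score:"; "Score: 10" does not match.
_SCORE_RE = re.compile(r"Score:\s*([1-5])\b")


def parse_judge_reply(reply: str) -> int:
    """Extract the integer rating from the judge's reply, or raise ValueError."""
    match = _SCORE_RE.search(reply)
    if match is None:
        raise ValueError(f"No score found in judge reply: {reply!r}")
    return int(match.group(1))


def evaluate(output: str, target: str, call_llm: Callable[[str], str]) -> Dict[str, float]:
    """Run the judge prompt and map the 1-5 rating to score and normalized_score."""
    prompt = JUDGE_TEMPLATE.format(target=target, output=output)
    score = parse_judge_reply(call_llm(prompt))
    return {"score": float(score), "normalized_score": (score - 1) / 4}
```

With a stand-in model such as `lambda prompt: "Matches the reference.\nScore: 5"`, `evaluate("Paris", "Paris", ...)` returns `{"score": 5.0, "normalized_score": 1.0}`. Raising on an unparseable reply, rather than defaulting to a score, makes judge formatting failures visible instead of silently skewing results.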