Input Images and Documents in Prompts and Workflows

Leverage the power of multimodal models to process both natural language and files within your LLM applications using Vellum.

Vellum supports multimodal inputs for supported models from OpenAI, Anthropic, Google, Meta, and more via API and in the Vellum UI.

Images in Vellum UI

Read on to get started!

Using Images in the UI

Vellum supports images as inputs to both your Prompts and Workflows. In either Sandbox, you can add images inside scenario Chat History messages.

Begin by selecting a model that supports images in your Prompt. In Workflows, you set the model within a Prompt Node; before you do, you’ll want to add a Chat History variable as an input to your Workflow.

Next, add a Chat History block and some messages to your template so you can drag images into them.

Here’s how to do it:

  • In the Prompt Sandbox, add a Chat History input variable by adding a new variable of type “Chat History”. This will allow you to drag and drop images directly into Content Blocks.
  • In the Workflow Sandbox, add a Chat History input variable by adding a new variable of type “Chat History”. This will allow you to drag and drop images directly into Content Blocks.

Now you’re ready to add images! Drag and drop a valid image into a Chat History message block. This converts the Chat Message into a draggable array that can be easily re-ordered and can contain multiple image and/or text items.

Valid image URLs: the URL must be an absolute path that includes the image filetype, and the image must be publicly visible (example: https://storage.googleapis.com/vellum-public/help-docs/release-tags-on-deploy.png)
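If you’re constructing image URLs programmatically, a quick pre-flight check can catch both requirements early. Here’s a minimal sketch; the looks_like_valid_image_url helper is illustrative and not part of the Vellum SDK:

from urllib.parse import urlparse

def looks_like_valid_image_url(url: str) -> bool:
    # Hypothetical helper: checks that the URL is absolute and that its
    # path ends in a filetype. Public visibility still has to be verified
    # by actually fetching the URL without credentials.
    parsed = urlparse(url)
    has_scheme = parsed.scheme in ("http", "https")
    has_filetype = "." in parsed.path.rsplit("/", 1)[-1]
    return has_scheme and has_filetype

print(looks_like_valid_image_url(
    "https://storage.googleapis.com/vellum-public/help-docs/release-tags-on-deploy.png"
))  # True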

Images in Prompt Sandbox

Images in Prompt Editor

Images in Prompt Scenarios

Images in Prompt Comparison Mode

Images in Workflow Scenarios

Once you’ve added your image, you can configure its settings by clicking the small gear icon to the right of the image. Here you’ll be able to adjust things like the Image Detail setting, which can have a big impact on token usage (more on that below).

Image Configuration Steps

You can also switch out an image you’ve dragged in for a new one by updating the image URL in the settings.

Image Configuration Modal
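If you’d rather set the Image Detail when calling the API instead of through this modal, the IMAGE content value may accept extra settings alongside src. The shape below is an assumption based on the UI options rather than a confirmed schema, so verify the exact field names in the API Explorer:

# Assumed payload shape for setting Image Detail via the API. The
# "metadata" field and "detail" key are assumptions here; confirm them
# against the API Explorer for the /execute-prompt endpoint.
image_content = {
    "type": "IMAGE",
    "value": {
        "src": "https://storage.googleapis.com/vellum-public/help-docs/release-tags-on-deploy.png",
        "metadata": {"detail": "low"},  # "low", "high", or "auto"
    },
}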

Image Specifications

Here are some important model specifications for OpenAI vision models to keep in mind as you’re incorporating images into your Prompts and Workflows:

  • Number of Images: No set limit

    There is no fixed cap, but token limits and the image size restriction below still determine how many images can be sent

  • Image Size: Less than 32MB

    For prompts and workflows with multiple images, the combined image size should not exceed this limit

  • Supported Image Formats:

    • JPEG (.jpeg / .jpg)
    • PNG (.png)
    • Non-animated GIF (.gif)
    • WEBP (.webp)
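Before sending, you can check a hosted image against these specifications. This is an illustrative sketch, not part of the Vellum SDK; note that some servers omit the Content-Length header, in which case the size check is inconclusive:

import requests

SUPPORTED_EXTENSIONS = (".jpeg", ".jpg", ".png", ".gif", ".webp")
MAX_BYTES = 32 * 1024 * 1024  # 32MB limit from the specs above

def check_image_specs(url: str) -> None:
    # Hypothetical helper: validates format by extension and size via a
    # HEAD request. Raises ValueError if either spec is violated.
    if not url.lower().endswith(SUPPORTED_EXTENSIONS):
        raise ValueError(f"Unsupported image format: {url}")
    response = requests.head(url, allow_redirects=True, timeout=10)
    response.raise_for_status()
    size = int(response.headers.get("Content-Length", 0))
    if size > MAX_BYTES:
        raise ValueError(f"Image is {size} bytes; the limit is {MAX_BYTES}")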

Sending Images via API

Here’s a short example of how to send an image to the model using Vellum’s Python Client SDK:

import os

# Imports assume the Vellum Python SDK; exact module paths may vary by
# SDK version, so check the API Explorer examples for your version.
from vellum import (
    ChatMessageRequest,
    ChatMessageRole,
    PromptDeploymentInputRequest_ChatHistory,
    VellumVariableType,
)
from vellum.client import Vellum

client = Vellum(api_key=os.environ["VELLUM_API_KEY"])

image_link = "https://storage.googleapis.com/vellum-public/help-docs/add_prompt_block_button.png"
response = client.execute_prompt(
    prompt_deployment_name="github-loom-demo",
    inputs=[
        PromptDeploymentInputRequest_ChatHistory(
            name="chat_history",
            value=[
                ChatMessageRequest(
                    role=ChatMessageRole.USER,
                    content={
                        "type": "ARRAY",
                        "value": [
                            {"type": "STRING", "value": "What's in this image?"},
                            {"type": "IMAGE", "value": {"src": image_link}},
                        ],
                    },
                )
            ],
            type=VellumVariableType.CHAT_HISTORY,
        ),
    ],
)
print(response.outputs[0].value)

To see how to do this with other SDKs or cURL, see our API Explorer for the /execute-prompt endpoint or /execute-workflow endpoint.

Image Detail and Token Usage

When working with image models, token usage is an important factor to consider. For OpenAI vision models, the two main factors for token count are the image’s size and its detail setting.

There are three possible settings for image detail: low, high, or auto. With auto, the model looks at the image input size and decides between low and high for you.

The low setting processes a lower-resolution 512x512 version of the image, so response times are faster. This setting is great when the fine details of the image are not required.

The high setting, on the other hand, is the high-resolution mode. In this mode, the input image is split into 512px tiles, and each tile is processed in detail. Token usage is calculated from the number of these tiles, which grows with the image size. High resolution allows for a more comprehensive interpretation of your image.
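To make that concrete, here’s a sketch of the token math using the figures OpenAI has published for GPT-4-class vision models; the constants (85 base tokens, 170 per tile, 2048/768px scaling) are a snapshot of OpenAI’s documentation and may differ by model:

import math

def openai_image_tokens(width: int, height: int, detail: str = "high") -> int:
    # Token estimate per OpenAI's published formula for GPT-4-class
    # vision models: 85 base tokens plus 170 per 512px tile. Check
    # OpenAI's docs for current, model-specific values.
    if detail == "low":
        return 85  # low detail is a flat cost regardless of image size

    # 1. Scale down to fit within a 2048x2048 square, keeping aspect ratio
    scale = min(1.0, 2048 / max(width, height))
    w, h = width * scale, height * scale

    # 2. Scale down so the shortest side is at most 768px
    scale = min(1.0, 768 / min(w, h))
    w, h = w * scale, h * scale

    # 3. Count the 512px tiles needed to cover the scaled image
    tiles = math.ceil(w / 512) * math.ceil(h / 512)
    return 85 + 170 * tiles

print(openai_image_tokens(1024, 1024, detail="high"))  # 765
print(openai_image_tokens(1024, 1024, detail="low"))   # 85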

You can learn more about the image detail setting and OpenAI vision models on their site.

Using Documents (PDFs) in the UI

Vellum also supports uploading and processing documents such as PDFs in your Prompts and Workflows. This enables you to leverage the power of LLMs for document analysis, summarization, extraction, and more.

Uploading PDFs in Vellum UI

How to Add PDFs

  • In the Prompt Sandbox, add a Chat History input variable by adding a new variable of type “Chat History”. This will allow you to drag and drop PDF files directly into Content Blocks.
  • In the Workflow Sandbox, add a Chat History input variable by adding a new variable of type “Chat History”. This will allow you to drag and drop PDF files directly into Content Blocks.

Once the Chat History variable is set up, simply drag and drop your PDF file into a message. The document will be uploaded and available for the model to process as part of your prompt or workflow scenario.

PDFs must be publicly accessible and under 32MB in size. For best results, use standard, non-password-protected PDF files.
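As with images, you can sanity-check a hosted PDF before using it. A minimal sketch, not part of the Vellum SDK; headers vary by server, so treat a missing Content-Length as inconclusive rather than a failure:

import requests

def check_pdf(url: str, max_bytes: int = 32 * 1024 * 1024) -> None:
    # Hypothetical helper: verifies the PDF is publicly accessible,
    # served as application/pdf, and under the 32MB limit.
    response = requests.head(url, allow_redirects=True, timeout=10)
    response.raise_for_status()  # fails if the file isn't publicly accessible
    content_type = response.headers.get("Content-Type", "")
    if "application/pdf" not in content_type:
        raise ValueError(f"URL does not appear to serve a PDF: {content_type}")
    size = int(response.headers.get("Content-Length", 0))
    if size > max_bytes:
        raise ValueError(f"PDF is {size} bytes; the limit is {max_bytes}")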

What Happens Next?

After uploading, the PDF will appear as an attachment in your Chat History message. You can add text alongside your document, reorder items, or remove the file as needed. The model will receive the document as part of the input and can answer questions, summarize, or extract information from it.

Sending PDFs via API

Here’s a short example of how to send a PDF to the model using Vellum’s Python Client SDK:

# Uses the same imports and client setup as the image example above.
pdf_link = "https://storage.googleapis.com/vellum-public/help-docs/sample.pdf"
response = client.execute_prompt(
    prompt_deployment_name="github-loom-demo",
    inputs=[
        PromptDeploymentInputRequest_ChatHistory(
            name="chat_history",
            value=[
                ChatMessageRequest(
                    role=ChatMessageRole.USER,
                    content={
                        "type": "ARRAY",
                        "value": [
                            {"type": "STRING", "value": "Please summarize the contents of this PDF."},
                            {"type": "DOCUMENT", "value": {"src": pdf_link}},
                        ],
                    },
                )
            ],
            type=VellumVariableType.CHAT_HISTORY,
        ),
    ],
)
print(response.outputs[0].value)

To see how to do this with other SDKs or cURL, see our API Explorer for the /execute-prompt endpoint or /execute-workflow endpoint.