PDF Content Summarization

Vellum Document Indexes are typically used to power RAG systems via Search Nodes. However, they can also be used to operate on the entire contents of a single file. In this example, we use a Vellum Document Index not for search, but to leverage the OCR it performs and operate on the raw text extracted from a PDF file.

Prerequisites

Before building this workflow, you need to have:

  • Created a Document Index. Note: it doesn’t matter what embedding model or chunking strategy you choose, since we’re only leveraging the OCR capabilities of the Document Index.
  • Uploaded a PDF file to the Document Index and noted down its ID.
  • Generated a Vellum API Token and saved its value as a Workspace Secret.

Implementation Steps

1. Set the input to the workflow

This will be the ID of a Document that was previously uploaded to a Document Index.

2. Add a Templating Node (Document API URL)

This will construct the URL of the Vellum API endpoint we want to hit.

3. Add an API Node (Document API)

This will call the Vellum API and retrieve metadata about the Document.

4. Add a Templating Node (Processed Document URL)

This will extract the URL of the processed document from the API response.

5. Add an API Node (Processed Document Contents)

This will retrieve the text contents of the Document.

6. Pass those contents to a Prompt Node that summarizes the text.

Extracting and summarizing PDF content
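
To make the data flow between the nodes concrete, the six steps above can be sketched as plain Python. This is a minimal sketch under stated assumptions, not the workflow's actual implementation: the base URL `https://api.vellum.ai/v1/documents`, the `X_API_KEY` header, and the `processed_file_url` response field are assumptions to verify against the Vellum API reference, and every function name and the prompt wording are hypothetical. It also assumes the processed document URL can be fetched without extra headers.

```python
import json
import urllib.request
from typing import Callable, Mapping
from urllib.parse import quote

# Assumptions: endpoint, header name, and response field are illustrative.
API_BASE = "https://api.vellum.ai/v1/documents"
API_KEY_HEADER = "X_API_KEY"
PROCESSED_URL_FIELD = "processed_file_url"

# A fetcher takes (url, headers) and returns the response body as text.
Fetch = Callable[[str, Mapping[str, str]], str]


def http_get(url: str, headers: Mapping[str, str]) -> str:
    """Default fetcher: a plain GET using only the standard library."""
    request = urllib.request.Request(url, headers=dict(headers))
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


def document_api_url(document_id: str) -> str:
    """Step 2 (Templating Node): build the Document API URL from the ID."""
    return f"{API_BASE}/{quote(document_id, safe='')}"


def processed_document_url(api_response_body: str) -> str:
    """Step 4 (Templating Node): extract the processed document's URL."""
    return json.loads(api_response_body)[PROCESSED_URL_FIELD]


def fetch_document_text(
    document_id: str, api_key: str, fetch: Fetch = http_get
) -> str:
    """Steps 1-5: take a Document ID and return the extracted text."""
    # Step 3 (API Node): retrieve metadata about the Document.
    metadata = fetch(document_api_url(document_id), {API_KEY_HEADER: api_key})
    # Step 5 (API Node): retrieve the text contents of the Document.
    return fetch(processed_document_url(metadata), {})


def summarization_prompt(document_text: str) -> str:
    """Step 6: the input a summarizing Prompt Node might receive."""
    return f"Summarize the following document:\n\n{document_text}"
```

The fetcher is passed in as a parameter so the chain of URL construction, metadata lookup, and content retrieval can be exercised with a stub instead of live API calls. In the workflow itself, the API token would come from the Workspace Secret created in the prerequisites rather than from a function argument.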