Basic RAG Chatbot
Overview
In this example, we’ll build a basic RAG chatbot. The chatbot will be able to answer questions whose answers are grounded in the contents of PDF documents (in this case, Vellum’s Trust Center Policies). This is useful if you want to help your support team scale by quickly surfacing answers to common customer questions.
Ultimately, we’ll end up with a Workflow that performs the following steps:
- ExtractUserMessage: extracts the most recent message from the user
- SearchNode: uses the user’s message to find relevant quotes from ingested PDFs
- FormatSearchResultsNode: reformats quotes to include the name of the document that they came from
- PromptNode: passes the user’s question and the PDF context to the LLM to answer the question
These steps correspond to a Workflow graph like this:
Let’s dive in!
Setup
Install Vellum
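If you’re working in Python, install the Vellum SDK from PyPI. The package name below is our assumption based on the publicly listed vellum-ai package; check Vellum’s installation docs if your setup differs.

```bash
pip install vellum-ai
```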
Create your Project
In this example, we’ll structure our project like this:
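Here’s one possible layout. The file and directory names below are illustrative, so adapt them to the module conventions described in Vellum’s docs:

```
rag_chatbot/
├── __init__.py
├── inputs.py
├── workflow.py
└── nodes/
    ├── __init__.py
    ├── extract_user_message.py
    ├── search_node.py
    ├── format_search_results_node.py
    └── prompt_node.py
```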
Folder structure matters! Vellum relies on this structure to convert between UI and code representations of the graph. If you don’t want to use the UI, you can use whatever folder structure you’d like.
Define Workflow Inputs
Our chatbot will have a chat history, which is a full list of messages between the user and the bot. If we want, we could use this to answer follow-up questions with context from previous messages.
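A minimal sketch of the inputs module is below. It assumes BaseInputs and ChatMessage are importable from the paths shown, which may differ slightly between SDK versions:

```python
# inputs.py
from typing import List

from vellum import ChatMessage  # import path assumed; verify against your SDK version
from vellum.workflows.inputs import BaseInputs


class Inputs(BaseInputs):
    # The full conversation so far; the last entry is the user's latest question.
    chat_history: List[ChatMessage]
```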
Build the Nodes
Extract User Message
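Here is a sketch of what this node could look like. The TemplatingNode import path and the template/inputs attribute names are assumptions to double-check against the Vellum SDK reference:

```python
# nodes/extract_user_message.py
from vellum.workflows.nodes import TemplatingNode  # import path assumed
from vellum.workflows.state import BaseState

from ..inputs import Inputs


class ExtractUserMessage(TemplatingNode[BaseState, str]):
    # Jinja template that pulls the text of the most recent message in the chat history.
    template = "{{ chat_history[-1].text }}"
    inputs = {
        "chat_history": Inputs.chat_history,
    }
```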
We’ll use the output from this node in the next step to search for relevant documents, so that the user’s question can be answered factually.
You can see that we’re subclassing the TemplatingNode class, which allows us to use a Jinja template to extract the user’s query from the chat history.
Search Node
Specify which document index to search over, and use the user’s query to find relevant chunks of information.
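A sketch of this node is below. The BaseSearchNode import path, the document_index and query attribute names, and the index name itself are all assumptions:

```python
# nodes/search_node.py
from vellum.workflows.nodes import BaseSearchNode  # import path assumed

from .extract_user_message import ExtractUserMessage


class SearchNode(BaseSearchNode):
    # Name of the document index containing the ingested PDFs (hypothetical name).
    document_index = "vellum-trust-center-policies"
    # Use the user's extracted message as the search query.
    query = ExtractUserMessage.Outputs.result
```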
Here, we subclass BaseSearchNode, which allows us to specify a document index to search over and a query to search with. Vellum provides out-of-the-box, scalable vector database and embeddings solutions that make this easy.
Format Search Results Node
This is an optional step, but it can be useful to format the search results in a way that’s optimal for an LLM to consume. You may want to include metadata in a certain format or omit it altogether. Here, we include the name of the document that each chunk came from, so that we can later instruct an LLM to cite its sources.
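One way to do this is with another templating node that prefixes each chunk with its source document’s name. The shape of the search results referenced in the template (result.document.label, result.text) is an assumption about the SDK’s search result type:

```python
# nodes/format_search_results_node.py
from vellum.workflows.nodes import TemplatingNode  # import path assumed
from vellum.workflows.state import BaseState

from .search_node import SearchNode


class FormatSearchResultsNode(TemplatingNode[BaseState, str]):
    # Prefix each retrieved chunk with the document it came from so the LLM can cite it.
    template = """\
{% for result in results %}
Source: {{ result.document.label }}
{{ result.text }}
---
{% endfor %}"""
    inputs = {
        "results": SearchNode.Outputs.results,
    }
```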
Instantiate the Graph and Invoke it
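Finally, we wire the nodes into a workflow and run it locally. The sketch below assumes a PromptNode module (the LLM-calling node from the overview, not shown here), the >> graph-building syntax, the Outputs wiring, and the run() signature; treat all of these as assumptions to verify against the Vellum workflows documentation:

```python
# workflow.py
from vellum import ChatMessage  # import path assumed
from vellum.workflows import BaseWorkflow
from vellum.workflows.state import BaseState

from .inputs import Inputs
from .nodes.extract_user_message import ExtractUserMessage
from .nodes.search_node import SearchNode
from .nodes.format_search_results_node import FormatSearchResultsNode
from .nodes.prompt_node import PromptNode  # LLM node from the overview; definition not shown


class RagChatbotWorkflow(BaseWorkflow[Inputs, BaseState]):
    # Run the four nodes in sequence: extract -> search -> format -> prompt.
    graph = ExtractUserMessage >> SearchNode >> FormatSearchResultsNode >> PromptNode

    class Outputs(BaseWorkflow.Outputs):
        # Surface the LLM's answer as the workflow's output.
        answer = PromptNode.Outputs.text


if __name__ == "__main__":
    workflow = RagChatbotWorkflow()
    final_event = workflow.run(
        inputs=Inputs(
            chat_history=[
                ChatMessage(role="USER", text="How is customer data encrypted at rest?")
            ]
        )
    )
    print(final_event.outputs.answer)
```

If the run succeeds, the terminal event’s outputs should carry the answer produced by the prompt node.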
Conclusion
In under 120 lines of code, we built a RAG chatbot that can answer users’ questions with context from a vector database. Looking forward, we can:
- Version control the graph with the rest of our project in a git repository
- Continue building the graph in the Vellum UI
- Evaluate the pipeline with test data (see Evaluating RAG Pipelines)
- Host it on our own servers or deploy to Vellum