Basic RAG Chatbot

Overview

In this example, we’ll build a basic RAG chatbot. The chatbot will be able to answer questions whose answers are grounded in the contents of PDF documents (in this case, Vellum’s Trust Center Policies). This is useful if you want to help scale your support team by surfacing quick answers to common customer questions.

Ultimately, we’ll end up with a Workflow that performs the following steps:

  1. ExtractUserMessage: extracts the most recent message from the user
  2. SearchNode: uses the user’s message to find relevant quotes from ingested PDFs
  3. FormatSearchResultsNode: reformats quotes to include the name of the document that they came from
  4. PromptNode: passes the user’s question and the PDF context to the LLM to answer the question
```python
## Graph Definition
class BasicRAGWorkflow(BaseWorkflow[Inputs, BaseState]):
    graph = ExtractUserMessage >> SearchNode >> FormatSearchResultsNode >> PromptNode

    class Outputs(BaseWorkflow.Outputs):
        result = PromptNode.Outputs.text

## Running it
workflow = BasicRAGWorkflow()
terminal_event = workflow.run(
    inputs=Inputs(
        chat_history=[
            ChatMessageRequest(
                role="USER",
                text="How often is employee training?",
            )
        ]
    )
)

## Output:
print(terminal_event.outputs.result)

"""
Employee training, as outlined in the Information Security Policy
occurs on an annual basis. All new hires are required to complete
information security awareness training as part of their new employee
onboarding process and then annually thereafter. This ongoing training
includes security and privacy requirements, the correct use of
information assets and facilities, and, consistent with assigned roles
and responsibilities, incident response and contingency training.
Additionally, individuals responsible for supporting or writing code for
internet-facing applications or internal applications that handle customer
information must complete annual security training specific to secure coding
practices, which includes OWASP secure development principles and OWASP top 10
vulnerability awareness for the most recent year available.

Citation: Policy Information Security Policy - v1.pdf & Policy Software Development Life Cycle Policy - v1.pdf
"""
```

Which corresponds to a Workflow graph like this:

(Diagram: the Basic RAG Chatbot Workflow graph, showing the four nodes connected in sequence.)

Let’s dive in!

Setup

Install Vellum

$ pip install vellum-ai
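
To call Vellum’s APIs from the Search and Prompt nodes, the SDK also needs a Vellum API key. A minimal sketch, assuming the standard `VELLUM_API_KEY` environment variable (check your SDK version’s docs for the exact configuration):

```python
import os

# Assumption: the Vellum SDK picks up credentials from the environment.
# Replace the placeholder with your actual API key, or export it in your shell
# before running the Workflow.
os.environ["VELLUM_API_KEY"] = "<your-api-key>"
```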

Create your Project

In this example, we’ll structure our project like this:

```
basic_rag_chatbot/
├── workflow.py
├── inputs.py
├── __init__.py
└── nodes/
    ├── __init__.py
    ├── extract_user_message.py
    ├── search_node.py
    ├── format_search_results_node.py
    └── prompt_node.py
```

Folder structure matters! Vellum relies on this structure to convert between UI and code representations of the graph. If you don’t want to use the UI, you can use whatever folder structure you’d like.

Define Workflow Inputs

```python
from typing import List
from vellum import ChatMessageRequest
from vellum.workflows.inputs import BaseInputs

class Inputs(BaseInputs):
    chat_history: List[ChatMessageRequest]
```

Our chatbot will have a chat history, which is a full list of messages between the user and the bot. If we want, we could use this to answer follow-up questions with context from previous messages.
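
For instance, a multi-turn conversation would be passed in as a list of alternating messages. This is a hypothetical history for illustration; the Workflow in this example only reads the final message, but the full list is available if you extend the templates:

```python
from vellum import ChatMessageRequest

# Hypothetical multi-turn history (illustrative only). Only the final USER
# message is used by this example's ExtractUserMessage node.
chat_history = [
    ChatMessageRequest(role="USER", text="What policies cover training?"),
    ChatMessageRequest(role="ASSISTANT", text="Employee training is covered by the Information Security Policy."),
    ChatMessageRequest(role="USER", text="How often is employee training?"),
]
```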

Build the Nodes

1. Extract User Message

We’ll use the output from this node in the next step to search for relevant document excerpts that ground the answer to the user’s question.

```python
# nodes/extract_user_message.py
from vellum.workflows.nodes import TemplatingNode
from ..inputs import Inputs

class ExtractUserMessage(TemplatingNode):
    # Here, we reference the chat_history input that we've connected to this node.
    template = """\
{{ chat_history[-1]["text"] }}\
"""

    # Here, we define the inputs to _this_ node.
    inputs = {
        "chat_history": Inputs.chat_history,
    }
```

You can see that we’re subclassing the TemplatingNode class, which allows us to use a Jinja template to extract the user’s query from the chat history.
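
To build intuition for what this node produces, here’s the same rendering done with the standalone `jinja2` package (illustrative only; the TemplatingNode performs this rendering for you at runtime):

```python
from jinja2 import Template

# Plain-dict stand-ins for chat messages, just to show the template's behavior.
chat_history = [
    {"role": "USER", "text": "How often is employee training?"},
]

rendered = Template('{{ chat_history[-1]["text"] }}').render(chat_history=chat_history)
print(rendered)  # -> How often is employee training?
```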

2. Search Node

Specify which document index to search over, and use the user’s query to find relevant chunks of information.

```python
# nodes/search_node.py
from vellum.workflows.nodes import BaseSearchNode

from .extract_user_message import ExtractUserMessage

class SearchNode(BaseSearchNode):
    document_index = "vellum-trust-center-policies"
    query = ExtractUserMessage.Outputs.result
```

Here, we subclass BaseSearchNode, which allows us to specify a document index to search over, and a query to search with. Vellum provides out-of-the-box, scalable vector database and embeddings solutions that make this easy.
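
For intuition about what comes back, each item in `SearchNode.Outputs.results` exposes (at least) the retrieved chunk text and its source document, which is all the next node relies on. A simplified mock of that shape, not the real Vellum model:

```python
from dataclasses import dataclass

# Simplified stand-ins for the fields this example actually uses
# (the real SearchResult objects carry additional metadata).
@dataclass
class MockDocument:
    label: str  # e.g. "Policy Information Security Policy - v1.pdf"

@dataclass
class MockSearchResult:
    text: str               # the retrieved chunk of PDF text
    document: MockDocument  # the document the chunk came from

results = [
    MockSearchResult(
        text="All new hires are required to complete information security awareness training...",
        document=MockDocument(label="Policy Information Security Policy - v1.pdf"),
    ),
]
```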

3. Format Search Results Node

This is an optional step, but it can be useful to format the search results in a way that’s optimal for an LLM to consume. You may want to include metadata in a certain format or omit it altogether. Here, we include the name of the document that each chunk came from, so that we can later instruct an LLM to cite its sources.

```python
# nodes/format_search_results_node.py
from vellum.workflows.nodes import TemplatingNode

from .search_node import SearchNode

class FormatSearchResultsNode(TemplatingNode):
    template = """\
{% for result in results -%}
Policy: {{ result.document.label }}
------
{{ result.text }}
{% if not loop.last %}
#####
{% endif -%}
{% endfor %}\
"""

    inputs = {
        "results": SearchNode.Outputs.results,
    }
```
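
Given two retrieved chunks, the rendered context passed downstream would look roughly like this (illustrative values):

```
Policy: Policy Information Security Policy - v1.pdf
------
All new hires are required to complete information security awareness training as part of onboarding and then annually thereafter.

#####
Policy: Policy Software Development Life Cycle Policy - v1.pdf
------
Individuals writing code for internet-facing applications must complete annual security training on secure coding practices.
```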
4. Use an LLM to Answer the User’s Question

Pass the user’s question and the retrieved context to the LLM so that it can answer the question in a manner personalized to the user.

```python
# nodes/prompt_node.py
from vellum.workflows.nodes import InlinePromptNode
from vellum import (
    ChatMessagePromptBlock,
    JinjaPromptBlock,
)

from .extract_user_message import ExtractUserMessage
from .format_search_results_node import FormatSearchResultsNode

class PromptNode(InlinePromptNode):
    ml_model = "gpt-4o"
    prompt_inputs = {
        "question": ExtractUserMessage.Outputs.result,
        "context": FormatSearchResultsNode.Outputs.result,
    }
    blocks = [
        ChatMessagePromptBlock(
            chat_role="SYSTEM",
            blocks=[
                JinjaPromptBlock(
                    block_type="JINJA",
                    template="""\
Answer the user's question based on the context provided below. If you don't know the answer, say "Sorry, I don't know."

**Context**
``
{{ context }}
``

Limit your answer to 250 words and provide a citation at the end of your answer.\
""",
                ),
            ],
        ),
        ChatMessagePromptBlock(
            chat_role="USER",
            blocks=[
                JinjaPromptBlock(
                    block_type="JINJA",
                    template="""\
{{ question }}\
""",
                ),
            ],
        ),
    ]
```

Instantiate the Graph and Invoke it

1. Define the Graph and its Outputs

```python
# workflow.py
from vellum.workflows import BaseWorkflow
from vellum.workflows.state import BaseState

from .inputs import Inputs
from .nodes.extract_user_message import ExtractUserMessage
from .nodes.search_node import SearchNode
from .nodes.format_search_results_node import FormatSearchResultsNode
from .nodes.prompt_node import PromptNode

class BasicRAGWorkflow(BaseWorkflow[Inputs, BaseState]):
    graph = ExtractUserMessage >> SearchNode >> FormatSearchResultsNode >> PromptNode

    class Outputs(BaseWorkflow.Outputs):
        result = PromptNode.Outputs.text
```
2. Instantiate the Workflow

```python
## From any file / function from which you want to reference the Workflow

# Required import (the file imported from depends on your folder structure)
# from .workflow import BasicRAGWorkflow

workflow = BasicRAGWorkflow()
```
3. Invoke the Workflow and Output the Answer

```python
## From any file / function from which you want to run the Workflow

# Required imports (the file imported from depends on your folder structure)
# from .inputs import Inputs
# from vellum import ChatMessageRequest

terminal_event = workflow.run(
    inputs=Inputs(
        chat_history=[
            ChatMessageRequest(
                role="USER",
                text="How often is employee training?",
            )
        ]
    )
)

## Output:
print(terminal_event.outputs.result)

"""
Employee training, as outlined in the Information Security Policy
occurs on an annual basis. All new hires are required to complete
information security awareness training as part of their new employee
onboarding process and then annually thereafter. This ongoing training
includes security and privacy requirements, the correct use of
information assets and facilities, and, consistent with assigned roles
and responsibilities, incident response and contingency training.
Additionally, individuals responsible for supporting or writing code for
internet-facing applications or internal applications that handle customer
information must complete annual security training specific to secure coding
practices, which includes OWASP secure development principles and OWASP top 10
vulnerability awareness for the most recent year available.

Citation: Policy Information Security Policy - v1.pdf & Policy Software Development Life Cycle Policy - v1.pdf
"""
```

Conclusion

In under 120 lines of code, we built a RAG chatbot that can answer users’ questions with context from a vector database. Looking forward, we can:

  • Version control the graph with the rest of our project in a git repository
  • Continue building the graph in the Vellum UI
  • Evaluate the pipeline with test data (see Evaluating RAG Pipelines)
  • Host it on our own servers or deploy to Vellum