Basic RAG Chatbot

Overview

In this example, we’ll build a basic RAG chatbot that answers questions grounded in the contents of PDF documents (in this case, Vellum’s Trust Center Policies). This is useful if you want to help scale your support team by surfacing quick answers to common customer questions.

Ultimately, we’ll end up with a Workflow that performs the following steps:

  1. ExtractUserMessage: extracts the most recent message from the user
  2. SearchNode: uses the user’s message to find relevant quotes from ingested PDFs
  3. FormatSearchResultsNode: reformats quotes to include the name of the document that they came from
  4. PromptNode: passes the user’s question and the PDF context to the LLM to answer the question
# Graph Definition
class BasicRAGWorkflow(BaseWorkflow[Inputs, BaseState]):
    graph = ExtractUserMessage >> SearchNode >> FormatSearchResultsNode >> PromptNode

    class Outputs(BaseWorkflow.Outputs):
        result = PromptNode.Outputs.text

# Running it
workflow = BasicRAGWorkflow()
terminal_event = workflow.run(
    inputs=Inputs(
        chat_history=[
            ChatMessageRequest(
                role="USER",
                text="How often is employee training?",
            )
        ]
    )
)

# Output:
print(terminal_event.outputs.result)

"""
Employee training, as outlined in the Information Security Policy
occurs on an annual basis. All new hires are required to complete
information security awareness training as part of their new employee
onboarding process and then annually thereafter. This ongoing training
includes security and privacy requirements, the correct use of
information assets and facilities, and, consistent with assigned roles
and responsibilities, incident response and contingency training.
Additionally, individuals responsible for supporting or writing code for
internet-facing applications or internal applications that handle customer
information must complete annual security training specific to secure coding
practices, which includes OWASP secure development principles and OWASP top 10
vulnerability awareness for the most recent year available.

Citation: Policy Information Security Policy - v1.pdf & Policy Software Development Life Cycle Policy - v1.pdf
"""

This corresponds to a Workflow graph like the following:

[Diagram: Basic RAG Chatbot Workflow graph]

Let’s dive in!

Setup

Install Vellum

$ pip install vellum-ai

Create your Project

In this example, we’ll structure our project like this:

basic_rag_chatbot/
├── workflow.py
├── inputs.py
├── __init__.py
└── nodes/
    ├── __init__.py
    ├── extract_user_message.py
    ├── search_node.py
    ├── format_search_results_node.py
    └── prompt_node.py

Folder structure matters! Vellum relies on this structure to convert between UI and code representations of the graph. If you don’t want to use the UI, you can use whatever folder structure you’d like.
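If you follow this layout, one convenient (and entirely optional) pattern is to re-export each node from nodes/__init__.py so other modules can import them from a single place. This is a sketch of that pattern, not something Vellum requires:

# nodes/__init__.py -- optional convenience re-exports (a pattern, not a Vellum requirement)
from .extract_user_message import ExtractUserMessage
from .search_node import SearchNode
from .format_search_results_node import FormatSearchResultsNode
from .prompt_node import PromptNode

__all__ = [
    "ExtractUserMessage",
    "SearchNode",
    "FormatSearchResultsNode",
    "PromptNode",
]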

Define Workflow Inputs

# inputs.py
from typing import List

from vellum import ChatMessageRequest
from vellum.workflows.inputs import BaseInputs

class Inputs(BaseInputs):
    chat_history: List[ChatMessageRequest]

Our chatbot will have a chat history, which is a full list of messages between the user and the bot. If we want, we could use this to answer follow-up questions with context from previous messages.
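For instance, a hypothetical helper like the one below could fold the last few turns into a single search query, so that a follow-up like “How often?” still carries the topic it refers to. This is a sketch only; build_query is not part of the Workflow we build below:

# Sketch: derive a context-carrying search query from the chat history.
# build_query is a hypothetical helper, not part of the Workflow below.
from typing import List

from vellum import ChatMessageRequest

def build_query(chat_history: List[ChatMessageRequest], turns: int = 3) -> str:
    # Join the text of the last few messages so follow-ups keep their context.
    recent = chat_history[-turns:]
    return "\n".join(message.text for message in recent if message.text)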

Build the Nodes

1. Extract User Message

We’ll use the output of this node in the next step to search for relevant document chunks that ground the answer to the user’s question.

# nodes/extract_user_message.py
from vellum.workflows.nodes import TemplatingNode

# Assumes the package layout shown above; adjust the import path to your project.
from ..inputs import Inputs

class ExtractUserMessage(TemplatingNode):
    # Here, we reference the chat_history input that we've connected to this node.
    template = """\
    {{ chat_history[-1]["text"] }}\
    """

    # Here, we define the inputs to _this_ node.
    inputs = {
        "chat_history": Inputs.chat_history,
    }

You can see that we’re subclassing the TemplatingNode class, which allows us to use a Jinja template to extract the user’s query from the chat history.
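To see what that template evaluates to, here it is rendered with the stock jinja2 library against a made-up chat history. This is purely illustrative; inside the node, Vellum renders the template for you:

# Illustration only: rendering the same Jinja expression with plain jinja2.
from jinja2 import Template

chat_history = [
    {"role": "USER", "text": "What is your uptime SLA?"},
    {"role": "ASSISTANT", "text": "99.9% for paid plans."},
    {"role": "USER", "text": "How often is employee training?"},
]

print(Template('{{ chat_history[-1]["text"] }}').render(chat_history=chat_history))
# -> How often is employee training?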

2. Search Node

Specify which document index to search over, and use the user’s query to find relevant chunks of information.

# nodes/search_node.py
from vellum.workflows.nodes import BaseSearchNode

# Assumes the package layout shown above.
from .extract_user_message import ExtractUserMessage

class SearchNode(BaseSearchNode):
    document_index = "vellum-trust-center-policies"
    query = ExtractUserMessage.Outputs.result

Here, we subclass BaseSearchNode, which allows us to specify a document index to search over, and a query to search with. Vellum provides out-of-the-box, scalable vector database and embeddings solutions that make this easy.
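Conceptually, the search step embeds the query and ranks stored document chunks by vector similarity. The sketch below illustrates that idea in plain Python; it is not Vellum’s implementation, and embed() is a stand-in for whatever embedding model the index uses:

# Conceptual sketch of embedding-based retrieval (not Vellum's implementation).
import math
from typing import Callable, List, Tuple

def top_k(
    query: str,
    chunks: List[str],
    embed: Callable[[str], List[float]],  # stand-in for an embedding model
    k: int = 3,
) -> List[Tuple[float, str]]:
    def cosine(a: List[float], b: List[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    # Score every chunk against the query embedding and keep the best k.
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    return sorted(scored, reverse=True)[:k]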

3. Format Search Results Node

This is an optional step, but it can be useful to format the search results in a way that’s optimal for an LLM to consume. You may want to include metadata in a certain format or omit it altogether. Here, we include the name of the document that each chunk came from, so that we can later instruct an LLM to cite its sources.

# nodes/format_search_results_node.py
from vellum.workflows.nodes import TemplatingNode

# Assumes the package layout shown above.
from .search_node import SearchNode

class FormatSearchResultsNode(TemplatingNode):
    template = """\
    {% for result in results -%}
    Policy: {{ result.document.label }}
    ------
    {{ result.text }}
    {% if not loop.last %}
    #####
    {% endif -%}
    {% endfor %}\
    """

    inputs = {
        "results": SearchNode.Outputs.results,
    }
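Given two hypothetical chunks from the policies above, the rendered context that flows into the next node would look roughly like this (the chunk text is invented for illustration):

Policy: Policy Information Security Policy - v1.pdf
------
All new hires complete information security awareness training during onboarding and annually thereafter.

#####

Policy: Policy Software Development Life Cycle Policy - v1.pdf
------
Engineers who write code for internet-facing applications complete annual secure coding training.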
4. Use an LLM to Answer the User’s Question

Pass the user’s question and the formatted context to the LLM so it can answer the question directly for the user.

# nodes/prompt_node.py
from vellum import (
    ChatMessagePromptBlock,
    JinjaPromptBlock,
)
from vellum.workflows.nodes import InlinePromptNode

# Assumes the package layout shown above.
from .extract_user_message import ExtractUserMessage
from .format_search_results_node import FormatSearchResultsNode

class PromptNode(InlinePromptNode):
    ml_model = "gpt-4o"
    prompt_inputs = {
        "question": ExtractUserMessage.Outputs.result,
        "context": FormatSearchResultsNode.Outputs.result,
    }
    blocks = [
        ChatMessagePromptBlock(
            chat_role="SYSTEM",
            blocks=[
                JinjaPromptBlock(
                    block_type="JINJA",
                    template="""\
                    Answer user question based on the context provided below, if you don't know the answer say "Sorry I don't know"

                    **Context**
                    ``
                    {{ context }}
                    ``

                    Limit your answer to 250 words and provide a citation at the end of your answer\
                    """,
                ),
            ],
        ),
        ChatMessagePromptBlock(
            chat_role="USER",
            blocks=[
                JinjaPromptBlock(
                    block_type="JINJA",
                    template="""\
                    {{ question }}\
                    """,
                ),
            ],
        ),
    ]

Instantiate the Graph and Invoke it

1. Define the Graph and its Outputs

# workflow.py
from vellum.workflows import BaseWorkflow
from vellum.workflows.state import BaseState

# Assumes the package layout shown above.
from .inputs import Inputs
from .nodes.extract_user_message import ExtractUserMessage
from .nodes.format_search_results_node import FormatSearchResultsNode
from .nodes.prompt_node import PromptNode
from .nodes.search_node import SearchNode

class BasicRAGWorkflow(BaseWorkflow[Inputs, BaseState]):
    graph = ExtractUserMessage >> SearchNode >> FormatSearchResultsNode >> PromptNode

    class Outputs(BaseWorkflow.Outputs):
        result = PromptNode.Outputs.text
2. Instantiate the Workflow

# From any file / function from which you want to reference the Workflow
workflow = BasicRAGWorkflow()
3. Invoke the Workflow and Output the Answer

# From any file / function from which you want to run the Workflow
terminal_event = workflow.run(
    inputs=Inputs(
        chat_history=[
            ChatMessageRequest(
                role="USER",
                text="How often is employee training?",
            )
        ]
    )
)

# Output:
print(terminal_event.outputs.result)

"""
Employee training, as outlined in the Information Security Policy
occurs on an annual basis. All new hires are required to complete
information security awareness training as part of their new employee
onboarding process and then annually thereafter. This ongoing training
includes security and privacy requirements, the correct use of
information assets and facilities, and, consistent with assigned roles
and responsibilities, incident response and contingency training.
Additionally, individuals responsible for supporting or writing code for
internet-facing applications or internal applications that handle customer
information must complete annual security training specific to secure coding
practices, which includes OWASP secure development principles and OWASP top 10
vulnerability awareness for the most recent year available.

Citation: Policy Information Security Policy - v1.pdf & Policy Software Development Life Cycle Policy - v1.pdf
"""

Conclusion

In under 120 lines of code, we built a RAG chatbot that can answer users’ questions with context from a vector database. Looking forward, we can:

  • Version control the graph with the rest of our project in a git repository
  • Continue building the graph in the Vellum UI
  • Evaluate the pipeline with test data (see Evaluating RAG Pipelines)
  • Host it on our own servers or deploy to Vellum