Best Practices for Replaying Requests & Backtesting Prompts

No matter how well you test your Prompts before promoting them to production, edge cases you didn’t expect will likely appear. The Prompt Lifecycle Management page described how to update a Deployed Prompt. Before doing so, however, it’s best practice to replay recent production requests against the new Prompt and spot-check that the outputs look reasonable. LLMs are sometimes unpredictable: even changing the word “good” to “great” in a Prompt can result in different outputs!

Starting a Back-test

After clicking the “Deploy” button below a Prompt and selecting “Update Existing Deployment”, you’ll see an option to “Run Back-Tests”.

Run Back-Tests Button

Spot-Checking the “Before” and “After”

In the back-testing UI, you can choose which entries you want to replay. Each replay re-uses the input variable values from the original request to generate a new Completion with the updated Prompt. You can compare this “After” to the original output (the “Before”) to get a sense of how this Prompt would have performed in production had it been live the whole time.
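If it helps to see the idea outside the UI, the replay-and-compare loop can be sketched in a few lines of Python. This is only an illustration of the concept, not the product's API: `LoggedRequest`, `BacktestRow`, `backtest`, and `fake_generate` are all hypothetical names, and `fake_generate` is a stand-in for whatever model call you actually use.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class LoggedRequest:
    """One past production request (hypothetical shape)."""
    inputs: Dict[str, str]  # prompt variable values that were sent
    before: str             # output the currently deployed Prompt produced


@dataclass
class BacktestRow:
    """A single "Before" vs. "After" pair for spot-checking."""
    inputs: Dict[str, str]
    before: str
    after: str

    @property
    def changed(self) -> bool:
        # Ignore leading/trailing whitespace when comparing outputs.
        return self.before.strip() != self.after.strip()


def backtest(
    requests: List[LoggedRequest],
    new_template: str,
    generate: Callable[[str], str],
) -> List[BacktestRow]:
    """Replay each logged request's inputs through the new prompt template."""
    rows = []
    for req in requests:
        # Fill the new template with the original variable values.
        prompt = new_template.format(**req.inputs)
        rows.append(BacktestRow(req.inputs, req.before, generate(prompt)))
    return rows


def fake_generate(prompt: str) -> str:
    # Assumption: placeholder for a real model call; it just upper-cases the prompt.
    return prompt.upper()


if __name__ == "__main__":
    logged = [
        LoggedRequest({"product": "tea"}, "WRITE A GOOD TAGLINE FOR TEA"),
        LoggedRequest({"product": "soap"}, "WRITE A GOOD TAGLINE FOR SOAP"),
    ]
    rows = backtest(logged, "Write a great tagline for {product}", fake_generate)
    for row in rows:
        flag = "CHANGED" if row.changed else "same"
        print(f"[{flag}] {row.inputs}")
        print(f"  before: {row.before}")
        print(f"  after:  {row.after}")
```

Flagging rows whose output changed only narrows down where to look. With a real model most outputs will differ at least slightly, so reading the "Before" and "After" side by side is still the actual check.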

Back Testing Prompt Requests

Once you’re happy with the results, you can close out and proceed to update the Deployment, or pump the brakes and do more Prompt engineering if the outputs aren’t good enough.