When building production LLM applications, it’s often beneficial to implement a fallback strategy that dynamically selects a different model based on conditions such as errors, cost, latency, or context length.
This tutorial demonstrates how to build a workflow that dynamically selects models and implements fallback logic when errors occur.
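We’ll create a workflow that selects a model in a Templating Node, runs a Prompt Node with that model, detects errors with a “Try” adornment, and routes back to the Templating Node to select a different model when an error occurs.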
Start by adding a Prompt Node and a Templating Node to your workflow sandbox. The Templating Node will determine which model we should use in the Prompt Node.

Open the “Model” tab in the Prompt Node, then click the 3-dot icon next to the currently selected model. You’ll see an option for “Expressions”, which lets the model be set from an expression instead of a fixed choice. Click it to enable dynamic model selection.

Connect the Templating Node’s output to the Prompt Node’s model field. This will allow the Templating Node to dynamically determine which model the Prompt Node should use.

Click anywhere on the Prompt Node to open the side panel. Navigate to the Adornments section and add a “Try” adornment. This exposes a new error output on the Prompt Node, so the workflow can detect a failure and handle it instead of stopping.

Navigate to the Ports tab and add a new port. Create an expression for the “If” case that checks if there was no error. If there was an error, we’ll take the “Else” path and route back to the Templating Node to select a different model.
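The Try-plus-port pattern above can be sketched in plain Python. This is only an illustration of the control flow, not the product's API: `TryResult`, `run_with_try`, and `choose_port` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TryResult:
    """Hypothetical stand-in for a node's outputs once it is wrapped in a Try."""
    text: Optional[str] = None
    error: Optional[str] = None


def run_with_try(call: Callable[[], str]) -> TryResult:
    """Run the call and capture any exception as data instead of raising it."""
    try:
        return TryResult(text=call())
    except Exception as exc:
        return TryResult(error=str(exc))


def choose_port(result: TryResult) -> str:
    """'if' when there was no error, 'else' to route back to model selection."""
    return "if" if result.error is None else "else"
```

The point of the wrapper is that a failure becomes a value the workflow can branch on, which is what makes the “Else” path back to the Templating Node possible.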

Your canvas should now look something like this, with the Prompt Node having two output paths - one for successful execution and one for error handling:

Finish setting up your workflow by:

Open the Templating Node and rename the input variable to times_executed. Connect it to the Execution Count of the Prompt Node by selecting Execution Count: Prompt.
Then, paste the following Jinja template into the Template area:
This logic says:
On the first attempt, use gpt-3.5-turbo (a model with a smaller context window).
If that attempt fails and the workflow routes back, use gpt-4o, which has a larger context window.
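For readers who prefer to see the selection logic as code, here is a minimal Python sketch of the same branching and the retry loop around it. It is not the Jinja template itself. It assumes the execution count is 0 before the Prompt Node has run, and `select_model`, `run_workflow`, and `call_model` are hypothetical names used only for illustration.

```python
from typing import Callable, Tuple


def select_model(times_executed: int) -> str:
    """First attempt uses the smaller model; any retry uses the larger one."""
    # Assumption: the count is 0 before the Prompt Node has executed.
    return "gpt-3.5-turbo" if times_executed == 0 else "gpt-4o"


def run_workflow(
    prompt: str,
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> text
    max_attempts: int = 2,
) -> Tuple[str, str]:
    """Loop between model selection and the prompt call until one succeeds."""
    times_executed = 0
    last_error = None
    while times_executed < max_attempts:
        model = select_model(times_executed)
        times_executed += 1  # counts every execution, successful or not
        try:
            return model, call_model(model, prompt)
        except Exception as exc:  # the "Else" path: go back and pick again
            last_error = exc
    raise RuntimeError(f"all {max_attempts} attempts failed") from last_error
```

Capping the loop with `max_attempts` is a design choice of this sketch: without some limit, a prompt that fails on every model would cycle indefinitely.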
To test the fallback functionality:
This fallback model architecture can be extended for various scenarios:
Start with cheaper models for simple tasks, and only escalate to more expensive models when necessary.
Handle various types of model-specific errors (rate limits, timeouts, etc.) by trying alternative models.
Begin with faster models for quick responses, then use more capable models only when the initial response doesn’t meet quality thresholds.
Automatically switch to models with larger context windows when content exceeds the limits of smaller models.
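The scenarios above share one shape: an ordered list of models, tried in turn until one succeeds. A rough Python sketch of that shape follows. The exception classes, `call_with_fallback`, and `call_model` are hypothetical and stand in for whatever errors and client a real provider exposes.

```python
from typing import Callable, List, Sequence, Tuple


# Hypothetical error types; a real client library defines its own.
class RateLimitError(Exception): ...
class ModelTimeoutError(Exception): ...
class ContextLengthError(Exception): ...


# Errors for which trying a different model is a reasonable response.
RETRYABLE = (RateLimitError, ModelTimeoutError, ContextLengthError)


def call_with_fallback(
    prompt: str,
    models: Sequence[str],                  # ordered, e.g. cheapest or fastest first
    call_model: Callable[[str, str], str],  # hypothetical: (model, prompt) -> text
) -> Tuple[str, str, List[Tuple[str, str]]]:
    """Return (model used, response text, list of (model, error name) failures)."""
    errors: List[Tuple[str, str]] = []
    for model in models:
        try:
            return model, call_model(model, prompt), errors
        except RETRYABLE as exc:
            errors.append((model, type(exc).__name__))
    raise RuntimeError(f"all models failed: {errors}")
```

Ordering the list by cost, speed, or context window gives the cost, latency, and context-length variants respectively. Errors outside `RETRYABLE` are deliberately left to propagate, since switching models would not fix them.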
You can customize this architecture in several ways:
By implementing dynamic model selection with fallback logic, you can build more resilient, cost-effective, and performant LLM applications that gracefully handle errors and optimize for different requirements.