
Build an AI Router: Cut Costs, Boost Speed

The Overpaid Genius Intern

Picture this. You hire one intern. Let’s call him Chad. Chad went to an Ivy League school, he’s brilliant, and he costs you an absolute fortune per hour. You ask Chad to write a complex market analysis report. He nails it. It’s a masterpiece. You’re thrilled.

The next day, you ask Chad to get you a coffee. He spends 45 minutes analyzing the fluid dynamics of espresso extraction, writes a three-page treatise on the socio-economic impact of Sumatran coffee bean farming, and hands you a cold cup of coffee. And he bills you for two hours.

Then you ask him to reply to a simple client email that just says “Thanks!”. He writes a Shakespearean sonnet about gratitude, accidentally insults the client’s logo, and attaches a 50MB PDF on the history of email. You’ve just paid a genius-level salary for someone to create chaos out of a simple task.

This is exactly what you’re doing when you use one giant, expensive AI model (like GPT-4) for every single task in your business. It’s overkill, it’s slow, and it’s burning a hole in your pocket.

Why This Matters

In the world of AI automation, efficiency is everything. Using a single, super-powerful Large Language Model (LLM) for all your needs is like using a sledgehammer to crack a nut. It works, but it’s messy, expensive, and slow.

An **LLM Router** is the solution. It’s the calm, experienced manager you hire to manage your team of digital interns. Instead of one expensive “Chad,” you now have a team:

  • A cheap, fast intern for getting coffee and answering simple emails (like GPT-3.5 Turbo or a smaller model).
  • An expensive, brilliant strategist for the big reports (like GPT-4 or Claude 3 Opus).
  • A quiet, focused specialist who only knows your company’s legal documents.

The router’s only job is to look at an incoming task and instantly decide which specialist is the right fit. This one concept can cut your API costs by 50-80% (depending on how much of your traffic is simple), dramatically speed up your automations, and make your entire system more reliable.
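
Here’s a quick back-of-the-envelope with illustrative prices (per-token rates change often, so treat these as round numbers, not quotes). Assume 1,000 requests a day at roughly 1,000 tokens each, with the big model at $10 per million tokens and the small one at $0.50 per million:

Everything to the big model:  1,000,000 tokens × $10/1M = $10.00/day
80% routed to the small one:  200,000 × $10/1M + 800,000 × $0.50/1M = $2.40/day

That’s a 76% cut before you’ve optimized anything else.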

What This Tool / Workflow Actually Is

An LLM Router is not a new, magical AI. It’s a simple, clever workflow. At its core, it’s a small, fast AI model whose only job is to classify an incoming request and route it to another, more specialized AI model or workflow.

What it does: It acts as a traffic cop for AI prompts. It reads the user’s input and, based on instructions you give it, picks the best tool for the job from a list of options.

What it does NOT do: It does not answer the user’s question itself. It doesn’t do the heavy lifting. It’s a delegator, a middle manager. Its entire purpose is to make a decision and pass the work along.
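
If that sounds abstract, the whole pattern fits in a few lines of plain Python. This toy sketch fakes the classifier with a digit check; in the real build below, a cheap LLM makes that call:

# A toy illustration of routing: classify first, then dispatch.
def route(user_input: str) -> str:
    """Stand-in classifier; a real router asks a small LLM to decide."""
    return "math" if any(ch.isdigit() for ch in user_input) else "conversation"

specialists = {
    "math": lambda q: f"(expensive model solves) {q}",
    "conversation": lambda q: f"(cheap model chats about) {q}",
}

def handle(user_input: str) -> str:
    return specialists[route(user_input)](user_input)

print(handle("What is 256 divided by 8?"))  # routed to "math"
print(handle("Hey, how's it going?"))       # routed to "conversation"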

Prerequisites

I’m not going to lie, this involves a tiny bit of code. But if you can copy and paste, you can do this. I promise.

  1. A little Python knowledge: If you’ve ever run a Python script, you’re overqualified. If not, just follow the steps exactly. We’re not building a rocket ship.
  2. An OpenAI API Key: This is your password to use their models. You can get one from the OpenAI platform. Yes, it will cost a few cents to run these examples.
  3. Python and a code editor: Make sure Python is installed on your computer. You can write the code in anything, even a basic text file saved as `my_router.py`.

Step-by-Step Tutorial

We’re going to build a simple router using a popular Python library called LangChain. It makes the complex parts easy.

Step 1: Set Up Your Project

First, let’s install the libraries we need. Open your terminal or command prompt and run these commands:

pip install langchain langchain-openai python-dotenv

Next, create a file in your project folder named `.env`. This is where we’ll safely store your API key. Open that file and add this line, replacing `YOUR_KEY_HERE` with your actual OpenAI key:

OPENAI_API_KEY="sk-YOUR_KEY_HERE"

Step 2: Define Your “Specialists”

Now, create your Python file. Let’s call it `router.py`. In this file, we’ll define our two specialists: a fancy math genius and a casual conversationalist.

import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

# Load the API key from our .env file
load_dotenv()

# Initialize our two LLM specialists
# The "genius" for complex tasks (uses GPT-4)
gpt4_llm = ChatOpenAI(model="gpt-4-turbo-preview")

# The "fast and cheap" one for simple chat (uses GPT-3.5)
gpt35_llm = ChatOpenAI(model="gpt-3.5-turbo")

# Define the prompt template for the math specialist
math_template = """You are a brilliant mathematician. You solve the following math problem with a clear, step-by-step explanation.

Problem: {input}

Solution:"""
math_prompt = PromptTemplate(template=math_template, input_variables=["input"])
math_chain = LLMChain(llm=gpt4_llm, prompt=math_prompt)

# Define the prompt template for the conversation specialist
convo_template = """You are a friendly and helpful chatbot. You are having a pleasant conversation.

Human: {input}

Chatbot:"""
convo_prompt = PromptTemplate(template=convo_template, input_variables=["input"])
convo_chain = LLMChain(llm=gpt35_llm, prompt=convo_prompt)

Why this step? We’ve created two separate assembly lines. One (`math_chain`) is built with our expensive, powerful GPT-4 model and a very specific set of instructions for solving math problems. The other (`convo_chain`) uses the cheap, fast GPT-3.5 model and is primed for casual chat. These are our destinations.
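
Before wiring up the router, you can sanity-check a specialist on its own. Add this line temporarily at the bottom of the file, run it, then remove it:

# Optional sanity check: call one specialist directly (remove before continuing)
print(math_chain.invoke({"input": "What is 12 * 12?"})["text"])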

Step 3: Build the Router

Now we need the manager—the router itself. LangChain has a great tool for this. We’ll give it the list of our specialists and it will create the logic to choose between them.

from langchain.chains.router import MultiPromptChain
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

# Create a list of our destinations (the specialists)
destination_chains = {
    "math": math_chain,
    "conversation": convo_chain
}

# Create a default chain for when the router isn't sure
default_chain = LLMChain(llm=gpt35_llm, prompt=convo_prompt) # Default to conversation

# Create the descriptions for the router to use
destination_descriptions = {
    "math": "This is the best choice for answering math problems or questions involving numbers and logic.",
    "conversation": "This is the best choice for handling general conversation, greetings, or friendly chat."
}

# Build the router prompt. The template has a single {destinations} slot
# that expects one "name: description" line per specialist.
destinations_str = "\n".join(
    f"{name}: {desc}" for name, desc in destination_descriptions.items()
)
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(destinations=destinations_str)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

# The router chain itself, which decides where to send the input
router_chain = LLMRouterChain.from_llm(gpt35_llm, router_prompt)

# Tie it all together into the final chain
final_chain = MultiPromptChain(router_chain=router_chain, 
                             destination_chains=destination_chains, 
                             default_chain=default_chain, 
                             verbose=True) # Verbose=True lets us see the router's decision!

Why this step? This is the core logic. We created descriptions for each specialist. The `router_chain` will read the user’s input and these descriptions, and then decide which one (`math` or `conversation`) is the best fit. We also added a `default_chain` just in case it gets confused.
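
Under the hood, the router model replies with a small JSON snippet that `RouterOutputParser` turns into a decision. For a math question, that reply looks roughly like this (the exact formatting is dictated by `MULTI_PROMPT_ROUTER_TEMPLATE`):

{
    "destination": "math",
    "next_inputs": "What is 256 divided by 8?"
}

If the model answers "DEFAULT" instead of a known specialist name, the input falls through to your `default_chain`.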

Step 4: Test Your Router

Let’s see our manager in action. Add this to the bottom of your `router.py` file and run it from your terminal (`python router.py`).

# Test with a math question
result_math = final_chain.invoke({"input": "What is 256 divided by 8?"})
print(f"Math Result: {result_math['text']}")

print("\
────────────────\
")

# Test with a conversation question
result_convo = final_chain.invoke({"input": "Hey, how's it going?"})
print(f"Conversation Result: {result_convo['text']}")

When you run this, you’ll see the output. Because we set `verbose=True`, you’ll see which chain the router chose before you see the final answer. The math question will be routed to the expensive GPT-4 `math` chain, and the simple greeting will be routed to the cheap GPT-3.5 `conversation` chain. Success!

Complete Automation Example

Let’s make this real. Imagine a customer support system for a software company.

  • The Input: A new ticket arrives from a customer in your helpdesk system (like Zendesk or Intercom).
  • The Goal: Automatically classify the ticket and generate a draft response.

Our router would have three destinations:

  1. Billing Questions: For invoices, subscriptions, payments. Uses a cheap model (GPT-3.5) with a prompt that knows about pricing plans.
  2. Technical Support: For bug reports, API questions, feature problems. Uses an expensive model (GPT-4) with a very technical prompt.
  3. General Inquiry: For simple “how-to” questions. Uses the cheap model (GPT-3.5) and is connected to a knowledge base (we’ll cover that in a future lesson!).
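
In router terms, the descriptions might look like this (the names and wording are illustrative; tune them to your actual ticket mix):

destination_descriptions = {
    "billing": "Best for invoices, subscriptions, refunds, and payment problems.",
    "technical": "Best for bug reports, error messages, and API or SDK questions.",
    "general": "Best for how-to questions and anything that doesn't fit the others.",
}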

The workflow would be:

  1. Trigger: New ticket created in Zendesk.
  2. Action: Your automation platform (like Zapier or a custom script) sends the ticket’s subject and body to your router’s API endpoint.
  3. Routing: Your LLM Router reads the ticket. A ticket saying “My credit card was charged twice” is routed to the `Billing` chain. A ticket saying “Your Python SDK is throwing a 500 error” is routed to the `Technical Support` chain.
  4. Output: The chosen specialist LLM generates a draft response.
  5. Final Action: The draft is posted as an internal note on the Zendesk ticket, ready for a human agent to review, edit, and send.

This simple system alone could save a support team hundreds of hours per month by pre-processing tickets and providing instant, relevant draft answers.
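
Here is a minimal sketch of step 3 as a web endpoint, assuming a small Flask app sitting in front of a router built exactly like `final_chain` above (the route name and payload fields are illustrative, not Zendesk’s real schema):

from flask import Flask, request, jsonify

app = Flask(__name__)

# Stand-in: reuse the tutorial's router. A real support router would have
# "billing", "technical", and "general" destination chains instead.
support_router = final_chain

@app.route("/route-ticket", methods=["POST"])
def route_ticket():
    payload = request.get_json()
    # Give the router the full context: subject plus body
    ticket_text = f"{payload['subject']}\n\n{payload['body']}"
    result = support_router.invoke({"input": ticket_text})
    # The draft goes back to the helpdesk as an internal note for human review
    return jsonify({"draft_response": result["text"]})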

Real Business Use Cases

  1. E-commerce Store: The router analyzes incoming customer chats. “Where is my order?” goes to a simple bot connected to the shipping API. “I’m looking for a gift for my dad who likes fishing” goes to a more creative, powerful model for product recommendations.
  2. Lead Qualification for a SaaS Company: A router reads an inquiry from a website contact form. “Can I book a demo?” is routed to an automated scheduling tool. “Does your platform comply with HIPAA regulations?” is routed to a specialized chain (or a human sales engineer) that understands compliance.
  3. Marketing Agency: A content request comes in. The router decides: “Write a tweet” goes to a fast, cheap model. “Write a 2000-word whitepaper on quantum computing” goes to a powerful, research-capable model like Claude 3 Opus.
  4. Internal IT Helpdesk: An employee submits a ticket. “I forgot my password” gets routed to a workflow that triggers a password reset link. “The main server is on fire” gets routed to a chain that pages the on-call engineer and sends alerts in Slack.
  5. Recruiting Firm: The router scans incoming resumes. It can route candidates into different buckets: `Software Engineer`, `Sales`, `Marketing`. Each route could then trigger a different specialist LLM to ask relevant screening questions.

Common Mistakes & Gotchas

  • Vague Descriptions: The router is only as smart as the descriptions you give it. If your descriptions for `math` and `conversation` were both “Answers questions,” the router would just be guessing. Be specific.
  • Forgetting a Fallback: Always have a `default_chain`. What happens if a user asks something completely unexpected? Without a default, your system will break. A good default can say, “I’m not sure how to handle that, let me get a human to help.”
  • Not Logging the Decision: The `verbose=True` setting is your best friend. When you build a real system, you must log which route was chosen for every request (see the sketch after this list). If you don’t, debugging is a nightmare.
  • Using an Expensive Router: The router model itself should be fast and cheap (like GPT-3.5 Turbo). You’re paying for it to make a decision on every single call. Don’t use GPT-4 as your router unless you have a ridiculously complex routing problem.
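
Here is one way to do that logging, a minimal sketch that runs the router step by hand and then dispatches (it reuses `router_chain`, `destination_chains`, and `default_chain` from the tutorial, and assumes the classic LangChain behavior where `LLMRouterChain` returns `destination` and `next_inputs` keys):

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("llm_router")

def route_and_log(user_input: str) -> str:
    # Ask the router for its decision first so we can record it
    decision = router_chain.invoke({"input": user_input})
    destination = decision.get("destination") or "DEFAULT"
    logger.info("Router chose %s for input: %.60s", destination, user_input)
    # Dispatch to the chosen specialist, falling back to the default chain
    chain = destination_chains.get(destination, default_chain)
    return chain.invoke(decision.get("next_inputs") or {"input": user_input})["text"]
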
How This Fits Into a Bigger Automation System

An LLM Router isn’t an isolated gadget; it’s the central nervous system of a sophisticated automation. It’s the brain that sits in the middle of all your other tools.

  • It can be triggered by a new entry in your CRM, a new email parsed by a tool like Zapier, or a transcription from a voice agent.
  • Its output can trigger actions in other systems: update a Salesforce record, draft an email in Gmail, or send a message to Slack.
  • This is the fundamental building block for multi-agent workflows, where you have multiple specialized AIs working together on a complex task. The router is the project manager that assigns the work.
  • One of your router’s destinations could be a complex RAG (Retrieval-Augmented Generation) system that we’ll build in a future lesson. This allows you to route questions that require knowledge of your internal documents to a specialist that can actually read them.

What to Learn Next

You’ve just built a manager. A digital department head that can delegate tasks intelligently. Congratulations. You’ve unlocked a massive new capability for building efficient, cost-effective AI systems.

But what about the specialists themselves? A manager is useless without a skilled team. Right now, our specialists are just generic LLMs with a simple prompt.

In the next lesson in the Academy, we’re going to build our first true specialist: a RAG Agent That Actually Knows Your Business. We’ll teach an AI to read your private documents, your knowledge base, your PDFs, and answer questions based on *your* data, not just what it learned on the internet. Your router will finally have an employee with real company knowledge.

Stay tuned. It’s about to get even more powerful.

