How to choose the right LLM
How Model Routing Saves Costs and Drives Efficiency
In generative AI, choosing the right large language model (LLM) is like picking the perfect tool for a complex project—each model has its strengths, quirks, and ideal use cases. As someone who has been working closely with AI systems, I've come to appreciate just how powerful model routing can be when it comes to saving costs, boosting efficiency, and getting the best out of your AI copilots. This isn't just a trendy concept—it’s a game-changer for anyone looking to harness AI effectively.
Model Routing: What It Means and Why It Matters
Model routing is all about directing specific tasks to the most suitable LLM based on what the job requires. The benefits are real—optimising costs, speeding up response times, and getting better overall results. Think of it like having a well-stocked toolbox: instead of using an all-purpose tool that’s overkill for most tasks, you pick the perfect one for the job. It’s smarter, more efficient, and often a lot cheaper.
For instance, some LLMs are incredible at creative tasks like writing a story or drafting social media posts, while others are built to handle complex numbers or technical code. Understanding these strengths and routing requests accordingly is the secret to getting great results.
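At its simplest, routing is a lookup from task type to model. Here is a minimal sketch of that idea; the task categories and the default fallback are illustrative assumptions, not part of any particular platform's API:

```python
# Minimal model-routing dispatch table. The task categories and the
# fallback choice here are illustrative assumptions for the sketch;
# a real system would define its own categories and model list.
ROUTES = {
    "creative": "claude-3.5-sonnet",   # stories, social media drafts
    "code": "codestral-mamba",         # technical / code tasks
    "quick": "gpt-4o-mini",            # short, fast queries
}

def route(task_category: str) -> str:
    """Return the model to use for a task, with a safe default."""
    return ROUTES.get(task_category, "gpt-4o-mini")
```

The point is the shape, not the specific names: once requests carry a task category, picking the right model is a cheap table lookup rather than a per-request judgment call.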
Real-World Insights on Model Routing
From my experience—and from what I've observed on my platform—I’ve seen some incredible results with model routing. Here’s a snapshot of how different LLMs are being leveraged effectively (October 2024 snapshot):
Total Messages: 18,970
Total Projects Built (Lemonades): 1,209
Average Messages per Project: 15.7
This data tells an interesting story: users aren't sticking to just one model; they're using a variety to tap into the specific advantages each offers. Here's a closer look:
Claude 3.5 Sonnet handled 32.9% of all messages, averaging 44.9 messages per project. It’s a favourite when users need a deep dive or a more extended conversation.
OpenAI GPT-4o Mini made up 12.2% of messages, with a leaner average of 5.4 messages per project, making it ideal for quick, efficient tasks.
Gemini 1.5 Flash was used less often (3.2% of messages) but shows 50.4 messages per project, suggesting its value in projects that need thorough exploration.
Codestral Mamba stands out with a messages-to-project ratio of 73.7, used for complex, iterative work despite having a smaller overall share.
Why Different LLMs Matter: Cost, Efficiency, and Computational Power
The decision to route tasks to different LLMs comes with significant benefits:
Cost Efficiency: Not every task needs the power or cost of a top-tier model. For simpler jobs, a lighter model like Llama 3.2 3B or OpenAI GPT-4o Mini is often more cost-effective. For instance, you wouldn’t want to use Claude 3.5 Sonnet to brainstorm some quick ideas if Llama 3.1 405B can do it for much less.
Optimised Performance: Specific models shine in particular roles. Claude 3.5 Sonnet is perfect for longer, more thoughtful conversations, while Flux Image v1.1 is a great fit for generating images. By routing each request to the right model, everything flows better and users get the quality they need.
Scalable Operations: Spreading tasks across different models helps keep everything running smoothly. It means faster responses and allows for easy scaling when demand increases. For example, using Gemini 1.5 Flash for longer tasks frees up other models to handle quick-fire questions, keeping the whole system efficient.
Managing Computational Power: Different LLMs require different levels of computational power. Lighter models like Llama 3.2 3B are ideal for simpler tasks, saving on computational resources and avoiding system strain. Meanwhile, heavy-hitters like Claude 3.5 Sonnet are best for more in-depth interactions but naturally come with higher costs. The beauty of model routing is that it balances these computational needs, ensuring that every task gets the right level of power without wasting resources.
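To make the cost argument concrete, here is a back-of-the-envelope comparison. The per-token prices below are made up for the sketch and are not real vendor pricing; only the arithmetic matters:

```python
# Illustrative cost comparison. Prices are hypothetical placeholders,
# NOT real vendor rates -- substitute your provider's actual pricing.
PRICE_PER_1M_INPUT_TOKENS = {
    "claude-3.5-sonnet": 3.00,   # hypothetical premium-tier price (USD)
    "gpt-4o-mini": 0.15,         # hypothetical light-tier price (USD)
}

def task_cost(model: str, tokens: int) -> float:
    """Cost of processing `tokens` input tokens on `model`."""
    return tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS[model]

# Routing a 50,000-token brainstorm to the lighter model:
savings = task_cost("claude-3.5-sonnet", 50_000) - task_cost("gpt-4o-mini", 50_000)
```

Even with placeholder numbers, the ratio is what counts: when a light model is an order of magnitude cheaper per token, routing every task that doesn't need the premium model adds up quickly at scale.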
Choosing the Right AI for Your Needs
So how do you know which model to pick when building an AI? Here are a few guidelines:
Creative Brainstorming: Use Claude 3.5 Sonnet or Llama 3. They’re great at handling the nuanced back-and-forth that creative tasks often need.
Code Generation: Codestral Mamba is a solid choice here. Its high messages-to-project ratio shows it excels at complex, iterative work.
Quick Responses: If speed is your goal, OpenAI GPT-4o Mini is the way to go. It's built for short, rapid interactions, which makes it perfect for those quick queries.
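The guidelines above can be wired into a toy router. The keyword lists here are illustrative assumptions; production routers typically use a small classifier model rather than string matching:

```python
# Toy heuristic router implementing the guidelines above.
# Keyword lists are illustrative assumptions, not a production design.
def pick_model(prompt: str) -> str:
    p = prompt.lower()
    if any(k in p for k in ("function", "bug", "refactor", "code")):
        return "codestral-mamba"        # complex, iterative code work
    if any(k in p for k in ("story", "brainstorm", "campaign")):
        return "claude-3.5-sonnet"      # nuanced creative back-and-forth
    return "gpt-4o-mini"                # default: quick responses
```

For example, "Fix this bug in my code" lands on Codestral Mamba, while a short factual question falls through to the fast default.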
Maximising AI Efficiency with the Right Choices
My goal with these insights is simple: to help more people see how model routing can maximise impact without inflating costs. Generative AI can sometimes feel like an overwhelming technology—but with a smart, scalable approach like model routing, you can get a lot more done with a lot less stress. By matching each job to the right LLM—whether it’s a creative project, technical build, or something in between—you can save time, conserve resources, and really make AI work for you.
AI doesn’t have to be an intimidating leap into the unknown. Start with the right copilots and you’ll be amazed at what you can achieve. With thoughtful model selection and a clear strategy, even the biggest ideas can turn into practical, impactful solutions that make a difference.
——————————
Hi, I am Cien, CEO of LaunchLemonade and Director of Scale That Thing. I am a thought leader in the AI world, based in London, and my mission is simple: no one gets left behind in the time of AI.