Sometimes the Cheap Model Costs More

June 26, 2026 2 min read

I’ve been using an AI orchestration framework I built called Mozart.

You can check it out here: https://github.com/jstuart0/mozart-orchestration

The idea behind Mozart is pretty simple. Instead of asking one AI to build an entire feature, it coordinates a team of specialized agents. Each agent has a specific responsibility, and Mozart decides who should do what, validates their work, and keeps everything moving toward the end goal.

Mozart itself, along with a few of the more strategic agents, runs on Claude Opus. Most of the other agents use Claude Sonnet. That combination has worked extremely well for me.

Then I accidentally ran an experiment.

Last week I ran out of Sonnet usage on my subscription. Rather than stop working, I had Claude update all of the agents that normally used Sonnet so they would use Haiku instead. My thought process was that the real intelligence of the system lived in Mozart and the Opus-powered agents. The worker agents were just executing tasks, so Haiku would probably be good enough.

Overall, it wasn’t terrible.

When my Sonnet allotment reset, I thought I had Claude change everything back. Apparently it didn’t. I had been running on Haiku for another week without realizing it.

One of my coding agents is named Jackson. Jackson takes direction from Mozart and writes the code. Over the next week I kept noticing something felt off. Jackson was wandering away from the assigned task, missing details, or producing code that just wasn’t at the level I had become accustomed to.

What really stood out was Mozart’s behavior.

Mozart was constantly stepping in to correct Jackson, ask for revisions, validate the implementation, or outright reject it. It was spending far more time supervising than it normally does. At first I assumed I was just working on more difficult problems.

Then I checked the configuration.

Jackson was still running on Haiku.

I switched him back to Sonnet and almost immediately everything went back to normal. Jackson stayed focused on the assigned work, the quality improved, and Mozart stopped having to constantly intervene.

For me, this was a much more valuable lesson than any benchmark chart.

This wasn’t a synthetic coding test. This was my actual development workflow over the course of a week. The difference wasn’t that Haiku occasionally made a mistake. It was that it required significantly more oversight, which meant my most capable agent spent more time fixing work instead of moving the project forward.

I already suspected Haiku wasn’t the right model for serious software engineering, but experiencing it firsthand was different than simply reading benchmarks.

If you’re building multi-agent systems, don’t just think about the cost of the worker model. Think about the cost of supervision. A cheaper model that constantly needs correction can easily make the entire system less efficient.