Why Care About Agent-Authored Code Quality

A developer I was chatting with recently raised a question I’ve been hearing more and more:

Since we can produce code so fast now, does the code actually matter that much? Assuming the code fulfills the requirement through actual user testing, who cares what the code looks like? We’re just sitting at another layer of abstraction. Once we got 2nd and 3rd generation languages, did you care what assembly they output - as long as it did what you wanted?

It’s a genuinely interesting argument, and I’ll admit it gave me pause. Then they went further:

Maintainability - whose? Does it change things if AI bots can rewrite entire swaths of code in minutes? Why do we have code architecture? In a lot of cases it’s to make it easier for humans. So does architecture still matter? Observability - again, whose? Does it change things if AI bots can ingest log files almost instantaneously? Testability - does it change things if AI can write unit and integration tests? I think it just moves where [the responsibility] lies.

It’s a thoughtful take. The nail gun vs. hammer metaphor applies - you still need a plan, constraints, materials, regulations. If all of those are fulfilled, does the code itself matter?

I think the answer is still yes - but for reasons that weren’t obvious two years ago.

The Assembly Language Analogy Breaks Down

The reason we don’t care about the assembly a compiler emits is that compilers, JIT runtimes, and linkers are deterministic. If a piece of code is proven to work correctly, we have strong confidence it will continue to work correctly every time it runs. We’ve built decades of trust in these tools.

But here’s a wrinkle worth acknowledging: compilers have bugs too. Real ones, not just junior-developer folklore. Optimizing compilers especially - there has probably never been a shipping optimizing compiler that was provably bug-free. GCC, MSVC, javac, Delphi, early C# - every major compiler has had real code generation bugs filed against it at some point. So the analogy isn’t quite “compilers are perfect, LLMs are not.”

The actual distinction is determinism. A compiler bug is reproducible. You can construct a minimal test case, file a report, and the next release fixes it. The C2 wiki puts it well: “while bugs may have been common in v1.0 of the compiler, by the time it gets to v5.0 they’re rare beasts indeed.” Trust in compilers was earned over decades precisely because determinism allowed the bug surface to be systematically reduced. Each bug found was a bug permanently eliminated.

LLMs don’t work that way. A prompt that produces correct output today may produce subtly wrong output tomorrow - not because anything changed in the model, but because the sampling process is inherently stochastic. There’s no reproducible minimal test case. There’s no patch that closes the issue for good. The nondeterminism means the chance for bugs reappears every single time you invoke the tool. You can’t build up the same kind of compounding trust, because you can never be confident a problem you’ve previously observed is actually gone.
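To make the determinism point concrete, here is a toy sketch in Python. It is not how any real compiler or model is implemented. `compile_expr` stands in for a deterministic tool. `sample_completion` draws from a softmax over made-up scores for two candidate completions of the same prompt, and one of those candidates is subtly wrong. The candidate strings, scores, and function names are all illustrative assumptions.

```python
import math
import random


def compile_expr(source: str) -> str:
    """Toy deterministic 'compiler': the same input always yields the same output."""
    return source.replace("x * 2", "x << 1")


def sample_completion(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Toy stochastic decoder: samples one candidate from softmax(logits / temperature)."""
    candidates = list(logits)
    scaled = [logits[c] / temperature for c in candidates]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # subtract the max for numerical stability
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical scores for two completions of the same prompt; the second one is subtly wrong.
LOGITS = {"return x * 2": 2.0, "return x + 2": 0.5}

if __name__ == "__main__":
    # Deterministic tool: 1,000 runs produce exactly one distinct output.
    print(len({compile_expr("return x * 2") for _ in range(1000)}))

    rng = random.Random(42)  # seeded only so this demo is repeatable
    outputs = [sample_completion(LOGITS, 1.0, rng) for _ in range(1000)]
    # The wrong completion shows up some fraction of the time, with no way to "patch" it out.
    print(outputs.count("return x + 2"))
```

Lowering `temperature` in this toy concentrates probability on the highest-scoring candidate. At any positive temperature, though, the wrong candidate keeps a nonzero chance of being drawn on every call.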

Add enough guardrails - structured outputs, validation steps, deterministic post-processing - and you can reduce the variance significantly. But you’re never going to get a compiler-level guarantee out of a language model. That’s not a criticism; it’s just a different class of tool with a different trust model.
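Here is a minimal sketch of the guardrail idea. It assumes the agent has been asked to emit a JSON config containing a `port`. The generator, its sample outputs, and the validation rule are hypothetical. What matters is the shape: a deterministic validator wrapped around a stochastic producer, with a bounded number of retries.

```python
import json
import random
from typing import Callable, Optional


def validate(raw: str) -> Optional[dict]:
    """Deterministic check: output must be a JSON object with an integer 'port' in 1..65535."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    port = data.get("port") if isinstance(data, dict) else None
    if isinstance(port, int) and not isinstance(port, bool) and 1 <= port <= 65535:
        return data
    return None


def generate_with_guardrails(generate: Callable[[], str], max_attempts: int = 3) -> Optional[dict]:
    """Call a stochastic generator until its output passes validation, up to max_attempts."""
    for _ in range(max_attempts):
        result = validate(generate())
        if result is not None:
            return result
    return None  # guardrails reduce variance; they cannot guarantee success


def make_flaky_generator(rng: random.Random) -> Callable[[], str]:
    """Hypothetical stand-in for a model call that sometimes emits malformed or wrong output."""
    samples = ['{"port": 8080}', '{"port": "8080"}', '{"port": 70000}', "port: 8080"]
    return lambda: rng.choice(samples)
```

Validation turns silent wrong answers into detectable failures, and retries raise the success rate. Even so, `generate_with_guardrails` can still return `None`, and the validator only catches the mistakes it was written to check. The variance shrinks, but a compiler-style guarantee never appears.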

The Free Lunch Is Ending

When I first started pushing back on the “who cares about code quality” argument, I focused on correctness and determinism. But there’s a second, increasingly practical reason: AI costs money, and bad code costs more AI. It’s probably obvious to everyone at this point, but I’ve been pointing out for a while now that the real costs of AI are shifting onto its users.

Compilers don’t charge you per minute of use. You don’t worry that building a 50,000-line class is going to run up a bill. But agentic development tools absolutely do. Every token the agent reads to understand your codebase costs money. Every minute of wall-clock time it spends figuring out what to change costs money. And the messier the codebase, the more tokens it needs to read, and the longer it takes to orient itself.
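Here is a back-of-the-envelope sketch of how codebase size turns into spend. Both constants are assumptions. Roughly four characters per token is a common rule of thumb, but it varies by tokenizer and language. The per-million-token price is a made-up round number and does not reflect any vendor's rate.

```python
from dataclasses import dataclass

CHARS_PER_TOKEN = 4  # rough rule of thumb; real tokenizers vary by model and language
PRICE_PER_MILLION_INPUT_TOKENS = 3.00  # hypothetical USD price, not any vendor's actual rate


@dataclass
class SourceFile:
    path: str
    chars: int


def estimate_tokens(files: list[SourceFile]) -> int:
    """Approximate number of tokens an agent reads to load these files into context."""
    return sum(f.chars for f in files) // CHARS_PER_TOKEN


def estimate_cost(files: list[SourceFile], reads_per_task: int = 1) -> float:
    """Approximate input cost in USD, assuming the files are read in full reads_per_task times."""
    tokens = estimate_tokens(files) * reads_per_task
    return tokens / 1_000_000 * PRICE_PER_MILLION_INPUT_TOKENS
```

Under those assumptions, a 50,000-line class at about 40 characters per line comes to roughly 500,000 tokens, or about $1.50 each time the agent reads it in full. An agent that rereads it ten times while orienting itself spends about $15 on that one file, before counting any output tokens. A 300-line module comes to about 3,000 tokens under the same assumptions.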

Two years into an AI-assisted project with no structure - just-in-time code added for every corner case, duplication scattered everywhere, no cohesion - and the AI is going to have an incredibly hard time making any change. It will need massive context windows just to understand what’s going on. You’ll pay for that in the form of tokens and clock time. The Big Ball of Mud architecture is every bit as much a problem for LLMs and agents as it is for human developers.

The Tokenmaxxers Will Vindicate SOLID

The people who most enthusiastically dismiss code quality principles as “for humans” (and not needed for agents) are going to circle back to them. They won't do it because they conclude that agents can't ignore those principles and still produce new code. They'll do it because the cost of ignoring them will become untenable (or at least incredibly wasteful).

Some simple examples follow:

You should use middleware for cross-cutting concerns instead of copy-pasting logging, authentication, try-catch blocks, etc. into every endpoint or service. This keeps the business logic in your endpoints, handlers, and services cleaner and easier for humans to understand. But that’s not the only reason. You should do this because every duplicated copy of that logic is additional context the agent has to read, understand, and keep consistent. You’re paying for every token.
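Here is a minimal sketch of the middleware idea using a plain Python decorator. Real web frameworks provide their own middleware hooks. The `Request` and `Response` types, the `middleware` decorator, the handler names, and the placeholder token check are all hypothetical.

```python
import functools
import logging
from dataclasses import dataclass, field
from typing import Callable

logger = logging.getLogger("api")


@dataclass
class Request:
    path: str
    headers: dict[str, str] = field(default_factory=dict)


@dataclass
class Response:
    status: int
    body: str


Handler = Callable[[Request], Response]


def middleware(handler: Handler) -> Handler:
    """Cross-cutting concerns (auth, logging, error handling) written once, applied to every handler."""

    @functools.wraps(handler)
    def wrapped(request: Request) -> Response:
        if request.headers.get("Authorization") != "Bearer valid-token":  # placeholder auth check
            return Response(401, "unauthorized")
        logger.info("handling %s", request.path)
        try:
            return handler(request)
        except Exception:
            logger.exception("unhandled error on %s", request.path)
            return Response(500, "internal error")

    return wrapped


@middleware
def get_order(request: Request) -> Response:
    # Business logic only: no auth, logging, or try/except boilerplate to re-read.
    order_id = request.path.rsplit("/", 1)[-1]
    return Response(200, f"order {order_id}")


@middleware
def cancel_order(request: Request) -> Response:
    raise RuntimeError("payment service unavailable")  # simulated downstream failure
```

Each new handler adds only its business logic. When the auth rule or log format changes, there is one place to edit and one place for an agent to read.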

You should organize the system into encapsulated modules with clear boundaries. Not because humans find it easier to navigate (though they do), but because a well-bounded module lets the agent focus on a small, coherent slice of the codebase. The blast radius of any given change is smaller. The context required is smaller. The bill for your AI usage is smaller.
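One way to see why boundaries shrink the context an agent needs is to treat the codebase as a dependency graph. Assume the agent must read the module it is changing plus everything that module transitively depends on. That assumption is a simplification, and the module names and both graphs are hypothetical. In `BOUNDED`, modules depend only on small interface modules (the `*_api` names), never on each other's internals.

```python
from collections import deque


def context_needed(graph: dict[str, set[str]], changed: str) -> set[str]:
    """Modules to read before changing `changed`: itself plus its transitive dependencies.

    This models reading dependencies only; a fuller analysis would also consider
    the modules that depend on `changed` (the blast radius).
    """
    seen = {changed}
    queue = deque([changed])
    while queue:
        module = queue.popleft()
        for dep in graph.get(module, set()):  # interface modules have no entry: they are leaves
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical codebase where modules reach directly into each other.
TANGLED = {
    "billing": {"orders", "users", "shipping"},
    "orders": {"billing", "inventory", "users"},
    "inventory": {"orders", "shipping"},
    "shipping": {"inventory", "users"},
    "users": {"billing"},
}

# Same modules, but each depends only on narrow interface modules.
BOUNDED = {
    "billing": {"users_api"},
    "orders": {"billing_api", "inventory_api"},
    "inventory": set(),
    "shipping": {"inventory_api"},
    "users": set(),
}
```

In the tangled graph, changing `billing` pulls every module into scope. In the bounded graph, it pulls in one narrow interface.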

The SOLID principles, DRY, separation of concerns - all of these were articulated as ways to make code easier for humans to understand and change. It turns out they’re also exactly the properties that make code cheaper and more reliable for AI agents to understand and change.

Architecture Still Matters - Maybe More Than Ever

The argument that “architecture is just for humans” assumes that AI agents can hold arbitrarily large contexts perfectly and act on them without cost or error. Neither is true today, and neither is likely to be true for a long time. Context windows have limits. Attention degrades over long inputs. Token costs scale with codebase complexity.

Good architecture isn’t about making developers feel good about their code. It’s about minimizing the surface area that needs to change for any given requirement, making it easy to verify that a change is correct, and keeping each piece of the system focused on one thing. Those properties matter for human developers. They matter at least as much - and maybe more - for AI agents operating at scale.

So yes, the assembly language analogy is appealing. But the assembly a compiler emits is free, deterministic, and proven. Agent-authored code is none of those things. The economic reality of token costs means that a well-structured codebase isn’t a luxury for teams using AI - it’s how you keep the AI bill from eating your margins.

References