Stephen Hara

How Much Do LLMs Accelerate Software Work?

Published on 8/21/2025

  • post
  • llms
  • software-development

On the one hand, I hate that LLMs are giving me so many thoughts to write about. On the other, it is nice to be writing more!

The zeitgeist lately is all about how LLMs and generative AI are going to commoditize software work. We've already seen a whole lot of companies put their money where they think the mouth will need to be to reap the benefits. I think those mouths will go unfilled for a while.

Software Work is More than Code

The agent workflows have been trying to crack this particular barrier, and they're doing...okay? I mess around with them for stuff that I want to exist but don't want to put in the time or effort to learn about. They typically do just fine - my standard of quality is low for these things, the edge cases likely number in the single digits, and the projects are small enough that I wouldn't be filled with consternation over having to fix a bug.

But for adding features to existing projects? Projects that need to be reliable and could have actual consequences if deployed carelessly? They're still pretty bad! And it should not be a mystery why that is!

Coding isn't the Bottleneck

A serious software project has a whole bunch of phases. Some phases may be elided, some phases may be optional for a given project, but in general, you have some kind of sequence like:

  1. Idea
  2. Sell it
  3. Design it
  4. Plan it (may be considered part of 3)
  5. Implement it
    1. This includes testing in an ideal world
  6. Peer-review it
  7. Deploy it
  8. Maintain it
    1. Ideally, at least!

We can kind of discount steps 1 and 8, because ideas don't usually happen during discrete blocks of "idea work", and step 8 is un-timeable. So we get 6 steps with some kind of plannable time allotment.

Can we really say that implementation, testing included, is the majority of the time here? Can we even call it a plurality? It's also far simpler to parallelize than any of the other steps! How often are you going to convince other people to join your crusade to implement fishing in your ARPG for free when they won't get any organizational benefit from doing so? How does one split the work of writing down an architectural design? Can you ask 5 people to review your PRs and have it done in 1/5th of the time?

Coding, though? You'll probably have to deal with some merge conflicts, but you can put 5 modules in 5 separate PRs and now you're multi-threading, baby!

Can I get some context?

The other thing about the sequence I outlined is that it's all pretty much just playing darts blindfolded if you don't have consistent context throughout the whole sequence. Asking a product manager off the street for ideas offers better value than asking one who's starting their first day at your company, because neither of them knows what's going on and only one of them is free.

  • In fact, the one starting today has only cost you money so far.

But after a week, a month, a year, and so on, the one you hired is only going to have better ideas: they'll know how to sell them and who to sell them to, and they'll know what the engineers and designers and platforms and everyone else are capable of.

The same goes for the other roles - engineers come up with great ideas all the time! They get better at working on the systems as they do it more. They leverage and improve the platforms as they use them. They understand the core needs of users and can integrate that into the whole process. The same is true of designers.

So, sure, Claude Code might be okay at implementing things after you give it a whole bunch of context and steer it away from cliffs - and sometimes it just repeatedly dives off the cliff anyway and you have to go in and write the thing yourself.

Wait, is that okay? I dunno.

Point being, they cannot do anything but the most academic of implementations without context. And often, if you want to "zero-shot" a task, you need to spend a lot of time making that context accurate, concise, and precise.

Improved Productivity is a Facade

One point often brought up against LLMs is that they are unpredictable and non-deterministic. This means that they won't, except by accident, produce the same output for the same input, and for some things this is a desirable characteristic.

But do we really want that in production software?

Of course, you can manually review every line they generate, and then it should be subject to typical code review processes. But reading the lines only uses one part of your brain, whereas writing code gets you more involved and necessarily more informed about the code you're attaching your name to. Every line you ship has the potential for a defect, a vulnerability, added complexity, wasted cycles, and so on.

Beyond that, what are you supposed to do when the LLM hits a wall and you've spent a thousand bucks on a single feature? Do you continue to burn money (and forests, neighborhoods, fossil fuels...) just telling Cursor to do it right this time? At some point, you'll have to pop open the proverbial hood and check the oil. Except the manual was never written, there are six engines, for some reason there's a tire in one of them, and one of the engines is just a Nissan 300ZX miniature.

Struggle is the Best Teacher

There's a lot of research out there on pedagogy and learning methodologies and all that, and I encourage you to check out some of it, if only for your own learning. What I want to focus on here is the concept of desirable difficulty, which is basically that there's an optimal level of difficulty for long-term retention and learning, and that level is well above 0 difficulty.

Put another way: you should struggle at least a little bit. This partly explains the efficacy of flashcards, which force active recall - a very frustrating exercise at times, as anyone who has done Anki drills for months on end knows. This also aligns with common advice on reading novels, playing competitive games, learning an instrument, drawing, and basically any complex skill: you should stretch your abilities, but don't overdo it.

When you abdicate development to LLMs, you're replacing productive struggle with a slot machine. There's no hard thinking or investigation about why the feature isn't working. You're just asking it to do it again, but you're trying extra hard to time the pull this time.

Kind of a side note to this point, but it also means you surrender the ability to say "oh, yeah, I worked on that" when something you built comes up in the future. What are you going to say instead, "Oh yeah, I made the robot work on that while I drank a latte"?

I guess I'd laugh if someone said that. But I'd also cry a little bit.

Robot Caretakers: The Future...?

If we ignore the many significant reasons someone might refuse to use LLMs, and look at them purely from a value standpoint, then I can admit that they have a lot of real uses for crufty things. The problems I've mentioned don't really apply to audience-of-one projects or bullshit work: quick convenience scripts, MR description drafts, human-language translation for your own use, things like that.

But if you want to hand over the entire process to the robots and just put tickets in to get merge requests out, you're just a robot caretaker. There's not a whole lot of differentiation between a robot caretaker and someone off the street.

"But what about the business context, and the software engineering expertise, and the human elements?" you may ask.

I just want to ask: how much of that will be left after a year of sabbatical? About the same as after a year of caretaking. And you don't even get to take a sabbatical.

This page brought to you by Stephen Hara.