I ship more in a day than I used to ship in a week. But by 4pm my brain is mush, and I’m not sure I could explain what I built without checking my commit history first.
That’s the honest reality of running parallel AI coding agents at production quality. The output is extraordinary. The human cost is real. And almost nobody is talking about the second part.
None of this has made engineers less necessary. If anything, the opposite. The work changed. What “engineering” means on a daily basis looks nothing like it did two years ago. But the need for someone who understands systems, makes architectural decisions, and knows where quality breaks down? That’s only gotten more acute.
The waiting problem
I use Claude Code as my primary coding agent. The workflow goes like this: I type a prompt, then I wait. The agent thinks. It searches my codebase. It proposes a plan. I review and refine the plan. It writes code. It runs tests. I review. Minutes at a time, often longer during peak usage. Repeatedly.
With a single agent, I’m maybe 2-3x faster than coding manually, but with a lot of dead time in between. And that dead time creates a dangerous pattern.
When you’re waiting 3 minutes for the agent to respond, you start to let things slide. You accept code that’s “good enough” because you don’t want another round trip. 15 minutes before end of day? You just merge it. That hesitancy to push back is how slop gets introduced. Not because the AI is bad, but because the human stopped being rigorous.
The fix is parallelisation. Run multiple agents at the same time. When Agent 1 is thinking, you’re reviewing Agent 2’s output and refining Agent 3’s plan. You never have dead time, which means you never have a reason to skip the feedback.
What parallel agents actually look like
On a typical day I’m running 3 to 5 agents, all working on the same service. Some get simple bug fixes or tweaks. Others carry out refinements from earlier feedback. There’s usually at least one tackling a bigger feature.
The workflow is constant context switching. Remembering “what was this agent doing?”, “what was my last feedback?”, and actively force-forgetting the code you just reviewed from another agent because it’s similar but not the same.
My comfort level is around 5 agents mobbing on a single service. That’s sustainable for a full working day. Push beyond that and the quality of my feedback drops, which defeats the whole point.
Meetings have become my rest periods. I’m not joking.
The cognitive load nobody mentions
Running parallel agents is the most productive and the most exhausting way I’ve ever worked. Imagine switching between 3 to 5 kanban tickets every 60 seconds, for hours. That’s what this feels like.
The output is tremendous. I’m shipping features, fixes, and infrastructure changes at a pace that would have required a team of engineers two years ago. But there’s a real human cost that the “AI will 10x your productivity” crowd never mentions.
By mid-afternoon, my brain is genuinely fatigued. Not from writing code, but from the relentless context switching and decision making. Every agent needs a different kind of attention at a different point in its cycle. The cognitive overhead is unlike anything I’ve experienced in 15 years of engineering.
There’s a reason the industry is scrambling to build multi-agent orchestration tools. Everyone knows the human is the bottleneck. Not because humans are the problem, but because you still need one making the decisions. The agents don’t orchestrate themselves.
The environment is the quality control
With that many agents producing code in parallel, quality could easily collapse. The thing that prevents it isn’t me reviewing every line. It’s the environment.
Every time someone tells me AI-generated code is low quality, I ask one question: what verification does your repo have?
No type checking? The agent can’t catch its own type errors. No linting? It’ll write code in whatever style it feels like. No tests? There’s no feedback loop for correctness.
The agent writes code, runs the tests, sees the failures, and fixes them. That’s the feedback loop. If the loop doesn’t exist, the agent is flying blind and so are you.
My repos have strict TypeScript enforcement, ESLint, 90%+ test coverage gates, and SAST (static application security testing) scanning. The agent hits a wall every time it tries to cut a corner. It can’t merge slop because the pipeline won’t let it through. That’s the environment doing the work, not me staring at diffs.
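As a rough illustration of what one of those gates can look like, here is a minimal coverage-threshold sketch, assuming a Jest + ts-jest setup (the post doesn’t name its test runner; the 90% figures mirror the gate described above, and everything else is illustrative):

```typescript
// jest.config.ts — hypothetical sketch of a hard coverage gate.
// If any metric drops below the threshold, the test run fails,
// so neither the agent nor the human can merge under-tested code.
import type { Config } from "jest";

const config: Config = {
  preset: "ts-jest",
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 90,
      functions: 90,
      lines: 90,
      statements: 90,
    },
  },
};

export default config;
```

The point of a gate like this isn’t the specific numbers. It’s that the failure is mechanical: the agent sees a red build, not a polite review comment it can argue with.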
Design patterns matter more, not less
Something I didn’t expect: formal software design patterns are more important with AI agents than they are with human engineers.
Domain-driven design (DDD), layered architectures, clean separation of concerns. These give agents clear boundaries to work within. But knowing which patterns to enforce, and where, is the engineering skill. DDD makes sense for your backend domain layer. MVVM might be the right call on the frontend. An agent doesn’t make that decision for you. You make it, and the agent works inside those constraints.
I had the agent implement a full DDD repository pattern across all my APIs. It built the layers, the interfaces, the decoupling from the database. Now I can swap in an in-memory data repository and test every single response type without needing a real database. Multi-step behaviour tests simulating full user onboarding flows become possible. The agent built the boundaries, and now it works inside them. It structurally can’t leak persistence logic into the business layer.
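The shape of that pattern can be sketched in a few lines. This is a minimal illustration, not the post’s actual code: the `User`, `UserRepository`, and `registerUser` names are hypothetical, but the mechanics — domain code depending only on an interface, with an in-memory test double behind it — are the pattern described above:

```typescript
// Hypothetical sketch of a DDD-style repository boundary.
// The domain layer sees only the interface, never a database driver.

interface User {
  id: string;
  email: string;
}

interface UserRepository {
  save(user: User): Promise<void>;
  findById(id: string): Promise<User | null>;
}

// Test double: same contract as the real repository, no database needed.
class InMemoryUserRepository implements UserRepository {
  private users = new Map<string, User>();

  async save(user: User): Promise<void> {
    // Store a copy so later mutations of the caller's object don't leak in.
    this.users.set(user.id, { ...user });
  }

  async findById(id: string): Promise<User | null> {
    return this.users.get(id) ?? null;
  }
}

// A use case written against the interface runs identically against the
// in-memory double or a real persistence implementation.
async function registerUser(
  repo: UserRepository,
  id: string,
  email: string
): Promise<User> {
  const user: User = { id, email };
  await repo.save(user);
  return user;
}
```

A behaviour test just swaps in `InMemoryUserRepository` and drives the use case end to end, which is what makes multi-step onboarding flows testable without standing up a database.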
The trade-off is that DDD is verbose. It requires more discipline, more lines of code, and a higher level of architectural knowledge. Normally you’d only adopt it for large, complex codebases. It’s overkill for anything smaller. But when the agent bears the maintenance burden, that calculus changes completely. You get the structural benefits even on small projects, and the agent’s output quality goes up because the architecture constrains it.
This is where the “AI will replace engineers” argument falls apart for me. The agent didn’t decide to use DDD. It didn’t evaluate the trade-offs between a repository pattern and active record. It didn’t weigh the verbosity cost against the testing gains. I did. The agent executed the decision. That distinction matters.
Code review shifted
The most valuable part of my code review now happens before a single line is written.
I routinely spend 50% to 100% of a task’s total time refining the plan. A feature that takes Claude Code 30 to 60 minutes to implement can easily represent a day or more of manual coding. Spending 30 minutes refining that plan is time well spent. Claude Code’s plan mode gives you direct file access to edit the spec yourself, and chipping away at it between meetings is a natural fit.
This pattern is getting a name: spec driven development. GitHub released Spec Kit, AWS launched Kiro. Different approaches, same core idea. Refine the spec, not the output.
When I do review the agent’s output, I start with QA. By the time the agent has navigated type checks, linting, tests, and SAST, the feature is functionally complete. It works. The real question is whether my plan was detailed enough that the agent did everything I expected.
Most of my work is full stack, so a single feature can touch the API contract, database schema, implementation, and webapp all at once. I run everything locally using git worktrees so I can hop between features in progress. First pass is just kicking the tyres, since that’s high leverage and exercises the whole stack.
After that, I review tests and schemas, then trace call chains for specific areas I know would have been troublesome. Knowing where to look is the skill. I’ve been building enterprise systems for 15 years. I know where problems hide.
It is never one shot
I want to be clear about this because the “one shot” narrative is doing real damage to people’s expectations.
An agent completing a task does not equal a pull request. There is always refinement. Sometimes that’s a quick, targeted prompt to fix something specific. Sometimes the approach needs rethinking and you kick off a whole new plan and spec cycle. A single PR can be the result of multiple plans.
Knowing when to replan, when to discard, and when to reach for an ad-hoc prompt is an art. It comes from experience. From understanding your codebase, your architecture, and the kinds of mistakes agents tend to make in your specific context.
But obsessing over the initial plan is the only way I’ve found to come close to a one-shot result. The better the plan, the less refinement needed. That’s why I’m willing to spend as much time on the plan as the agent spends on the implementation.
The honest picture
Here’s where I land on all of this.
Running parallel AI agents has made me more productive than I’ve ever been. I’m shipping production software, with green tests, passing builds, 90%+ coverage, linting and SAST, at a pace that shouldn’t be possible for one person. The economics of what a single engineer can deliver have fundamentally changed.
But it’s not magic. It requires a well-structured codebase with real verification. It requires architectural knowledge to set up the right constraints. It requires the discipline to refine plans rather than accepting whatever the agent produces. And it requires accepting that your brain will be genuinely tired by the end of the day in a way that’s different from traditional engineering fatigue.
The real productivity gain isn’t AI. It’s parallelisation of AI, inside an environment that enforces quality, guided by an engineer who knows where to look.
Engineers aren’t going anywhere
If I know anything about how the economy works, it’s that productivity gains don’t lead to fewer workers doing the same amount. They lead to the same number of workers doing more. That pattern has held through every major technological shift, and I don’t see why this one would be different.
What is changing is the definition of the job. My day-to-day looks nothing like it did two years ago. Plan refinement, agent orchestration, environment design, verification architecture. I spend more time thinking about system boundaries than I do reading code. That’s a fundamentally different skill set from what most engineers were hired to do.
Jobs will change. Roles will move. Some people will leave the industry because the work they signed up for doesn’t exist in the same form anymore. That’s real and worth being honest about.
But the world just got more complicated, and it’s moving faster. Every section of this post describes something that requires deep engineering judgement. Setting up verification pipelines. Choosing architectural patterns. Knowing when a plan needs to be scrapped. Reviewing agent output with enough context to spot the subtle failures. None of that is going away. The agents made the world bigger, not simpler.
If your AI-generated code is bad, your engineering environment was probably already worse than you thought. The AI just made it visible faster. And if you think that means you need fewer engineers, you’re solving the wrong problem.