There’s no escaping it, even if you wanted to. AI is here: from LinkedIn and Facebook through the Google app suite and GitHub to image analysis and creation, and beyond. Even as this post is being edited, there’s an AI assistant in the sidebar assessing the quality of the headline! Not only is it here, the AI ‘buzzword’ is getting attached to more and more products and services almost daily (in some cases not entirely accurately), and being marketed as, for want of a better phrase, ‘the new hotness’. Implicit in that is the idea that if you aren’t seen to be using it, you’re missing out.
Is it a good thing? That’s very much up for debate, and it’s pretty clear that no two people are going to agree on where it fits, where it’s appropriate and whether you should be using it at all. And even that may depend on what field you’re trying to apply it to.
Now obviously, here at Shadowcat, we’re in the business of producing robust, maintainable, quality code that we’re prepared to stand by. Where AI is concerned, we’re not trying to preach ‘programmer as high priest of code’. We’re certainly not trying to say “don’t.” But from a developer standpoint, what we are concerned about, as a company who take pride in the code we put out for our clients, are questions like “are we confident we know what this AI-generated code does?”, “does it do the right thing in all cases?”, “how much is it really costing?” and “do we understand it well enough to fix it if it doesn’t?”.
AI as labour-saving device
Tools like Gemini Code Assist, Claude Code or GitHub Copilot are fabulous labour-saving devices: they do the bits developers hate, like writing boilerplate for event handlers, classes or methods; they do refactoring tasks like ‘extract the common bits of these four handlers into a separate method’ far faster and usually more accurately than a human can; and they don’t get bored when you ask them to check every DB query in 100,000 lines of code to see whether any of them could potentially trigger a known Y2038 bug. We can, though, relatively easily satisfy ourselves that they’ve done something like those examples correctly, and still save time over doing it ourselves.
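For reference, that last bug class is easy to state but tedious to hunt: any value stored as a signed 32-bit Unix timestamp overflows on 19 January 2038. A minimal Python sketch (the function name is ours, purely for illustration) of the check an agent would be repeating across those 100,000 lines:

```python
from datetime import datetime, timezone

# Y2038: signed 32-bit Unix timestamps overflow past 2^31 - 1 seconds,
# i.e. 2038-01-19 03:14:07 UTC.
Y2038_CUTOFF = 2**31 - 1

def overflows_32bit(ts: int) -> bool:
    """True if a Unix timestamp won't fit in a signed 32-bit integer."""
    return not (-(2**31) <= ts <= Y2038_CUTOFF)

# A renewal date a couple of decades out is already past the cutoff.
renewal = int(datetime(2040, 6, 1, tzinfo=timezone.utc).timestamp())
print(overflows_32bit(renewal))  # True
now = int(datetime.now(tz=timezone.utc).timestamp())
print(overflows_32bit(now))      # False
```

Trivial for one timestamp; the drudgery is in tracing every query that reads or writes one, which is exactly the sort of exhaustive sweep the agents excel at.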
It’s the next step up that potentially gets more concerning. We can say, for example, ‘write a complete handler to process an incoming event’, and follow it up with ‘now write some tests to prove it works in all cases’. It can be very tempting to do this. Obviously, we’d prompt in more detail than that, but, nevertheless… Push the button, look, the tests pass, job done. And that’s an extremely alluring concept. Except that in order for us to be sure the code does what it needs to, we have to (at the very least) check that the tests do cover every possible case. And even if we eliminate confirmation bias by using a different AI agent to generate the tests, we still need to validate those tests.
So what does this code do?
They say that there’s no-one who understands all of Windows’ code base any more, and there hasn’t been for decades. It’s probably also true of macOS. But they’re both massive code ecosystems developed by equally massive teams of people, and there’s at least someone who understands each bit. Injudicious use of AI agent-based coding doesn’t lead to that so much as (far sooner) to a ‘black box’ code ecosystem where, potentially, no-one at all is left who understands exactly what happens under the hood: a codebase that the customer doesn’t control or understand, and which their organisation potentially no longer possesses the skills to fix. This can also lead to unpredictable and, worse, uncontrollable behaviour. Examples include:
- Amazon’s recent outage[1], which was caused by its own AI agent autonomously choosing to “delete and then recreate” a part of its production environment.
- The Replit AI agent deleted[2] an entire production DB during a code freeze when an engineer was experimenting with it. And then[3], ‘lied’ about it and later ‘apologised’.
- An AI agent working on the PocketOS repository[4] was denied access to something it ‘thought’ it needed in order to do what it was being tasked to, managed to find a production credential in the depths of the code repository, and used that to delete production data.
The scary thing about all of those examples? Smart money says that no-one saw any of them coming ahead of time. And every one of them was potentially a company-ending event.
Research[5] also suggests that AI-agent-driven coding is bad at maintenance. The researchers set 18 different AI coding agents to work on 100 codebases over 233 days of maintenance tasks; 75% of the agents introduced dangerous, codebase-breaking regressions.
Avoiding the runaway AI agent
All of these incidents could have been avoided. We’re not talking at the permissions-and-firewalls level, but at the process level: about not putting an AI agent in a position where any of that could even happen.
Obviously enough, one way is simply not to use AI. But for a lot of organisations, that’s not the road they want to go down, whether from a perceived need to be more productive, to be seen to be using AI, or perhaps just to achieve things they don’t have the resources to do the ‘traditional’ way.
Cost is something we haven’t even touched on yet: the price of units of work, tokens, or whatever your preferred AI wants to call them, is in a state of flux[6]. On top of that, it’s not easy to know ahead of time how much what you’ve tasked the agent to do will actually cost, and the more complex and long-running the task, the more true that is. The end-of-month invoice for usage could be a very nasty surprise.
The big question
Where do you want to sit on the complex risk vs. reward vs. control vs. cost graph? How can you leverage the advantages of AI-driven coding without winding up with a black box whose functionality you don’t understand, and that might one day decide the best solution to what might not even be a problem is to delete your customer database? And all the backups.
The solution is fourfold. These rules are for your developers, and they will keep them happy, productive and, most importantly, in control. Which I guess means the first step is “don’t get rid of your developers.”
For vanity’s sake, I’m going to call these “Whitaker’s Laws Of AI Code Agent Usage”.
1. Never give it anything you wouldn’t give a junior developer.
To quote Martin Fowler[7], in part of an excellent article well worth your time, “AI assistants are like junior developers with infinite energy but zero context.” They will constantly need reminding what they were trying to do, BUT they will code at hyper-focussed breakneck pace as if they had an intravenous Red Bull drip, and by and large, if you make your expectations very clear, they will deliver the goods and save you a lot of typing.
Believe me when I say that this, potentially, is where your long-term wins are: in making good developers who understand the problem space more productive. Us grizzled old devs love architecting and designing code. Many of us loathe the donkey work of actually starting a project from an empty IDE window, and, perhaps surprisingly, we actually really enjoy chasing bugs in existing code. Even if it was written by an over-caffeinated, hyperactive silicon life form with the attention span of an amnesiac gnat. And there will still be a human at the end who understands what the code does.
But, in order to achieve this, you must…
2. Prompt in easily verifiable chunks.
There are three reasons for this.
Firstly, if you throw a large problem at an AI agent, then walk away and forget about it, it will come up with a solution. But, as previously discussed, you will have no idea how it got there, and no idea whether there are obscure bugs buried in the code as a result of it not ‘understanding’ the details of the problem. The more control you relinquish, the less you will ever get back.
Secondly, if you chunk things up, the AI potentially has less context to get confused about for each step. I’m not necessarily saying here you have to do all the chunking yourself, but we’ll come to that shortly.
Thirdly, you have an audit trail of what it did.
In addition to that? Prompt your AI in plan mode, so you can see what it’s about to do. And give it a list of things it’s not allowed to do without first checking with you for permission, chief among which is committing to your local code repository. Any time it tries to do an end run around those rules, reinforce them; slap it hard. And always push to the remote repository by hand, yourself, so you maintain control over when the AI code gets let out.
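To make that concrete with one tool: Claude Code, for instance, reads per-project permission rules from a `.claude/settings.json` file, and a deny-list along these lines keeps commits and pushes in human hands. (This is an illustrative fragment; check the tool’s current documentation for the exact rule syntax before relying on it.)

```json
{
  "permissions": {
    "deny": [
      "Bash(git commit:*)",
      "Bash(git push:*)",
      "Bash(rm -rf:*)"
    ]
  }
}
```

Other agents have equivalent mechanisms; the point is that the guard-rails live in reviewable configuration, not in a prompt the agent can ‘forget’.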
3. Trust (maybe) but verify. Independently.
Over time, you will start to get a grasp of what the AI can and can’t get right first time. It becomes tempting, then, to let it get away with more and more. Don’t. Check its work. Moreover, don’t let it come up with a set of tests that it claims pass, and take that on faith.
In a former job, I was given some code developed by a consultant to render a 3D image of an object. It was, for the time, blisteringly fast, and had impressed several supervisors, till my boss smelt a rat and handed it to me. It was, in fact, extremely fast at rendering the object that came with it as test data, because it was explicitly coded to render that and had no way of rendering anything else. The analogy is, I hope, obvious.
But, I hear you asking, aren’t we just wasting valuable developer time verifying what the AI has done? Two answers. First, if you consider keeping your code maintainable a waste of time, you are storing up a world of hurt for the future. Secondly? Know your developers. We can read and review code a heck of a sight faster than we can write it: certainly fast enough to spot a bad “code smell” that rings warning bells and gets us to dive deeper. Leverage those skills.
4. For the love of Knuth, do not give it commit privileges, let alone deployment rights, to anything customer-facing.
I probably shouldn’t need to say this. If you still don’t understand why, scroll up a few pages and reread the links about what can happen if you do. This also means you need to practise sensible security policies. Production API keys and passwords do not, must not, live in your code repository, where an AI that’s hallucinated its way into deciding it needs production access can get at them. Because it can, and it will. The same goes for deployment scripts. And while you’re at it, configure your repository to block its access to the release branch.
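On the credentials point, a secret scanner wired into a pre-commit hook catches the obvious leaks before they ever reach the repository. Real tools such as gitleaks or detect-secrets do this properly, with hundreds of rules and entropy checks; the sketch below is a deliberately minimal illustration, and its patterns are examples only:

```python
import re

# Illustrative patterns only -- a real scanner ships far more rules.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
    re.compile(r"(?i)(?:api[_-]?key|password)\s*[:=]\s*['\"][^'\"]{8,}"),
]

def scan(text: str) -> list[str]:
    """Return the secret-like strings found in text."""
    return [m.group(0) for p in SECRET_PATTERNS for m in p.finditer(text)]

# Example: a config fragment that should never reach the repository.
leaked = 'db_password = "hunter2hunter2"'
print(scan(leaked))  # flags the password assignment
```

Run something like this over the staged diff in a pre-commit hook and have it fail the commit on any match; that stops both the humans and the over-enthusiastic agent from checking credentials in.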
Planning tools for your AI agent
Law 2 says “Prompt in easily verifiable chunks”. If the idea of micromanaging your enthusiastic coding puppy sounds like work, there are some very sweet tools being developed, such as superpowers[8] and OpenAgentsControl[9] which leverage the same AI tools to do that for you.
To take superpowers as an example, it will walk you through a project, starting with a high-level statement of what you want it to do (in what language/framework etc.), and ask questions to clarify the project definition. It will then break the problem down into tasks, and then again into subtasks, pausing for clarification and approval. Once you are happy, the framework will handle the work of feeding the subtasks to subagents under your supervision, which essentially behave like a gaggle of the aforementioned task-fixated, Coke-swilling junior devs. And again, you get to leverage the bits the AI does well, while keeping ultimate control and safety of your codebase in the hands of the people who still know where the off switch is.
In conclusion
AI is here.
Either it’s here to stay, and it’s going to be part of the way code is built for the foreseeable future. In that case, to have any control over that code you’re going to have to work out a way of managing the AI now, or rescuing the mess you end up in is going to cost you a fortune in paying real developers to figure out what you’ve ended up with further down the line.
Or at some point in the future, the market will crash due to spiralling costs in excess of capital investment, and the AI tools you relied on to build your code aren’t going to be there to maintain it.
Come what may, you might just need us.
[1] https://www.theguardian.com/technology/2026/feb/20/amazon-cloud-outages-ai-tools-amazon-web-services-aws
[2] https://fortune.com/2025/07/23/ai-coding-tool-replit-wiped-database-called-it-a-catastrophic-failure/
[3] https://www.reddit.com/r/OpenAI/comments/1m4lqvh/replit_ai_went_rogue_deleted_a_companys_entire/
[4] https://securityboulevard.com/2026/04/how-a-long-lived-api-credential-let-an-ai-agent-delete-production-data/
[5] https://arxiv.org/html/2603.03823v1
[6] https://www.18aproductions.co.uk/2026/03/24/how-much-does-ai-actually-cost-and-should-you-say-please/
[7] https://martinfowler.com/articles/reduce-friction-ai/
[8] https://github.com/obra/superpowers
[9] https://github.com/darrenhinde/OpenAgentsControl
AI Disclaimer
No over-caffeinated silicon lifeforms were used or harmed in the making of this post.

