No matter what you think about GenAI, with the advent of tools like Cursor and their ability to handle large-scale code changes across multiple files in existing codebases - changes that go well beyond “rename symbol” - one thing is clear: most coders are going to be using AI assistants going forward. Although I’m properly tired of hearing the breathless, over-the-top hype, we’ve just picked up a rather fancy new hammer, and with a decent mental model of which situations look like nails we’d be dumb not to use it.
Of course - there’s a mountain of nuance in here - what about junior engineers? - the environmental impact? - SENTIENCE? - but it’s been done to death, and I don’t have much to add.
What I am interested in is - how can we use the new hammer … but better-er?
What we mean when we talk about “code slop”
Moving beyond simple “scaffold me this basic thing up” sort of use cases - where coding assistants tend to be uncontroversially effective - I often find myself doing things like this:
Hey Claude 👋 ! Go and make this change that touches my entire project
… and whatever you do, don’t mess it up
… 5 minutes and some uncountable number of tokens later, the thing’s happened, my tests pass as does the build, and it kinda looks like it works. The tool’s done a great job of bouncing off of the compiler, linter, and unit test guard-rails, as well as any “rules” 1, and ended up with something that, at first glance, is what I asked for.
But as I dig into the diff, I discover all sorts of gnarly edges that I need to go polish - wild dependencies have been introduced! Code has been added to modules in which it clearly does not belong! In Rust, mutexes and `'static` have been mysteriously added to things that do not need them!
In practice, there often remains a big gap between “it builds and the tests pass” and “this is work I’m happy to have appear next to my name on a PR”. Ultimately, our brain remains the only tool we can use to fill that gap, and in complicated cases, the effort involved can approach the amount of time saved by using the assistant 2.
What This Is Actually About: Static Analysis
Here’s the buried lede: we need more-and-better static analysis - things that the assistants automatically bang into, only to get pushed back in the right direction. Compilers, unit tests, and linting are not enough. On the other end of the spectrum, tools like Semgrep, CodeQL and Miri are excellent and cool, but they’re not things we can quickly and easily drop into every project.
There’s a big gap in the middle of the tools spectrum. What I’m really looking for is alignment between the AI’s understanding and the developer’s intent. We need to be able to describe the shape of our project - in a machine-verifiable fashion - and use this as an extra signal to guide our assistants in the right direction.
One example of what this could look like in practice is architectural unit testing 3 - effectively, a way of describing how your project should be laid out in a test, so that you can enforce structural concerns, and project-specific rules. In my experience these tools were fairly popular just before we all went cloud-crazy; the time is now ripe for a return.
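To make that concrete, here’s a minimal sketch of what a hand-rolled architectural test might look like in Rust, using nothing but the standard library. Everything project-specific - the `src/domain` layout, the `crate::persistence` rule, the crude text-matching - is an illustrative assumption rather than a real analysis:

```rust
use std::{fs, path::Path};

// e.g. in tests/architecture.rs of a standard Cargo project
#[test]
fn domain_does_not_depend_on_persistence() {
    // Walk every .rs file under src/domain and fail if any of them reaches
    // into the persistence layer. A naive textual scan, not a real analysis,
    // but enough to turn a structural rule into a red/green signal.
    let mut offenders = Vec::new();
    let mut stack = vec![Path::new("src/domain").to_path_buf()];
    while let Some(dir) = stack.pop() {
        for entry in fs::read_dir(&dir).expect("readable source dir") {
            let path = entry.expect("dir entry").path();
            if path.is_dir() {
                stack.push(path);
            } else if path.extension().map_or(false, |ext| ext == "rs") {
                let text = fs::read_to_string(&path).expect("readable source file");
                if text.contains("crate::persistence") {
                    offenders.push(path.display().to_string());
                }
            }
        }
    }
    assert!(
        offenders.is_empty(),
        "domain modules must not depend on the persistence layer: {offenders:?}"
    );
}
```

The nice property is that this runs as part of `cargo test`, so it becomes exactly the kind of guard-rail an assistant will bounce off without being asked.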
Tools like the aforementioned Semgrep and CodeQL could foreseeably be used as the foundation for some of this stuff - but I think what we are looking for is something declarative - “my code should look like this” - rather than procedural - “go and find all the `impl`s of trait `X`, and check that …”.
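To make that distinction concrete, here’s a rough reworking of the earlier sketch behind a declarative facade. To be clear: `Rule`, `modules_in` and friends are hypothetical - not a real crate - and the plumbing is still just a naive textual scan; the point is what the test itself reads like:

```rust
use std::{fs, path::PathBuf};

// Hypothetical sketch: `Rule` and `modules_in` are not a real crate, just a
// declarative facade over the same naive textual scan as before.
struct Rule {
    root: PathBuf,
    forbidden: Vec<&'static str>,
}

fn modules_in(root: &str) -> Rule {
    Rule { root: root.into(), forbidden: Vec::new() }
}

impl Rule {
    fn must_not_depend_on(mut self, path: &'static str) -> Self {
        self.forbidden.push(path);
        self
    }

    fn check(self) {
        let mut offenders = Vec::new();
        let mut stack = vec![self.root.clone()];
        while let Some(dir) = stack.pop() {
            for entry in fs::read_dir(&dir).expect("readable dir") {
                let path = entry.expect("dir entry").path();
                if path.is_dir() {
                    stack.push(path);
                } else if path.extension().map_or(false, |ext| ext == "rs") {
                    let text = fs::read_to_string(&path).expect("readable file");
                    if self.forbidden.iter().any(|needle| text.contains(needle)) {
                        offenders.push(path.display().to_string());
                    }
                }
            }
        }
        assert!(offenders.is_empty(), "architectural rule violated in: {offenders:?}");
    }
}

// The test now reads as a statement of the intended shape, not a procedure.
#[test]
fn layering_rules() {
    modules_in("src/domain")
        .must_not_depend_on("crate::persistence")
        .check();
}
```

Whether something in this shape ends up as a crate, a lint, or a config file the assistant can read is an open question - but this is roughly what I mean by describing the project declaratively.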
Ultimately, what we need is more easy ways to encode the developer’s mental model of what “good” looks like for a particular project into a machine-checkable form - checks that go beyond “it compiles” and “we’re using tabs instead of spaces”.
Closing Thoughts
Imagine a future where we can encode a much more complete model of how our code should look and work alongside the code itself, and how much easier this will make the post-assistant clean-up!
I’ve been playing around a bit with this - and have some ideas I’d like to share soon. Watch this space.
Footnotes
- Rules are surprisingly effective, but compared to the compiler and linting, they are more “gentle encouragement” than “reliable guard rail”. As an aside, I have never had to tell a junior eng: “1. Fix this unit test. 2. Deleting the assertions is not a fix.” ↩
- or even surpass it, if the thing has gone wrong! ↩
- Java’s ArchUnit and .NET’s ArchUnitNET are two great examples ↩