What’s not working for me with AI-assisted coding

The bottleneck isn’t implementation anymore. It’s taste.

I started coding when I was eight. Not because it was useful. Not because it was strategic. Just because I wanted to see what I could make the machine do.

2026 is the first time since then that coding feels like that again. Everything feels possible. The difference is that now I have judgment. I know the cost of chasing the wrong idea. I know what breaks at scale. I know what compliance audits look like.

This isn’t a post about how great AI is. You’ve read those already.

This is a collection of what’s not working for me when building high-scale, compliant production systems with AI in the loop. Real systems. Mature codebases. Years of business logic. Edge cases that only exist because a customer once did something weird in 2019.

If you’re already curious, you don’t need convincing. My goal is simpler: to show you where AI-assisted coding still falls short for me, and how I’ve had to adapt around it.

For the record, I’m not trying to draw big conclusions. I’m trying to articulate friction. The subtle ways AI-assisted coding doesn’t quite fit the realities of production software, at least not yet. If anything, I hope this sparks your own reflection.

Tab completion

For non-technical folks, this is the feature where your editor tries to finish your sentence as you type, powered by AI.

In theory, this should be magical. In practice, it feels constrained.

Large language models perform best when they have rich context. They need to see enough of the system to reason about intent, patterns, and trade-offs. Tab completion, by design, operates on a narrow slice of reality. It leans heavily on symbol lookup, the language server, and whatever happens to be nearby in the file.

Some editors attempt to expand that window. They may pull in open tabs or recent chat history. But you are still bounded by the editor’s system prompt and whatever context it decides is relevant. In mature codebases with years of accumulated decisions, that narrow framing often produces suggestions that look plausible but miss intent.

I have not found a reliable workflow around this.

Ironically, the most dependable autocomplete for me today is still the old symbol-based LSP. It is predictable. It does not try to be clever. It does not hallucinate architecture.

Tab completion with AI feels impressive in isolation. In a complex production system, it often feels like it is guessing.

So how am I using AI then?

I’ve mostly stopped expecting magic from inline tab completion.

Instead, I’ve leaned into agentic workflows and CLI-based code generation.

When I use AI, I want it to see more of the system. I want to give it explicit context. I want to step out of the micro-suggestion loop and into a deliberate interaction. A prompt. A diff. A review cycle.

In other words, I use AI where I can reason about its output, not where it interrupts mine.

That separation matters to me. Tab completion inserts AI directly into the act of reasoning. Agentic workflows let me choose when to collaborate.

My tab completion stack

  • Best experience: JetBrains. The integration feels the most coherent. Its memory footprint is the only thing keeping it from being my daily driver.
  • Daily driver: LazyVim (nvim) with mostly defaults, no AI completion. It is predictable, fast, and transparent. I know exactly what it is doing, and more importantly, what it is not doing.

Tab completion is the smallest surface area of AI-assisted coding. The problems get more interesting once you zoom out.

User journeys

In a production system, a user journey is not just a request and a response. It is a chain of domain decisions, validations, side effects, and historical compromises. It reflects compliance constraints, previous incidents, and trade-offs that were made years ago.

AI is very good at generating a correct piece of code. A clean handler. A refactored service. A new endpoint that compiles and passes tests.

What it does not reliably capture is the intent behind the system. I’ve had AI suggest removing a validation that existed because of a past incident. In isolation, it looked redundant. In reality, it was there for a reason.

  • It does not know why a certain validation exists.
  • It does not know which abstraction was deliberately avoided.
  • It does not know which part of the system is fragile.
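A contrived sketch of the kind of code I mean. The names, the function, and the incident are all invented for illustration; the point is that the second check reads as dead weight unless you know the history behind it:

```python
from decimal import Decimal


def validate_refund(amount: Decimal, original_charge: Decimal) -> None:
    """Reject refunds that are non-positive or exceed the original charge."""
    if amount <= 0:
        raise ValueError("refund must be positive")

    # Looks redundant: the payment gateway already caps refunds.
    # It is not. In a (hypothetical) past incident, a gateway retry
    # double-credited customers, and this guard is what stops that
    # class of bug from recurring. A model reading this file in
    # isolation sees dead code; the team sees a scar.
    if amount > original_charge:
        raise ValueError("refund exceeds original charge")
```

Strip the comment and ask an AI to "simplify" this function, and the second check is exactly what gets proposed for deletion.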

When you work on high-scale, compliant systems, you are rarely writing new logic in isolation. You are modifying a living organism.

AI sees a pattern. You see a journey.

The risk is not that the generated code is obviously wrong. The risk is that it is locally correct and globally misaligned. And that misalignment compounds over time.

When I work on a system, I am actively building a mental model of the user journey and how it maps to the code. That model is the real source of correctness during my programming sessions.

What has worked best for me is treating the LLM as a structured field note taker rather than an inline co-pilot. I use custom prompts and tools like Serena to capture important constraints, decisions, and discoveries as we uncover them. Those get stored in a persistent markdown memory.

Instead of assuming the model understands the system, I progressively disclose context. I build directed markdown graphs of documentation, notes, and domain insights, and expose them intentionally as the work evolves.

AI is far more useful to me when it operates inside a context I’ve curated, rather than one it has inferred.
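A minimal sketch of what "progressive disclosure" means in practice. This is my own toy illustration, not any tool's real implementation: topics are ordered by hand, notes live in a hypothetical markdown directory, and the assembled block is what I would paste or pipe into the model's context:

```python
from pathlib import Path


def build_context(notes_dir: str, topics: list[str], budget_chars: int = 8000) -> str:
    """Assemble a curated context block from markdown notes.

    Topics are given in relevance order (decided by me, not inferred),
    and assembly stops at a rough size budget rather than diluting
    the prompt with everything that might be related.
    """
    parts: list[str] = []
    used = 0
    for topic in topics:
        note = Path(notes_dir) / f"{topic}.md"
        if not note.exists():
            continue
        text = note.read_text()
        if used + len(text) > budget_chars:
            break  # disclose less rather than overflow the budget
        parts.append(f"## {topic}\n{text}")
        used += len(text)
    return "\n\n".join(parts)
```

The design choice that matters here is the ordering: the human decides what the model sees first, and the budget forces that decision to be deliberate.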

My memory management stack

  • Serena: a local AI assistant framework I use to capture structured context and decisions as I explore and reason about the system. It becomes the active interface between me and the model’s session.
  • QMD (Quick Markdown Search): indexes and retrieves relevant parts of my markdown knowledge base. When I’m generating code or running agentic workflows, QMD helps surface the exact context I need instead of the model guessing.
  • Obsidian: where I preserve the long-term graph of domain knowledge, architectural intent, user journeys, and historical context. It’s the repository of truth that outlives ephemeral prompts.
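To make the retrieval step concrete, here is a toy version of what a tool like QMD does for me. This is my own sketch of the idea (keyword scoring over a markdown knowledge base), not QMD's actual implementation:

```python
import re
from pathlib import Path


def search_notes(notes_dir: str, query: str, limit: int = 3) -> list[str]:
    """Rank markdown notes by how many query words they contain
    and return the paths of the best matches."""
    words = set(re.findall(r"\w+", query.lower()))
    scored: list[tuple[int, str]] = []
    for note in Path(notes_dir).glob("**/*.md"):
        text = note.read_text().lower()
        score = sum(text.count(w) for w in words)
        if score:
            scored.append((score, str(note)))
    scored.sort(reverse=True)  # highest-scoring notes first
    return [path for _, path in scored[:limit]]
```

The real tools are smarter than this, but the shape is the same: retrieval runs over notes I wrote, so what surfaces is intent I already externalized, not patterns the model inferred.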

The point isn’t the tools. It’s externalized intent.

Production systems accumulate decisions over years. Constraints, trade-offs, edge cases. If you don’t surface that intent deliberately, AI defaults to pattern completion.

I don’t assume the model understands the system. I decide what it should know, and when.

The bottleneck in 2026 is no longer implementation. It’s judgment.

"Taste"

When I started coding, I didn’t have taste. I had curiosity. Taste came later.

To me, taste is knowing when not to abstract. Knowing when to stop. It’s that subtle signal when something smells off, when you can feel an inconsistency even if you can’t yet articulate it.

Taste is not pattern matching. It is judgment shaped over time. Incidents debugged at 3 a.m. Audits survived. Migrations that went sideways. Features that looked elegant and quietly failed in production.

Over the years I’ve learned that correctness is not enough. Consistency is not enough. Passing tests is not enough.

There is a coherence that mature systems develop. A direction. A set of constraints that are rarely written down but deeply felt.

AI accelerates exploration. It assists with generation. But it does not replace judgment. It does not sense architectural drift. It does not recognize when something is technically valid but culturally wrong for the codebase.

That responsibility still sits with me.

And that’s not a limitation of AI. It’s the opportunity.

If anything, AI workflows have made taste and intuition more valuable, not less.


The leverage is no longer in typing faster. It’s in deciding better.

– Ismael.