My current AI workflow

Some folks I pair with keep telling me my Claude Code setup is advanced. It isn't.

The stack is boring: Claude Code, some skills, thinking mode and model switching, goals, dynamic workflows, Chrome MCP for browser control, and Axe for iOS simulators. None of it is magic, and the tools are not really the point. The point is how I engage with the model: as a capable but somewhat untrusted collaborator.

I do not write code anymore. And frankly, I barely read it either. Most of the work now is steering: specifying behaviour, interrogating the product and the data models, constraining the solution, and making the model prove that what it built actually works.

My job is to keep the model pointed at reality, using my experience and understanding.

Structure shows up late

Most AI workflows people share start with a big upfront plan (or a very prescriptive ticket): write a detailed brief, scaffold every step, run it. That treats the model like a scribe without context.

I usually do the opposite.

At the start of a task, I often do not know what the real problem is yet (and sometimes I pretend not to know), so I start loose. I give the model a direction, not a blueprint.

Something like:

Explore this codepath and tell me what seems relevant to the bug. Do not change anything yet.

Or:

Look at this feature request, inspect the surrounding code, and tell me what implementation shape you see.

The first pass is reconnaissance, not an attempt to solve anything. I want the model to surface the shape of the work before I commit to a plan.

Once the task settles into something recognizable, I write that shape down as goals and, recently, a dynamic workflow. The plan comes out of the early work instead of going in ahead of it.

The loop™

The workflow is basically this:

Start loose.
Let the first pass expose the real problem.
Interrupt when it drifts.
Question its assumptions.
Force it to prove claims against the code.
Constrain it away from unnecessary code.
Once the shape is clear, write it down as goals and a workflow.
Implement with tests.
Verify in the actual product.
Fix what breaks.
Verify again.

That is most of it.

What looks advanced from the outside is usually just staying in the loop instead of sending one big prompt. It probably also helps that this workflow keeps my understanding high while I steer with context and experience.

I interrupt constantly

I interrupt the model a lot.

The moment I detect drift, I smash the escape key. If it starts over-engineering, I stop it. If it is reading the wrong part of the codebase, I redirect it. If it makes an assumption that feels shaky, I make it verify that assumption before continuing.

Typical interruptions look like:

Stop. You're solving the wrong layer. Stay in the API boundary for now.

Do not implement yet. First prove where this state is coming from.

This is too broad. Find the smallest change that would satisfy the test.

You're assuming this is unused. Search for real call sites before deleting it.

That kind of interruption is not a failure of the workflow. It is the workflow.

The model is useful because it moves fast. But fast in the wrong direction is still wrong. My job is to keep tightening the loop between what it just learned and what it should do next.
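The last interruption above, checking for real call sites before deleting something, is a claim that can be checked mechanically. Here is a minimal sketch of that check for Python source, using the standard `ast` module. The function name `find_call_sites`, the sample source, and `legacy_export` are all made up for illustration; in practice the model would more likely just grep the repository.

```python
import ast


def find_call_sites(source: str, name: str) -> list[int]:
    """Return the line numbers where `name` is called.

    Matches both bare calls (name(...)) and attribute calls (obj.name(...)).
    """
    lines = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                called = func.id
            else:
                # ast.Attribute has .attr; other callables have no simple name.
                called = getattr(func, "attr", None)
            if called == name:
                lines.append(node.lineno)
    return sorted(lines)


# Hypothetical source in which legacy_export looks unused at first glance.
sample = '''
def legacy_export(rows):
    return list(rows)

def report(rows):
    return legacy_export(rows)

handler = exporter.legacy_export(data)
'''

call_sites = find_call_sites(sample, "legacy_export")
```

If `call_sites` comes back non-empty, the "this is unused" assumption is wrong and the deletion should not go ahead. This only sees direct calls in one file, so it is a floor on usage, not proof of absence.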

I make it earn confidence

A lot of my prompting is basically Socratic questioning.

I ask annoying questions on purpose:

Why do you believe this is the right layer?

What evidence do we have that this design owns the behaviour?

What would make this architecture wrong?

What are you assuming that you have not verified yet?

Sometimes I deliberately pretend I understand less than I do:

I am missing the obvious thing. Why does this state change happen here and not one layer above?

That is not because I need the simplified explanation. It is because forcing the model to explain the work plainly often exposes shaky reasoning.

If it cannot prove the claim in simple terms, I do not want it building on that claim.

I constrain it away from code

The model's default failure mode is often to write more code. My default response is to make writing code harder. I’ve been experimenting with different methods to make models write less code. I’ll share them once they’ve proved to work consistently.

I will say things like:

Do not write implementation code yet. Spend this pass only trying to disprove the plan.

Or:

You are allowed to change at most one production file. First argue whether that constraint is possible.

Or:

Before adding a new abstraction, prove that the existing ones cannot handle this cleanly.

Or:

Optimize for deleting code or reusing existing structure. New code is the last resort.

These constraints change the behaviour. They make the model search harder, reason longer, and verify more before it reaches for implementation.

I keep begging for composability:

What is the smallest composable change here?

Can this extend an existing boundary instead of creating a parallel one?

What part of this is policy, what part is mechanism, and are we mixing them?

The model is very willing to build the first thing that works. I am usually trying to make it prove the first thing that works is also something I want to own.

I escalate thinking when the next step becomes a judgment call

My default is high effort.

I escalate when the model is choosing an abstraction, when the path stops being obvious, when the task crosses multiple systems, or when a wrong turn would waste a lot of time.

A decent starting rule is:

Stay on high effort while the next step is obvious. Escalate when the next step becomes a judgment call.

That rule is not perfect, but it is good enough until your instincts get better.
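Written down as code, the rule is just a checklist over the four triggers listed above. This is a sketch of the decision only; `NextStep` and `should_escalate` are hypothetical names, not part of Claude Code or any other tool.

```python
from dataclasses import dataclass


@dataclass
class NextStep:
    """Hypothetical description of what the model is about to do."""
    chooses_abstraction: bool = False
    path_is_obvious: bool = True
    systems_touched: int = 1
    wrong_turn_is_expensive: bool = False


def should_escalate(step: NextStep) -> bool:
    """Escalate as soon as any one trigger makes the step a judgment call."""
    return (
        step.chooses_abstraction
        or not step.path_is_obvious
        or step.systems_touched > 1
        or step.wrong_turn_is_expensive
    )


# An obvious, single-system step stays on the default effort level.
routine = NextStep()
# Picking a new abstraction that spans two systems is a judgment call.
risky = NextStep(chooses_abstraction=True, systems_touched=2)
```

Any single trigger is enough to escalate. The point is to escalate on the kind of decision, not on how large the diff is.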

Verification is the whole point

This is the part I care about most. For code changes, I usually want red-green TDD. First make the model write or update a failing test, then make the smallest change to pass it, then run the targeted test.

For product work, I make the model use the product. On web, that means Chrome MCP. On iOS, that means Axe controlling the simulator. It clicks around, fills forms, waits for screens to update, checks error states, and confirms the interaction actually works.

Lately I’ve been using Codex more with computer use, and Claude has learned to take screenshots and video recordings on my Mac. I’ve let Claude do computer use too, but it’s slower than Codex. Still good though.

A typical instruction is:

Follow red-green TDD. Once tests pass, verify the actual user flow with computer use. Use Chrome MCP for web or Axe for iOS. Click through the happy path, the obvious failure path, and any loading, empty, or disabled states. If anything fails, fix it and rerun both the tests and the UI flow.

Since Fable came out two days ago, I no longer have to specify this. Fable is fantastic at verifying its own work.

Tests catch logic. Type checks catch obvious mistakes. But UI verification catches the dumb real stuff: the button is disabled, the loading state never clears, the form submits but nothing happens, the keyboard covers the input, the success screen is unreachable, the API succeeds but the UI does not update.

The whole thing

Start loose. Interrupt when it drifts. Ask annoying questions. Make it prove its claims. Constrain it away from unnecessary code. Beg for composable design. Escalate thinking when the next step becomes a judgment call. Write the workflow down once the shape is clear. Use tests. Make the model operate the product. Turn repeated verification prompts into skills.

– Ismael.