
Late to the Game, Changed by It


I discovered Claude Code in early February. By most measures, I was late. I follow Theo at t3.gg, I watch ThePrimeagen, I consider myself reasonably plugged in, and somehow I still missed the wave. That’s the part that got to me. The people I rely on to stay ahead of the curve had been talking about this for months. The posts were there, the videos were there, and I had absorbed exactly none of it. There’s something instructive in that. The tooling landscape moves faster than even attentive people can track, and the gap between knowing something exists and actually integrating it into how you work is wider than it looks from the outside. I had the information. I didn’t have the context to understand what it meant until I used it.

Then I used it, and the gap closed violently.

What followed was a recalibration I didn’t expect. Not just in how fast I could ship features, but in the quality of what I was shipping. Code audits, implementation reviews, testing workflows, refactors on old projects: all of it transformed. I went back to old repos and watched the model catch bugs I had completely missed. That part stings a little. But the sting is informative, because it points at something true about how we work as developers that we don’t often say out loud.

As a human being, you cannot hold the entirety of a codebase in your head simultaneously. This sounds obvious until you sit with what it actually implies.

We don’t experience our own codebases as incomplete maps. We experience them as known: familiar territory, navigable by intuition. You know roughly where things live, how features are wired together, which modules you can trust and which ones have landmines. But that felt sense of knowing is not the same as comprehensive understanding. It’s more like expertise in a city: you know the routes you use, the landmarks on your commute, the back streets you discovered by accident. The parts of the city you’ve never had reason to visit exist in your mental model as approximate placeholders, not detailed maps.

Cognitive load research formalised this decades ago. Working memory is a bottleneck. The mental model you maintain of a complex system is always a compression, lossy by necessity, with the losses concentrated in precisely the places you haven’t recently touched. Nuanced edge cases, the interaction between a module you wrote six months ago and the one you wrote last week, a subtle assumption buried three abstractions deep: these are the things that escape you not because you’re careless but because the architecture of human memory makes it structurally impossible to keep everything live at once.

I can put a number on this. plutarc, the algorithmic trading platform I’ve been building solo, is currently sitting at around 82,000 lines of actual logic across 15 repos. Not counting closing braces, blank lines, comments, or imports. Just substance. No single human mind holds 82,000 lines simultaneously. I have a rough mental model of the system: I know the dependency graph, the interface contracts, the failure modes I’ve already encountered. But the model is always a compression, always lossy at the edges. Large-context models don’t have this problem in the same way. A million-token window holds more of your codebase simultaneously than you ever could. It doesn’t get tired. It doesn’t have a commute it knows better than the rest of the city. This is the actual value proposition, and it’s more interesting than “it writes code faster.” It’s a different cognitive profile brought to bear on the same material, one that compensates for the specific ways human attention degrades over a complex system.

This is the frame I now carry into every session, and it shapes the most important habit I’ve built: planning before touching a single line of code.

My flow for any meaningful change runs like this: structure a plan, audit the plan, implement, audit the implementation. Written out, it sounds bureaucratic. In practice, each step earns its place for reasons that only become obvious after you’ve skipped one.

The plan gives the model the context it needs to work intelligently rather than plausibly. There’s a difference. A model with thin context will produce output that looks correct and holds together locally; it just won’t be right for your system, your constraints, your existing architecture. The plan is the act of loading the map. The audit on the plan is a second, scoped pass over that map before anyone starts moving; it catches what the initial context-gathering missed, and it catches it cheaply, before implementation has committed you to anything. The final audit after implementation is the sweep for whatever slipped through anyway.

I arrived at this process not through foresight but through the specific failures that precede it. And I’ve since hardened it into something more formal. plutarc has ten exchange adapters (BitMEX, Bybit, Binance, Kraken, OKX, KuCoin, Deribit, Gemini, Phemex, WooX), each an independent package with its own WebSocket protocol, quirks, and failure modes. The repos had grown too large to audit manually in any reasonable time. So I built an audit pipeline: run an audit via Claude Code, audit the audit, implement the fixes, repeat the cycle until the only remaining issues are minor and non-blocking, then bump the version, commit, tag, publish the package, and audit its integration in the bot repo.
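The control flow of that cycle can be sketched in a few lines. This is a toy model, not the actual pipeline: `runFixPass` stands in for a manual Claude Code session, and the `Finding` shape is invented for illustration.

```typescript
// Severity levels for audit findings; only "minor" issues may remain at the end.
type Severity = "blocking" | "major" | "minor";

interface Finding {
  description: string;
  severity: Severity;
}

// Simulated fix pass: resolve everything above "minor". In reality this is
// a Claude Code implementation session followed by an audit of the audit.
function runFixPass(findings: Finding[]): Finding[] {
  return findings.filter((f) => f.severity === "minor");
}

// Repeat until only minor, non-blocking issues remain (or the pass budget
// runs out); after that comes version bump, commit, tag, and publish.
function auditCycle(initial: Finding[], maxPasses = 10): Finding[] {
  let outstanding = initial;
  for (let pass = 0; pass < maxPasses; pass++) {
    if (outstanding.every((f) => f.severity === "minor")) break; // ready to ship
    outstanding = runFixPass(outstanding);
  }
  return outstanding;
}
```

The loop structure is the point: termination is defined by severity, not by a fixed number of passes, which is what makes "repeat until only minor issues remain" a real stopping condition rather than a vibe.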

The pipeline is grounded in a schema: a formal document encoding behavioural contracts, implementation pitfalls, and exchange-specific caveats accumulated across every adapter that came before. Things like ordering guarantees between execution and order updates. The correct handling of position close events. The timing differences in listen key refresh patterns between exchanges. Not the kind of knowledge you find in API documentation; the kind you accumulate through hard experience and then codify so it doesn’t have to be rediscovered from scratch on each new adapter. By the time I reached the tenth exchange, the schema had captured failure modes from the previous nine. The audit wasn’t discovering problems; it was checking for known ones.
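To make "behavioural contracts and caveats" concrete, here is one possible shape for a schema entry. Every field name and the example values are assumptions for illustration, not the actual plutarc schema format.

```typescript
// Hypothetical shape for one schema entry: a contract plus the hard-won
// failure mode that earned it a place in the document.
interface SchemaCaveat {
  exchange: string;   // which adapter the caveat applies to
  topic: string;      // e.g. "listen key refresh", "position close events"
  contract: string;   // the behavioural guarantee the adapter must uphold
  pitfall: string;    // the failure mode observed on an earlier adapter
}

const caveat: SchemaCaveat = {
  exchange: "binance",
  topic: "listen key refresh",
  contract: "refresh the user-data listen key well before its expiry window",
  pitfall: "refreshing only at expiry races the stream drop and loses order updates",
};
```

Encoded this way, an audit session can be pointed at the entries relevant to the adapter under review instead of rediscovering each pitfall from scratch.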

That’s what the plan-audit cycle looks like at scale. It doesn’t eliminate surprises, but it front-loads the thinking, encodes the institutional knowledge, and produces something noticeably more robust than diving in and hoping the model riffs correctly.

The plan, though, is only as good as the codebase it’s reasoning about.

The second habit I’ve enforced, refactor often, is less obviously about AI than it sounds. Clean, modular code is good engineering practice for reasons that predate language models entirely. But working with AI tooling makes the cost of a messy codebase concrete in a new way. When the model has to wade through bloated files, dead code, and tangled abstractions to locate the relevant surface area, quality degrades. You can watch it happen. The output gets more hedged, more generic, more likely to produce something that fits the vibes of the surrounding code rather than the actual logic of the problem.

I made the decision to go multi-repo early in plutarc’s development, separating the exchange adapters, the bot engine, the shared strategy components, the type definitions, and the orchestration layer into distinct packages with clean interface contracts. At five thousand lines it felt like over-engineering. At 82,000 it feels like foresight. Each repo has its own cognitive budget: the model can load what it needs without dragging in the accumulated weight of everything adjacent to it. The adapters all sit behind the same interface abstractions: IAccountApi, IMarketApi, ITradesApi. The orchestration layer doesn’t need to care what’s underneath. That structural decision, made early, is still paying dividends in the quality of every session I run today.
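The seam looks roughly like this. IMarketApi is named above; its method signature, the Candle shape, and the stub implementation are assumptions sketched for illustration.

```typescript
// Shared candle shape; fields are assumed, not plutarc's actual types.
interface Candle {
  ts: number;
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// One of the adapter contracts; each exchange package implements it behind
// its own WebSocket wiring.
interface IMarketApi {
  subscribeCandles(symbol: string, onCandle: (c: Candle) => void): void;
}

// A stub implementation standing in for a real exchange adapter.
class FakeMarketApi implements IMarketApi {
  subscribeCandles(symbol: string, onCandle: (c: Candle) => void): void {
    onCandle({ ts: 0, open: 1, high: 2, low: 0.5, close: 1.5, volume: 10 });
  }
}

// Orchestration-side code depends only on the interface, never on the
// exchange-specific class behind it.
function latestClose(market: IMarketApi, symbol: string): number {
  let close = NaN;
  market.subscribeCandles(symbol, (c) => (close = c.close));
  return close;
}
```

Because the orchestration layer only ever sees `IMarketApi`, a session working on that layer never needs the weight of ten adapter implementations in its context window.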

The context window is a shared workspace. What you put in it determines what you get back. Refactoring stops being a nice-to-have and becomes part of how you maintain the quality of the collaboration. The cleaner the workspace, the more precise the thinking.

This is the metaphor I keep returning to: context is the thing you’re managing, always, at every level. In your own mind, in the model’s window, in the structure of the codebase itself. The discipline of AI-assisted development turns out to be, in large part, the discipline of context hygiene: knowing what belongs in the frame and what doesn’t, and maintaining that boundary actively.

Which brings me to MCP tooling, and a more counterintuitive lesson.

The promise is obvious: richer integrations, more capable agents, a model that can reach out and touch more of your environment. I came in expecting to use it heavily. What I found instead was that expanding the model’s reach often came at the cost of the quality of its reasoning. More connections meant more surface area for distraction, more partial context from more sources, more opportunities for the model to produce something that addressed the shape of the problem without addressing the problem itself.

The analogy I’d reach for is the WooX adapter. WooX doesn’t have a centralised, well-documented API; their v1 and v2 documentation lived on one page and didn’t support 3-minute candles natively on the klines endpoints. So I implemented a 3-minute candle synthesiser. Then I discovered, on a completely different area of their site, that WooX had a v3 API which supported 3-minute candles natively, and went through a full rewrite. The synthesiser had been technically responsive to the information available. It just wasn’t correct, because the information available was incomplete. More surface area, partial context, plausible output, wrong answer.
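For anyone curious what that workaround involves, synthesising 3-minute candles from 1-minute ones is mechanically simple. This sketch assumes the input is contiguous and already aligned to 3-minute boundaries, which a production version would have to verify; it is an illustration of the technique, not the code that shipped.

```typescript
interface Candle {
  ts: number;      // bucket open time, epoch ms
  open: number;
  high: number;
  low: number;
  close: number;
  volume: number;
}

// Fold each run of three 1-minute candles into one 3-minute candle:
// first open, last close, max high, min low, summed volume.
function synthesise3m(oneMin: Candle[]): Candle[] {
  const out: Candle[] = [];
  for (let i = 0; i + 3 <= oneMin.length; i += 3) {
    const group = oneMin.slice(i, i + 3);
    out.push({
      ts: group[0].ts,                              // bucket opens at the first 1m candle
      open: group[0].open,
      close: group[2].close,
      high: Math.max(...group.map((c) => c.high)),
      low: Math.min(...group.map((c) => c.low)),
      volume: group.reduce((s, c) => s + c.volume, 0),
    });
  }
  return out;
}
```

Simple to write, and exactly the kind of code a native v3 klines endpoint makes redundant overnight.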

That’s what over-connected tooling does to a session. I now keep MCP tooling minimal, or skip it entirely. The best sessions I’ve had are ones where the context is deliberate and bounded, where I’ve made active choices about what the model needs to know, not passive ones about what it can reach. The constraint is the point.

The same logic governs how I maintain CLAUDE.md.

The temptation is to make it comprehensive: a full specification of how the model should behave, detailed instructions for every pattern and preference and implementation detail. I’ve landed in the opposite place. CLAUDE.md should be generic but current: high-level behavioural guidelines, updated regularly, without trying to capture everything. Implementation-specific details belong as comments next to the code they’re relevant to, where the model actually encounters them in context. The exchange adapter schema I described earlier follows this principle: it’s not in CLAUDE.md; it’s loaded as explicit context for the audit sessions it’s relevant to. Separating these two levels keeps both cleaner. A CLAUDE.md that tries to do everything becomes another source of noise, diluting the signal it’s meant to provide.
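In practice, a "generic but current" CLAUDE.md stays short. The fragment below is an invented illustration of the shape, not the actual file:

```markdown
# CLAUDE.md (illustrative fragment)

## Behavioural guidelines
- Present a plan and wait for review before implementing.
- Prefer small, reviewable diffs; never refactor and add features in one pass.
- Respect package boundaries: adapters never import from the bot engine.

<!-- Implementation-specific details do NOT live here. They belong as
     comments next to the code they describe, or in scoped context
     documents loaded per session (e.g. the exchange adapter schema). -->
```

The comment at the bottom is the real rule: anything tied to a specific file or package gets pushed down to where the model will encounter it in context.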

Context management at the conversation level is probably the thing I’ve thought about most, and the thing with the least obvious right answer.

The principle I’ve arrived at has two sides that pull against each other. When a new feature is genuinely adjacent to something you’ve already built in a session, continue the conversation. The context is already loaded: the model understands the architecture, the decisions that have been made, the constraints that are live. Starting fresh and rebuilding that context for a small related change is wasteful, and the reconstruction is always slightly lossy. The existing thread is a resource.

But the inverse matters just as much. When you’re starting something genuinely new, clear the window. Context windows accumulate cruft: old decisions, superseded implementations, threads that were relevant then and aren’t now. An API key rotation implemented inside a chat whose entire context is structured around auth flows will produce a model reaching for auth-adjacent patterns that don’t fit the actual problem. The accumulated frame steers output toward the wrong part of the map.

The discipline is recognising which situation you’re in. Continue when the context is an asset. Clear when it’s becoming an anchor.

There’s a tension I haven’t resolved, and I want to name it honestly.

The audit cycles I described (running two concurrently, cycling through passes until only minor issues remain) are efficient in ways that would have been impossible without this tooling. What took me days now takes hours. But I participate in every one of them manually. I made that choice deliberately. Fully automated overnight runs would be faster. I’d also lose the mental model of my own codebase. The day I stop understanding what’s changed in the adapters is the day debugging a production issue at 2am becomes genuinely hard. More than that: there are decisions Claude Code makes that technically satisfy the schema contract but approach the problem in a way that doesn’t fit the broader architecture philosophy. Those decisions don’t surface in audit reports. They only surface if someone who understands the system is paying attention.

The model is a collaborator, not a replacement for ownership. I want to go faster. I also want to still be the person who built this. The manual participation is how I hold both.

What I didn’t expect from that tension is how much it has sharpened my understanding of my own systems. Because the quality of what the model produces reflects the quality of the context I give it, I’m constantly being pushed to articulate decisions I had left implicit, clean up ambiguities I had been navigating by feel, maintain the kind of structural clarity that makes the collaboration actually work. The feedback loop turns out to be a form of discipline. The mirror is consistently, usefully honest.

Context is everything. In your head, in the window, in the code itself. Manage it deliberately, and the tool becomes an extension of how you think. Let it accumulate unchecked, and you’re reasoning against your own noise.
