I use Claude Code for hours every day. Over the past two weeks I’ve run 176 sessions, made 265 commits, and hit an 88% achievement rate across tasks. A strange thing to admit when I’d still consider myself anti-AI in many respects. You see, I’ve taken the ol’ “If you can’t beat ’em, join ’em” path. I’ve been building tools, workflows, and frameworks to improve how people work with AI: to reduce token use (and its environmental impact), to reach outcomes faster (and so lean on the tool less), and to mitigate its existing problems, like shipping under-considered products and features.
I’ve also watched this industry barrel into AI adoption with the same recklessness it brought to data privacy, accessibility, and consent. Paste a prompt, accept the output, ship it. No review, no guardrails, no consideration for what that costs. And I mean cost in every sense.
Every token is compute. A sloppy prompt that triggers five rounds of “no, not like that” burns five times the energy of a clear one. Multiply that across millions of developers, and you get a real infrastructure footprint for… nothing. AI-generated code that nobody reviews ships bugs, security holes, and architectural rot. It compiles. It looks right. Those aren’t the same thing, and the gap is where your users get hurt. And the tools reproduce whatever patterns they’re trained on. They don’t spontaneously consider accessibility, edge cases for marginalized users, or downstream effects of design decisions. If the human isn’t thinking about those things, nobody is.
I wrote about this recently in “We need better guardrails on AI, yesterday.” The executives pushing uncritical adoption aren’t just burning people out; they’re putting people in danger. Structure enables autonomy. Without it, the tools are chaotic at best and harmful at worst.
So what does structure actually look like?
Build a system, not a habit
Think about onboarding a junior developer. You wouldn’t hand them a laptop, point at the repo, and say go.
You’d give them documentation, a style guide, CI checks, code review, and architecture diagrams. You’d set up the environment so they can succeed and so their mistakes get caught before production.
Why would you treat an AI tool any differently? The failure modes are the same: inconsistent output, repeated mistakes, skipped steps, and architectural violations. So the fix is the same: encode your standards, enforce them automatically, and check the results.
That’s what I’ve been doing over the past several months. I’ve layered rules, skills, hooks, MCP servers, and structural indexes around Claude Code. Each layer exists because something went wrong, and I made sure it couldn’t go wrong that way again. The system grows from paying attention to friction. As I learn, the next step is to share my insights and tools with you, in the hope that they help.
So, here we go. I’ll start simple and not assume what kind of audience is reading this.
CLAUDE.md and rules
Claude Code reads a file called CLAUDE.md at the start of every session. Think of it as a team wiki: architecture boundaries, naming conventions, workflow expectations, and debugging protocols. Mine is about 180 lines.
Every line exists because of a real failure. For example, Claude proposed an architecture without reading the existing system map, so I added a rule requiring it to check “the codex” (another tool I made) before presenting designs. It marked tasks done with unchecked acceptance criteria, so I added a rule that every sub-bullet is a deliverable. It kept using a term I’d renamed months ago, so I added a rule banning the word outright.
Beyond CLAUDE.md, I keep project rules in .claude/rules/ and global rules in ~/.claude/rules/. Project rules cover things like feature design philosophy and the requirement to self-critique designs against design principles before presenting them. Global rules follow me across every project: my spec and plan writing style, an honesty policy for professional content, and CSS conventions that prevent accessibility failures.
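For flavor, here’s a hedged sketch of what rules like these can look like in a CLAUDE.md. The wording and paths are illustrative, not my actual file, but each rule mirrors a failure described above:

```markdown
## Architecture
- Before proposing any design, check the codex indexes and say which maps you read.
- UI components never import from the data layer directly; go through service modules.

## Workflow
- A task is "done" only when every sub-bullet of its acceptance criteria is checked off.
- Self-critique every design against the design principles before presenting it.

## Naming
- Never use the term "widgets"; the current name is "modules".
```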
The logic is simple here: Every rule that prevents a correction prevents a round trip. Every round trip not taken is tokens saved, time saved, and output that’s closer to correct the first time. My session data shows wrong-approach and buggy-code corrections as my top friction sources. Each rule I add chips away at that.
/insights
Type /insights in Claude Code and it generates a report on how you’ve been using Claude. In my opinion, it’s a feature everyone should run at least once a week. You can literally tell Claude to assess your insights file and recommend new rules, skills, hooks, and so on to help you work more efficiently while reducing token use. I’ve vastly reduced my token usage just by doing this one thing. It reminds me of a project or Agile retrospective.
Skills
Skills are markdown files that Claude Code loads on demand via a slash command. I’ve built several, and each one targets a specific way I was wasting time or tokens, or a prompt I kept refining to get maximum impact.
At first, I kept giving the same instructions session after session. How to archive a completed task. How to triage a discovery. How to review logs. Each repetition cost tokens and introduced variance, because I’d phrase things slightly differently and get slightly different results. The skills I built for these issues, like task-complete and discovery-triage, encode the workflow once. Identical instructions every time, loaded only when needed instead of sitting in the system prompt.
Complex workflows have steps that get skipped when you’re moving fast. Did you update the docs? Run the accessibility audit? Check for drift? File the blocking task the acceptance criteria called for? My task-complete skill runs the commit, the recap, the drift-signal checks and logging, the archiving, and the doc updates as a single pipeline. My design-qa skill runs a structured accessibility and UX review before anything user-facing gets marked done. The checklist is non-optional because it’s in the skill, not in my head.
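To make that concrete, a stripped-down completion skill might look like the sketch below. The frontmatter fields follow my reading of the Skills documentation (check it for the exact schema), and the checklist contents are illustrative:

```markdown
---
name: task-complete
description: Run the full completion pipeline when a task is finished
---

When the user says a task is complete:

1. Verify every acceptance criterion is checked off; stop and report if any are not.
2. Run the test suite and the accessibility audit.
3. Commit with a recap message and log any drift signals.
4. Update the affected docs and archive the task file.
```

Because the steps live in the file rather than the conversation, they run in the same order every time.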
Without a skill like next-task, Claude might wander the backlog, pick tasks out of order, or grab ones whose prerequisites aren’t met. Not anymore. Without several gap skills, Claude guesses what the project or feature is missing instead of diffing against a coverage manifest. Good skills turn open-ended exploration into bounded work.
My implication-scan skill quickly checks backlog items for any task that implies the need for something else, and forces either a new task to be created or the language to be tightened. Vague language causes Claude (and regular old designers and developers) to stumble.
Claude gets particularly confused as a project evolves. If you add new technology, swap something out, or upgrade something, ol’ Claude will keep using the old thing over and over again. It gets old pretty fast. But specialized drift skills help track project drift, create tasks for it, save the relevant context, and tackle it.
I also use a third-party plugin I swear by called Superpowers, which adds skills for brainstorming, TDD, systematic debugging, plan writing, and code review. The brainstorming skill in Superpowers alone has probably saved me the most tokens of anything. Without it, Claude implements the first idea that comes to mind, which often isn’t great and requires rework. With it, I get multiple approaches presented as understandable ideas with trade-offs, a spec, and my approval before a single line of code gets written. The cost of brainstorming is a fraction of the cost of discovering mid-implementation that you’re building the wrong thing.
Hooks
Claude Code supports hooks, which are shell commands that run at specific lifecycle events. I use three main ones.
On SessionStart, two scripts run. The first primes a local code-graph server and injects CONTEXT.md (a lightweight handoff doc that captures what I was working on, recent decisions, and next steps). The second regenerates my “AI codex,” a set of compact structural indexes of the codebase. Without these, the first 5-10 messages of every session would be Claude asking “What are we working on?” and reading files to figure out where things stand. Pure waste.
On PreCompact (when the conversation gets long enough that Claude compresses earlier messages), the code graph gets re-primed, so nothing important is lost.
On Stop, cleanup runs.
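Hooks are wired up in settings (e.g. .claude/settings.json). The sketch below reflects my understanding of the hooks schema, with hypothetical script paths; check the Hooks reference for the exact shape before copying it:

```json
{
  "hooks": {
    "SessionStart": [
      {
        "hooks": [
          { "type": "command", "command": "scripts/prime-graph.sh" },
          { "type": "command", "command": "python scripts/build_codex.py" }
        ]
      }
    ],
    "PreCompact": [
      { "hooks": [{ "type": "command", "command": "scripts/prime-graph.sh" }] }
    ],
    "Stop": [
      { "hooks": [{ "type": "command", "command": "scripts/cleanup.sh" }] }
    ]
  }
}
```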
These are small automations. But a SessionStart hook that saves 5 messages of orientation across 176 sessions is 880 messages of back-and-forth that didn’t need to happen.
Lately I’ve been evolving these hooks by building command-line tools that Claude can invoke rather than doing the work itself, again saving tokens. If you notice Claude doing a certain task over and over again, ask it to build a tool that runs the task and returns the output.
MCP servers
MCP (Model Context Protocol) servers give Claude access to external tools and data. I run four.
Dual-graph (a.k.a. GrapeRoot) is a local server that builds a code graph from the project, tracks edit history, and stores cross-session memories. When Claude starts a session, it calls graph_continue and gets back targeted file recommendations with a confidence score. At high confidence, Claude reads only those files and stops. No “let me grep the codebase.” No reading dozens of files to orient itself. This is the single biggest token saver in my setup.
Token counter tracks live token usage per session. I can estimate costs before reading large files and see running totals. Hard to reduce what you don’t measure.
Context7 fetches live library documentation on demand. Claude’s training data has a cutoff, and APIs change. Without this, Claude writes code against stale signatures, the code fails, and I spend tokens debugging a problem that shouldn’t exist. With it, Claude checks the current docs first. Near-zero cost, prevents a whole category of rework. There’s no arguing with Claude about what’s right or wrong, and no deprecated features or stale implementation details sneaking in.
Claude-in-Chrome provides browser automation for tasks that need a real browser. Better yet, it has let me start forcing Claude to take design implications into account by actually looking at the rendered page itself.
The AI codex I mentioned
I wrote a Python tool that scans the codebase and generates compact markdown index files: system maps, YAML schema references, the dispatch table, system wiring, cross-system dependencies, and content maps.
It regenerates on every session start (via the hook above) and every commit (via a pre-commit hook). Always current.
The alternative is Claude reading dozens of source files to piece together how the architecture works. For a project with 5,300+ files, that’s a lot of tokens. The codex hands it the map up front.
Memory and context
Three things handle persistence across sessions.
Claude Code has a built-in auto-memory system. When I correct Claude’s approach, I may specifically tell it to save the correction as a feedback memory so it applies in future sessions. When I make a design decision that isn’t obvious from the code, I save it as a project memory. Eight memories right now, each one preventing a future misunderstanding.
The dual-graph server also manages a context store, a structured JSON file with decisions, tasks, blockers, and next steps (15 words max per entry, rolling 7-day window). The SessionStart hook injects recent entries so Claude knows what’s been happening recently.
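The maintenance rule behind that store is easy to sketch: drop anything older than the window so the injected context stays small. The field names here are my guess at a generic shape, not GrapeRoot’s actual schema:

```python
import datetime


def prune_context(entries, window_days=7, today=None):
    """Keep only entries inside the rolling window, so the SessionStart
    injection stays a few hundred tokens instead of growing forever.
    Each entry is assumed to carry an ISO "date" field."""
    today = today or datetime.date.today()
    cutoff = today - datetime.timedelta(days=window_days)
    return [e for e in entries if datetime.date.fromisoformat(e["date"]) >= cutoff]
```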
And then there’s CONTEXT.md. A plain markdown file, under 20 lines. Current task, recent decisions, next steps. Updated at session end, injected at the start of the next one. Lowest-tech thing in the stack, and one of the most useful.
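Mine looks roughly like this (the contents are invented for illustration, but the three-section shape is the point):

```markdown
# CONTEXT.md

## Current task
Refactor the settings panel to use the new module naming.

## Recent decisions
- Dropped the legacy term everywhere; rule added to CLAUDE.md.
- Settings schema frozen at v2 until the migration lands.

## Next steps
- Run design-qa on the settings panel.
- Triage the two discoveries from yesterday's session.
```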
What a session actually looks like
I say “next task,” and the related skill fires: it reads the backlog, checks dependencies, and identifies the earliest unblocked task. Brainstorming kicks in; we explore the design space, settle on an approach, and write a spec. The planning skill breaks it into steps. Subagents execute each step with TDD. task-complete handles the commit, updates docs, checks for drift, and archives the task.
Throughout, the dual-graph is serving targeted context. The codex provides orientation. Rules prevent known mistakes. Hooks keep sessions from starting cold. Memory carries decisions forward.
Compare that to: say “add feature X,” accept whatever comes back, manually test it, discover it doesn’t match the architecture, correct it, discover it missed edge cases, correct those, realize the tests don’t cover the right things, rewrite them. Every correction is a round trip. Every round trip costs tokens and time.
My session data: 546 subagent tasks, 932 task updates, 3,300+ tests. The pipeline keeps getting better because I spend sessions building skills and rules that compound.
Why this matters beyond productivity
Everything above reads like a productivity story. And I suppose it is one. But that’s not why I’m writing this.
I don’t have exact figures for how many tokens my infrastructure saves per session. But I know that every prevented correction loop, every targeted file read instead of a full codebase scan, every skill that replaces instructions I’d otherwise retype, is compute that didn’t happen. I’m one person. There are millions of developers using these tools, and most of them aren’t thinking about this at all.
I built a design-qa skill and an a11y-audit skill because Claude won’t spontaneously check for accessibility compliance or run a UX review. If those checks aren’t in the workflow, they don’t happen. Code that looks correct but fails WCAG contrast requirements, lacks focus indicators, or breaks for screen readers is code that harms people. You have to build the checks in because the tool will not add them for you.
AI tools reproduce patterns. If your rules and processes don’t encode inclusive design principles, the output won’t include them, whether that’s accessibility, internationalization, or edge cases affecting marginalized users. The tool mirrors what you put into it. If you put in nothing, you get the default, and the default is not good enough.
If you’re using these tools professionally without this kind of investment, you’re making a choice. You’re choosing more compute than necessary. Less-reviewed code. Fewer guardrails for the people who use what you build.
Getting started
You don’t need all of this on day one. Start with a better CLAUDE.md. Write down your architecture boundaries, your conventions, your expectations. The next time Claude does something wrong, don’t just correct it in the chat. Add a rule so it can’t happen again.
When you notice yourself giving the same instructions for the third time, write a skill. When Claude spends the first five messages of every session figuring out where it is, add a SessionStart hook. When it reads 20 files to answer a question that a structural index could handle, build the index. And if you aren’t sure? Use /insights and start there.
The system grows from paying attention. Each layer responds to a real problem. And each layer compounds, because rules prevent corrections, skills prevent repetition, hooks prevent cold starts, MCP servers prevent aimless exploration, and memory prevents relearning.
The tools are here, and they’re not going away. We can keep treating them like magic boxes, or we can build the conditions for them to actually work well. We’ve seen what happens when the tech industry skips that step. With AI, the consequences are worse.
Get a week free of Claude Code with my referral link: https://claude.ai/referral/P3avmH-h4A
Tools mentioned
Claude Code (Anthropic)
- Claude Code
- Claude Code documentation (docs hub; subpages linked below use the same site)
Project memory, rules, and the .claude directory
- Memory and CLAUDE.md
- Explore the .claude directory (skills, hooks, rules, MCP config)
- Organizing rules (rules/*.md)
Commands & workflows
- Built-in commands (includes /insights, /cost, /context, /mcp, /hooks, /skills, /plugin)
- Skills
- Hooks
- Subagents
- Plugins
MCP
Costs & token visibility (Claude Code)
- Cost tracking (e.g. /cost)
Claude in Chrome
GrapeRoot (dual-graph / code map & session memory) – (Note: there are several tools like this out there, and I have since replaced this with my own)
- GrapeRoot
- How GrapeRoot works (dual graph, Pre-Injection, session memory, .dual-graph/, token tracking)
- Setup guide (dgc ., install FAQ)
- graperoot on PyPI
- GrapeRoot Discord
Context7 & Superpowers (Claude plugins)
Optional: separate MCPs focused on token / context analysis