TL;DR
- I tracked 6 months of my own AI coding sessions in React Native. In my logs, 42% of AI-generated diffs contained at least one hallucinated import, fake API, or duplicate component.
- Token costs were the second tax. Re-loading project context every session cost roughly $135/month per developer at the model pricing I was using.
- Better prompts didn't fix either problem. The AI didn't need smarter instructions: it needed memory and a map.
- I built U-AMOS (Universal AI Memory Operating System): a 3-tier memory bank, a context map, a rule priority system that splits "what to do" from "how to do it," a 7-point anti-hallucination checklist, and a plan/act workflow that runs before any code is generated.
- After deploying U-AMOS across my own projects over a 3-month tracking period: hallucinations dropped from 42% to 3%. Token costs dropped from $180/month to $18/month. Feature velocity increased roughly 5x. These are my internal numbers: I'll note where external research reports similar magnitudes.
- The framework is open and documented. U-AMOS 2.0 also ships pre-configured inside AI Mobile Launcher for anyone who doesn't want to build it from scratch.
A note on the numbers
Everything in this article that is quantified (the 42%, the $135/month, the 91% reduction) comes from 6 months of my own session logs across my React Native projects. I tracked hallucinations manually, counted tokens via API usage dashboards, and measured debugging time against my own estimates. These are not controlled experiments.
What I can say is that the direction of the results matches what external research is starting to report. Memory-system papers are showing 40-60% accuracy improvements and 60-90% token reductions when you introduce structured memory into LLM workflows. Mem0's Claude Code integration reports roughly 90% lower token usage with persistent memory vs full-context prompting. The order of magnitude is consistent. The exact numbers are mine.
The moment I stopped pretending it was working
It was a Tuesday in October. I was building a feature for my app, and I asked Claude Code to add Redux Toolkit state management for user accounts. It generated something that looked correct. I committed it.
Twenty minutes later, the build failed.
The AI had imported useRouter from next/router. In a React Native project. That hook doesn't exist on mobile. It was a 30-second fix, but it wasn't the first time. It was the fourth time that week.
I started keeping a log. Every wrong thing the AI generated, I wrote down. After a month, I had the data from my own sessions:
- 42% of AI-generated diffs had at least one hallucinated import, function, or component
- 25% of the components it created already existed in the codebase under a different name
- I was spending roughly 4 hours a week debugging things the AI had invented
- I was using Cursor much more than Claude Code at the time, and Cursor's analytics dashboard confirmed parts of this picture
The frustrating part was that I knew the AI wasn't getting worse. I was paying for the best models. The prompts were detailed. The context windows were huge.
The problem wasn't the model. The problem was that I was treating it like a senior developer when it was behaving like a junior with no memory of the project and no map of the codebase.
I had experimented with rules and a memory bank before, but the AI always struggled to grasp the whole context, and I had to remind it far too often.
The token tax nobody talks about
While I was tracking hallucinations, I also started tracking token usage. The numbers were uncomfortable.
Every session, I was loading the same context: project structure, architecture decisions, naming conventions, what components already existed. The AI had no memory between sessions, so I kept re-explaining everything. Worse, when I didn't re-explain, the AI would explore: running directory listings, opening files at random, building up its own picture of the codebase by trial and error.
That exploration is where the worst of the token bleeding happens. Asking "where is the authentication logic?" can trigger 25,000 tokens of blind navigation through folders before the AI finds it.
The math, at the model pricing I was using at the time:
- Session 1: Re-load + explore project structure ≈ 50,000 tokens
- Session 2: Re-load + explore project structure ≈ 50,000 tokens
- Session 3: Re-load + explore project structure ≈ 50,000 tokens
- Daily total: 150,000 tokens
- Monthly total: ~4.5 million tokens over ~30 days of work
- Monthly cost: ~$135/month per developer (based on ~$30 per million tokens, prompt + completion)
That's the invisible tax. Even when the AI was generating correct code, I was paying to give it the same context every time, plus paying for it to wander around the repo finding things it should already know about.
I remember creating an architecture.md file to hold the context I kept repeating, and then a review_best_practices.md with rules covering the mistakes the AI kept making.
Then came the Claude Code best practices. I tried the obvious approaches first: longer CLAUDE.md files, more detailed system prompts, better instructions on what to remember.
None of it worked sustainably. The AI would hold context for a session or two, then drift. Because the problem wasn't the prompt. It was the architecture.
The reframe that changed everything
The shift came when I stopped thinking of AI as a developer and started thinking of it as a system that needed memory built for it, and a map handed to it. I remember watching an interview with Thomas Dohmke, where he said one of the best practices is to look at it as a colleague, not a tool.
A junior dev with no memory of your project would also generate hallucinated imports. Would also recreate components that already existed. Would also waste hours wandering through unfamiliar code looking for the right file. The AI wasnβt broken. The relationship was broken. I was asking it to behave like it had context it didnβt have.
A lot of content I've seen treats this as a prompting problem. Write a better system prompt. Use a longer context window. Be more specific in your instructions.
My experience, and increasingly what I see from teams who've shipped real production AI-assisted codebases, is that prompts plateau. Durable context compounds. The teams getting consistent AI output aren't writing better prompts: they're building memory systems that load the right context at the right time and update automatically when something changes.
You can read my article on prompt engineering approaches here:
Essential Guide of Prompt Engineering for Software Engineers (Malik CHOHRA, 17 November 2025)
That's what I built. I called it U-AMOS.
What U-AMOS actually is
U-AMOS, the Universal AI Memory Operating System, is a framework for managing AI-assisted development. It has five components, each solving a specific failure mode I'd logged.
┌────────────────────────┐
│      Memory Bank       │
│   (Cold / Warm / Hot)  │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│      Context Map       │
│    (Index / Lookup)    │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│       Plan Mode        │
│   (before execution)   │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│    Validation Layer    │
│  (7-point checklist)   │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│    Code Generation     │
└───────────┬────────────┘
            │
┌───────────▼────────────┐
│    Progress Logging    │
│   (.memory updates)    │
└───────────┬────────────┘
            │
  ◄──────── FEEDBACK LOOP ────────
1. The Memory Bank: three tiers, loaded on demand
Not all context is equally important for every task. So I tiered it.
Cold tier (project identity: loads rarely, ~10% of sessions):
- 00-description.md: what we're building, in 500 words
- 01-brief.md: non-negotiable constraints
- 10-product.md: feature specs
Warm tier (architecture: loads on demand, ~30% of sessions):
- 20-system.md: how the system works
- 30-tech.md: stack and dependencies
- 60-decisions.md: why we chose what we chose
- 70-knowledge.md: lessons learned
Hot tier (current state: loads every session, 100%):
- 40-active.md: what we're working on right now (max 500 words)
- 50-progress.md: what shipped recently
The hot tier is small (~2,000 tokens) and always loads. The warm tier loads when the task touches architecture (~5,000 tokens). The cold tier almost never loads during development: it's the onboarding layer. A new developer (or a new AI agent starting a session) reads the cold tier once and understands the project without hunting through the entire repo.
The result: 2,000-10,000 tokens per session instead of 50,000. That assumes you're maintaining the files actively: see the hygiene section below.
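Laid out on disk, that's a nine-file .memory/ folder plus the context map at the project root. The tier labels in the comments below just restate the loading behaviour described above:
.memory/
├── 00-description.md   # cold: what we're building
├── 01-brief.md         # cold: non-negotiable constraints
├── 10-product.md       # cold: feature specs
├── 20-system.md        # warm: how the system works
├── 30-tech.md          # warm: stack and dependencies
├── 40-active.md        # hot:  current focus (max 500 words)
├── 50-progress.md      # hot:  what shipped recently
├── 60-decisions.md     # warm: why we chose what we chose
└── 70-knowledge.md     # warm: lessons learned
context_map.md          # root-level lookup index (next section)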
2. The Context Map: the exploration killer
This is the piece that does the most work for the lowest cost.
context_map.md is a single 500-token lookup file at the root of the project. It indexes everything: every feature, every service, every core UI component, with the entry path next to each one.
# Context Map
## Features (14)
| Feature | Entry Point | Purpose |
|----------------|----------------------------------|--------------------|
| auth | src/features/auth/index.ts | Authentication |
| onboarding | src/features/onboarding/index.ts | User onboarding |
| todos | src/features/todos/index.ts | Todo management |
## Services (15)
| Service | Path | Responsibility |
|----------------|----------------------------------|--------------------|
| logger | src/services/logging/logger.ts | Centralized logs |
| analytics | src/services/analytics/... | Firebase analytics |
## UI Components (40+)
| Category | Components |
|----------------|----------------------------------|
| Buttons | Button, IconButton, FAB |
| Forms | Input, ControlledInput, Switch |
When the AI starts a session and needs to know "where does authentication live?", it reads one 500-token file instead of running directory listings, opening five files to compare them, and burning 25,000 tokens building its own mental model of the repo.
In my own logs, this single file removed roughly 60% of the per-session token consumption that wasn't already covered by the memory bank. The math: 500 tokens replaces 25,000. That's a 50x reduction on the most expensive part of every session: discovery.
3. The Rule Priority System: three tiers, with generators separate from rules
The same logic applies to coding rules.
Critical rules (always load, ~4,000 tokens):
- Meta-rules and session protocol
- Anti-hallucination checklist
- Common violations (no inline styles, no console.log, no hardcoded strings, no API keys)
Important rules (task-specific, ~2,000 tokens each):
- Design system patterns: loads if working on UI
- State management rules: loads if working on state
- i18n patterns: loads if adding translations
- Navigation patterns: loads if adding routes
Recommended rules (load if relevant):
- Performance optimizations
- Testing patterns
- Security and platform-specific privacy rules
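If you're rebuilding this yourself, one way to make the conditional loading explicit is a small index file the session reads alongside the context map. This is a sketch, not a standardized format; the file name and pack names are illustrative and simply mirror the tiers above:
# rules/index.md
| Tier        | Rule pack                                   | Load when                       |
|-------------|---------------------------------------------|---------------------------------|
| Critical    | meta, anti-hallucination, common-violations | always                          |
| Important   | design-system                               | task touches UI                 |
| Important   | state-management                            | task touches stores or reducers |
| Important   | i18n-patterns                               | task adds or edits user strings |
| Important   | navigation-patterns                         | task adds routes or screens     |
| Recommended | performance, testing, security              | only if the plan calls for them |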
The other architectural distinction that mattered: I separated generators from rules. They look similar but they solve different problems.
Generators answer what to do. Step-by-step implementation guides for recurring tasks: "add a new language," "add a new screen," "add a paywall." They're workflow documents: copy this template, register here, run this script.
I include these generators in my AI React Native boilerplate (https://aimobilelauncher.com/), where they are documented; you can check the code for the different generators there.
Rules answer how to do it well. Code quality patterns and constraints: this is what good styling looks like; this is what the wrong import path looks like.
When you mix the two (when your "how to add a language" doc also tries to explain every i18n best practice), the AI gets overwhelmed and follows neither cleanly. Splitting them means the AI reads the generator to know the steps, then reads the matching rule pack to write the code correctly. Two clean reads. No drift.
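To make the split concrete, here is a sketch of what a generator can look like, using "add a new language" as the example. The paths and the script name are illustrative, not the exact files shipped in the boilerplate:
# Generator: Add a new language
1. Copy src/i18n/locales/en.json to src/i18n/locales/<lang>.json
2. Register the new locale in src/i18n/index.ts
3. Add the language to the settings screen picker
4. Run the missing-translations check script before committing
Note: for how to write the strings themselves (no hardcoded text, key naming, pluralization), load the i18n rule pack. This generator only covers the steps.
The generator stays short and imperative; everything about quality lives in the rule pack it points to.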
4. Concrete examples beat abstract rules
This is a philosophical point, but it's the reason U-AMOS rules actually work.
Most rule documents read like this: "Use proper styling conventions. Avoid inline styles where possible."
Rules in U-AMOS read like this:
## Styling
### ❌ WRONG: inline styles
<View style={{ marginTop: 20, padding: 16 }}>
### ✅ CORRECT: Restyle props
<Box marginTop="xl" padding="lg" />
### Exception: unsupported properties
<Box marginTop="xl" style={{ opacity: 0.5 }}>
(opacity is not a Restyle prop, inline is acceptable here)
LLMs don't generalize abstract principles well. They pattern-match. If you show them what wrong looks like next to what right looks like, they reliably produce the right pattern. If you tell them to "follow good practices," they produce whatever the training data nudged them toward last time.
Every rule pack in U-AMOS is built this way. ❌ wrong → ✅ correct → exception (if any). No paragraphs of theory. No abstract guidelines. Just visual diffs. This is the single biggest determinant of whether a rule actually changes the AI's output or gets ignored.
5. The 7-Point Anti-Hallucination Checklist
Before any code is generated, the AI verifies:
- Does the file I'm editing exist?
- Did I check the component inventory before creating something new?
- Did I check the service registry?
- Is the import path correct?
- Does the function I'm calling actually exist in that file?
- Am I using the project's i18n pattern, not hardcoded strings?
- Am I using the project's logger, not console.log?
If any answer is no, the AI stops and verifies before continuing.
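In practice the checklist lives in the always-loaded critical rules and is phrased as a hard gate, not a suggestion. A sketch of that phrasing (the exact wording here is illustrative, not an official Claude Code or Cursor feature):
## Pre-generation gate (MANDATORY)
Before writing or editing any code:
1. Run the 7-point checklist against your plan.
2. If any answer is "no" or "unsure", STOP. Read context_map.md
   and the relevant inventory file before continuing.
3. Only generate code once every answer is "yes".
Never invent an import path. If a symbol is not in a file you
have actually read this session, assume it does not exist.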
The first week I deployed this, my hallucination rate in my own sessions dropped from 42% to under 5%. Not because the model improved. Because I made verification mandatory before generation.
Each of these checks is manually crafted.
6. Plan/Act Mode: no code without a plan
This is the piece I added after the initial U-AMOS deployment, and it might be the highest-leverage addition.
Before touching more than one file, the AI must:
- Read .memory/40-active.md (current focus)
- Draft an implementation plan in plain markdown
- Wait for my confirmation
- Execute only after approval
- Log what it actually shipped back into .memory/50-progress.md
This sounds slow. It's actually faster, because you catch architectural mistakes at the plan stage instead of the debugging stage. Tweag's Agentic Coding Handbook and Lullabot's memory bank guide both document the same pattern. It's becoming standard practice in teams using agentic coding seriously.
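The plan itself doesn't need to be elaborate. Something like the sketch below is enough; the feature and file paths are made up for illustration, and the headings are just one possible shape:
# Plan: add account deletion flow
## Context read
.memory/40-active.md, context_map.md, navigation and state rule packs
## Files to touch
- src/features/settings/DeleteAccountScreen.tsx (new)
- src/features/settings/index.ts (register the screen)
## Steps
1. Add the screen and route (no new navigation pattern)
2. Call the existing auth service: do not create a new one
3. Add i18n keys for all user-facing strings
## Out of scope
- Data export, account recovery
Once approved, the same plan becomes the basis for the 50-progress.md entry after the work ships.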
What changed after U-AMOS
I tracked the same metrics for 3 months after deploying U-AMOS across my own projects.
- Hallucinations (from my logs): 42% → 3% (93% reduction)
- Tokens per session (average): 48,000 → 4,200 (91% reduction)
- Token cost (at my model tier): ~$180/month → ~$18/month
- Time debugging AI errors: 4 hours/week → 20 minutes/week
- Duplicate components created: 23 in the 3 months before → 0 in the 3 months after
- Feature velocity: roughly 5x faster on features I tracked end-to-end
I also started tracking which rule packs loaded most often and which hallucination types were still slipping through. That observability layer is what tells you where the system needs a new rule file vs where the AI needs better examples.
Memory hygiene: pruning, plus living rules
The mistake I see in most memory bank setups is treating the files as append-only. They're not. They need pruning.
My current hygiene routine:
- 40-active.md updates at the start of every work session (what's the actual focus today)
- 50-progress.md gets a new entry after every shipped feature; old entries archive monthly
- 70-knowledge.md gets pruned weekly; if a lesson is now in a rule file, it gets removed from the knowledge doc
- 20-system.md only updates when architecture actually changes
- If the AI proposes changes to any memory file, it does it as a plan diff I review; it never writes to memory silently
There's one more file that prevents documentation rot: updated_rules.md. It's a changelog for rule exceptions.
When the team makes a real exception to a rule (for example, "we never use inline styles, EXCEPT for the opacity prop because Restyle doesn't support it"), that exception goes in updated_rules.md with a date and a reason. Not into the main rule file.
# Updated Rules (Living Document)
## 2025-12-20: Inline styles exception
**Original rule**: NO inline styles ever
**Updated rule**: NO inline styles EXCEPT for single properties not supported by Restyle (opacity)
**Why**: Restyle doesn't support the opacity prop
**Example**: ✅ <Box marginTop="xl" style={{ opacity: 0.5 }} />
Why this matters: rules become outdated quickly, and rewriting them every time creates drift. The living rules file lets the AI always check the latest guidance without losing the original logic. Exceptions are explicit and dated. Historical context is preserved. The main rule files stay clean.
The 2,000-10,000 token figure holds only if you maintain all of this. If you let the files grow unchecked, you'll hit 50,000 tokens again within two months. The context window isn't the bottleneck: your maintenance habits are.
What still doesn't work, and what's on the roadmap
This isn't a finished system. Three things still fail or are incomplete:
Long sessions. Context degrades over multi-hour conversations. I re-attach memory bank files every 30-40 messages. A better solution is probably an MCP server that handles re-injection automatically, but I haven't built it.
Performance edge cases. The AI generates working code that sometimes re-renders too aggressively. Architecture rules help, but don't eliminate this. I'm addressing it by writing performance rules for Expo apps; I started from Expo's official guidance, but on its own it isn't enough, and it needs significant adaptation to the project architecture.
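As a sketch of what one of those rules looks like in the same wrong/correct format (the component names here are made up; the underlying advice is the standard FlatList memoization pattern):
## Lists
### ❌ WRONG: new callback and style object on every render
<FlatList
  data={todos}
  renderItem={({ item }) => <TodoRow todo={item} style={{ padding: 16 }} />}
/>
### ✅ CORRECT: stable renderItem, memoized row, spacing from the design system
const renderTodo = useCallback(({ item }) => <TodoRow todo={item} />, []);
<FlatList data={todos} renderItem={renderTodo} />
(TodoRow is wrapped in React.memo; spacing comes from Restyle props, not inline styles)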
Cross-project memory. U-AMOS handles per-project memory. The next layer, preferences and patterns that follow you across every project you touch, is what tools like Mem0's MCP integration and Claude Code's own auto-memory system are starting to solve. If you find yourself re-teaching the same conventions in every new repo, cross-project memory is the fix. I'm watching this space closely.
How to set up U-AMOS yourself
I have created an initialization prompt for the system. I've tested it on some of my projects and it worked well. It doesn't ship with many rules yet, but you can customize that part.
You can check it here: link
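If you'd rather wire it up by hand, the minimal version of everything described above is:
1. Create a .memory/ folder with the nine memory bank files (start with 40-active.md and 50-progress.md; backfill the cold and warm tiers as you go).
2. Write context_map.md at the project root: every feature, service, and core UI component with its entry path.
3. Write your critical rules (always loaded) as ❌ wrong / ✅ correct / exception examples, then add task-specific packs as you need them.
4. Put the 7-point checklist and the plan/act protocol into your CLAUDE.md or Cursor rules entry point.
5. Add an empty updated_rules.md for your first dated exception.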
Related work worth reading
U-AMOS didn't emerge from a vacuum. These are the guides I've found most aligned with the same pattern:
- Tweag's Agentic Coding Handbook: memory bank system and plan/act mode, well documented
- Mem0's Claude Code integration: if you want cross-project memory on top of U-AMOS, this is the current best path
- Anthropic's Claude Code best practices: the official guidance on CLAUDE.md structure, memory, and tool use
The pattern is converging across all of these. Structured memory, tiered loading, mandatory verification before generation, plan-before-execute. U-AMOS is my implementation of that pattern for React Native specifically, with the anti-hallucination rules, the context map, and the mobile-specific constraints built in.
Or, if you want it pre-configured
I built AI Mobile Launcher as the productized version of U-AMOS for React Native.
It ships with:
- The full 9-file memory bank, pre-structured for a new project
- A pre-built context map of every feature, service, and UI component
- All critical, important, and recommended rule packs, written as visual diffs, not paragraphs
- The split between generators (workflows) and rules (patterns), already in place
- Pre-built component and service inventories
- Cursor and Claude Code entry points configured with plan/act mode
- Generators for common features (onboarding, paywalls, i18n, design system)
- The 7-point anti-hallucination checklist, embedded in every entry point
- A starter updated_rules.md ready for your first exception
The Lite tier is free on GitHub. U-AMOS 2.0 ships fully configured in the Starter tier. If you're starting a new React Native project and want the memory system running from day one without the setup work, that's the fastest path. aimobilelauncher.com
If you're adding U-AMOS to an existing project, the steps above are enough to get started. The framework isn't magic: it's the result of 6 months of failed sessions, logged and analyzed, until the AI stopped fighting me and started shipping with me.
What I want you to take from this
The content I see most often on AI coding frames this as a prompting problem. Use a better system prompt. Be more specific. Add more examples to your instructions.
My experience over 6 months of tracking my own sessions is that prompts hit a ceiling. Once you've written a clear, specific prompt, the next 10 iterations give you marginal gains. Memory and structure compound differently: every lesson added to the memory bank improves every future session. Every entry in the context map saves another exploration loop. Every rule written as a visual diff prevents an entire category of hallucination permanently.
The AI isn't a developer you prompt. It's a system you build context for. Build the memory. Hand it the map. Show it what wrong looks like next to what right looks like. Stop paying to re-explain the same architecture every day.
U-AMOS is how I did it. The principles work without my specific files. The files work better with the principles. Either way: fix the memory and the map first, then build the product.
I write Code Meet AI weekly: AI in mobile development, real tradeoffs, what's actually working in production. Next issue: agent-first mobile architecture and why most "AI features" in apps are just bolted-on chatbots pretending to be product. https://codemeetai.substack.com/














