You designed the best Agent memory layer. Now, if only it would just use it RIGHT!!!

You finally got your system to beat Mem0 on its own benchmark. Spin up a fresh DB. Things are good, confabs down, productivity is up. A week or two passes, and it's a goldfish. Open your store, and it's the Red Wedding in there. Your agent has either been saving nothing you want, half what you need, something about nothing, OR EVERYTHING! C'Est La Vie.

I'm going to try to convince you that I got it figured out; if not, maybe it will help you get your model under control. Cause I promise, I hit every failure mode building Recall, a local active memory outside of an agent's control.

The failure modes

Quietly not writing. You ask the model to remember something durable. It says "noted" and moves on. Nothing lands in the store. No error, no warning, just a turn that ended without a write. This is the most common one and the hardest to catch, because from inside the conversation, everything looks fine.
Half writing. The model writes one fact and drops the three that mattered as much. Or it writes the headline and not the reasoning behind it, so a later session gets a claim with no support. The store fills up, but with fragments you cannot act on.
Writing the wrong thing. If your memory is structured (required fields, typed records, confidence, evidence links the model fills the structure out wrong. It puts a passing observation where a decision should go, leaves the confidence blank, or points a "this corrects that" link at a free-text label instead of the actual record. The schema is satisfied on paper and is useless in practice.
Writing everything. The overcorrection. The model dumps the whole turn into the store: every aside, every dead end, and sometimes a secret it should never have persisted. Now you have a second problem on top of the first, because data buried is the same as data corrupted

Why this happens

The model has no stake in the future session. Inside a single turn, the context window already holds everything the model needs. Writing to an external store is, from the model's point of view, work that pays off for someone else: a future session it will never experience as itself. It optimizes for finishing the turn in front of it, and the write is the first thing to get skipped.

There is usually a competitor. If your agent runs inside a host like Claude Code, that host probably ships its own memory feature, wired into the base system prompt. When two "save this" pathways exist, the native one wins, because it is closer to the model's root instructions than your skill is. Your memory system can be fully armed and still lose every write to the built-in one. I confirmed this with a single-variable test: with the native feature on, the model wrote the user's facts to flat files every time, no matter how loudly my system asked for the structured store.

Writing is harder than reading. Reading is free-form: ask a question, get text. A structured write means satisfying a schema, and the moment the model meets friction, it takes the path of least resistance, which is to skip the write or to dump unstructured prose. Friction is not a small factor here.

There is no feedback in the loop. When the model writes the wrong structure and the write just fails silently, nothing teaches it otherwise. It shrugs and continues. Adherence with no signal is a coin flip; the model loses a little more often every turn.

Three solutions that do not work

Tell it harder in the prompt. The instinct is to add "ALWAYS write durable facts to memory" in capital letters and call it done. This is prompt-nagging. It competes with the native pathway and loses; it costs tokens on every turn, and it decays: the model obeys for a few turns, then rationalizes its way out ("this is just a simple note", "I will write it later"). It is also brittle across models, so the day you switch models, you start over.

Log everything and clean up later. If the model does not decide what is durable, make it write all of it and curate afterward. This trades the empty-store problem for a curation-debt problem, defeats the entire point of a schema, and is the exact path that leaks secrets into the store. You have not solved adherence. You have moved the failure downstream and added a cleanup job you will never get to.

Fine-tune a model to obey the schema. Reach for training, and you get a heavy, expensive fix that is brittle to schema changes, locks you to one model, and still does not address the competing native feature. It is a large hammer for what turns out to be a wiring problem, and the wiring problem is sitting right there, unsolved underneath it.

Two easy fixes that actually help

Turn off the competitor. This is the single change that helps most, and it is one line. If the host ships its own auto-memory, disable it so there is only one "save this" pathway in the building. In Claude Code that is CLAUDE_CODE_DISABLE_AUTO_MEMORY=1. With the competitor gone, a properly armed agent reaches for the structured store on its own, because nothing is shadowing it anymore. Most of the "quietly not writing" problem was never the model refusing. It was the model writing somewhere else.

Lower the write friction. Give the model a small helper that takes only a few inputs it can judge (the record type, a title, a body, a confidence, a couple of topics) and emits the schema-valid object for it. The model stops hand-assembling a structured payload and picks the two or three load-bearing fields instead. In Recall, this removed the schema-friction tax on the first write of every session, which was where most of the "writing the wrong thing" came from. The model was not being careless. It was being asked to do clerical work under load, and it cut corners exactly where you would expect.

These two get you a long way. They do not, by themselves, guarantee the write happens at the right moment, or that a correction supersedes the old value instead of sitting next to it. For that, you need the system, not the model, to carry the discipline.

The real fix: Ta dun Ta da hooks

The durable answer is to stop relying on the model and move the adherence burden onto hooks that trigger from events that perform actions between the beginning and end of that forward pass.

At the start of a turn, inject the memory. A hook on session start or on prompt submit that says, in-band, "the memory store exists, read it before you rely on recollection," and then hands the model a mini-index of what is already stored that is relevant to this prompt: ids and titles, nothing heavy. This does two things at once. It makes reading the default instead of an optional courtesy, and it kills the "assert from memory" and "ask the user a thing they already told you" failures by showing the model what is on the shelf. Reading first is also what makes writing meaningful: a model that has seen the current state writes the resolution, not a duplicate.

At write time, enforce the structure in-band. Put a validation gate in front of the store so a malformed or secret-shaped write bounces with a readable error the model can fix on the spot, instead of failing silently or corrupting the store. This is where "writing the wrong thing" and "writing everything" get caught. The schema stops being a thing the model has to remember to honor and becomes a thing the system guarantees. The same gate is where you reject secrets, so a leaked token never reaches the graph in the first place.

At the end of a substantive turn, nudge the write. A stop hook that checks whether the turn produced something durable and nothing got written, and prompts for it. This closes the "quietly not writing" gap from the other side: even if the model forgot, the system asks once before the turn ends.

The shape of the fix is the same in all three places. The model's job shrinks to the part only it can do, which is judging what is durable and how confident it is. Everything mechanical (when to read, when to write, what shape the write takes

There is a small equation hiding in here that I found the hard way. Obedience is the product of three things: the model's intent on the turn, the arming you put in place (the skill, the helper, the hooks). That is why "tell it harder" fails on its own; it is the factor most likely to be silently zero while you debug the other two.

What the future looks like

Business as usual, and your memory system fails in the most expensive way possible: it looks like it is working. The store exists, the writes occasionally happen, and you do not notice until a session confidently tells you something three versions out of date, or asks you a question you answered 10minutes prior, or starts cold and re-derives what the last run already knew. The store becomes a graveyard you stop trusting, and you quietly go back to pasting context in by hand. You are now maintaining a database for nothing, which is strictly worse than not having one.

Fix it, and the thing compounds. Sessions inherit. The model reads before it acts, writes the resolution when it corrects itself, and supersedes the old value instead of stacking a new one next to it, so the current answer is always on top and the history still survives underneath. The memory gets more useful the more you use it, because every correction makes the store sharper instead of noisier. You stop re-explaining your own project to your own tools. That was the entire promise of agentic memory,

I didn't talk about RAG, separate embedding models designed for retrieval, and only touched on automemory because. I'm saving some sauce for the ribs.

I've spent the better part of five or six months now putting the work in on , Recall, a push-style memory substrate for agents: structured records, computed and calibrated confidence, directional value updates with provenance and the hooks described above. It's open, any and all feedback of its behavior on other systems is appreciated. Thank you for your time and the read. https://github.com/H-XX-D/recall-memory-substrate