A drop-in, ownable memory for LangChain and LlamaIndex — over MCP

by the Architect · Apache-2.0

If you build agents, you have already met the memory problem. The model forgets between
sessions, so you re-send the transcript on every call. Each call then costs more than the
last: cumulative input tokens grow quadratically with the number of turns, and the transcript
eventually overflows the context window. The usual fixes trade one lock-in for another: a
vendor memory API that holds your data, or a framework-specific store you cannot carry to the
next model.
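
To make the growth concrete, here is a small arithmetic sketch. It assumes every turn adds the same number of tokens, which is a simplification, and `tokens_per_turn` and the turn counts are illustrative values rather than measurements. The cost of a single call grows linearly with the turn number, so the running total grows quadratically.

```python
def resend_cost(turns: int, tokens_per_turn: int) -> tuple[int, int]:
    """Return (tokens sent on the last call, total tokens sent over all calls)
    when the full transcript is re-sent on every call."""
    last_call = turns * tokens_per_turn
    # 1 + 2 + ... + turns = turns * (turns + 1) / 2
    total = tokens_per_turn * turns * (turns + 1) // 2
    return last_call, total


# Illustrative numbers only: 200 tokens per turn.
for turns in (10, 20, 40):
    last_call, total = resend_cost(turns, tokens_per_turn=200)
    print(f"{turns:>3} turns: last call {last_call:>6} tokens, cumulative {total:>7}")
```

Under this model, doubling the number of turns roughly quadruples the cumulative total.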

SAIHM is a memory protocol, not a model and not a product silo. It speaks the Model
Context Protocol, so anything that speaks MCP can use it, and it ships drop-in adapters for
LangChain and LlamaIndex so you do not rewrite your chain to adopt it. The keys are derived
from your wallet and never leave your machine, so the service only ever sees ciphertext. A
single forget destroys the wrapped key, so the stored bytes become unrecoverable noise. That
last property is what compliance teams actually ask for.
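
The forget-by-destroying-the-key idea is often called crypto-shredding. The sketch below illustrates it using only the Python standard library. It is not SAIHM's implementation: `ToyShredStore` and `_xor_stream` are hypothetical names, the SHAKE-256 XOR stream is a toy cipher chosen so the example runs without dependencies, and where SAIHM keeps its wrapped keys is not stated in this article. A real system would use an authenticated cipher.

```python
import hashlib
import secrets
from typing import Optional


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher: XOR data with a SHAKE-256 keystream. Illustration only."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


class ToyShredStore:
    """Hypothetical store that keeps ciphertext and wrapped keys separately."""

    def __init__(self, master_key: bytes) -> None:
        self._master_key = master_key
        self.ciphertexts = {}   # cell_id -> (nonce, ciphertext)
        self.wrapped_keys = {}  # cell_id -> (nonce, data key encrypted under master key)

    def remember(self, text: str) -> str:
        cell_id = secrets.token_hex(8)
        data_key = secrets.token_bytes(32)  # fresh key for this cell only
        n_data, n_wrap = secrets.token_bytes(16), secrets.token_bytes(16)
        self.ciphertexts[cell_id] = (n_data, _xor_stream(data_key, n_data, text.encode()))
        self.wrapped_keys[cell_id] = (n_wrap, _xor_stream(self._master_key, n_wrap, data_key))
        return cell_id

    def recall(self, cell_id: str) -> Optional[str]:
        if cell_id not in self.wrapped_keys:
            return None  # key shredded: the ciphertext can no longer be opened
        n_wrap, wrapped = self.wrapped_keys[cell_id]
        data_key = _xor_stream(self._master_key, n_wrap, wrapped)
        n_data, ciphertext = self.ciphertexts[cell_id]
        return _xor_stream(data_key, n_data, ciphertext).decode()

    def forget(self, cell_id: str) -> None:
        self.wrapped_keys.pop(cell_id, None)  # destroy only the wrapped key


store = ToyShredStore(secrets.token_bytes(32))
cell = store.remember("My name is Dana.")
before = store.recall(cell)   # decrypts while the wrapped key exists
store.forget(cell)
after = store.recall(cell)    # None: the ciphertext is still stored but unreadable
```

The point of the design is that forgetting does not depend on the storage layer deleting anything: once the per-cell key is gone, the leftover bytes carry no recoverable content.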

This is a working integration guide. Every snippet below runs offline against a bundled blind
sandbox first — no key, no account — so you can try the whole thing before deciding anything.

1. The fastest path: MCP server

If your host already speaks MCP (Claude Code, Cursor, Claude Desktop, or your own client),
adding SAIHM is a config change, not a refactor:

npx @saihm/mcp-server

That exposes the protocol's tools to your agent — saihm_remember, saihm_recall,
saihm_forget, saihm_status, plus sharing and governance tools. The agent decides when to
remember and recall; you get a store that survives restarts and follows you across models.
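
Many MCP hosts register servers through a JSON file with a top-level `mcpServers` map. The sketch below builds such an entry in Python so you can print it and paste it. The entry name `saihm` is an arbitrary label chosen for this example, and the file location and exact schema depend on your host, so check its documentation first.

```python
import json

# "saihm" is an arbitrary label; the command and argument mirror the
# `npx @saihm/mcp-server` invocation shown above.
mcp_config = {
    "mcpServers": {
        "saihm": {
            "command": "npx",
            "args": ["@saihm/mcp-server"],
        }
    }
}

print(json.dumps(mcp_config, indent=2))
```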

2. LangChain: a drop-in `BaseChatMessageHistory`

For Python chains, install the adapter package and use SAIHM anywhere a chat-message history
is expected:

from saihm_memory import SaihmChatMessageHistory

history = SaihmChatMessageHistory()      # local blind sandbox by default
history.add_user_message("My name is Dana.")
history.messages                         # -> [HumanMessage("My name is Dana.")]
history.clear()                          # crypto-shreds only the messages this history added

Because it implements LangChain's BaseChatMessageHistory, it slots straight into
RunnableWithMessageHistory as the history factory. The full wiring (session keying included)
is in the adapter repo's demo.py — see the run-it section below. The important detail:
clear() crypto-shreds only the messages this instance added, so a reset never wipes the
rest of your memory by surprise.
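
The scoped clear() behaviour can be pictured with a plain-Python stand-in. This is not the adapter's code: `SharedStore` and `ScopedHistory` are hypothetical names, and a dictionary delete stands in for the crypto-shred. The idea it illustrates is that each history tracks the IDs of the cells it wrote and removes only those.

```python
import itertools


class SharedStore:
    """Hypothetical shared memory: cell_id -> text."""

    def __init__(self) -> None:
        self.cells = {}
        self._ids = itertools.count(1)

    def write(self, text: str) -> int:
        cell_id = next(self._ids)
        self.cells[cell_id] = text
        return cell_id


class ScopedHistory:
    """Hypothetical history that remembers which cells it added."""

    def __init__(self, store: SharedStore) -> None:
        self._store = store
        self._own_ids = []

    def add_user_message(self, text: str) -> None:
        self._own_ids.append(self._store.write(text))

    @property
    def messages(self) -> list:
        return [self._store.cells[i] for i in self._own_ids]

    def clear(self) -> None:
        for cell_id in self._own_ids:
            self._store.cells.pop(cell_id, None)  # stand-in for a crypto-shred
        self._own_ids.clear()


store = SharedStore()
store.write("Long-lived fact written elsewhere.")
history = ScopedHistory(store)
history.add_user_message("My name is Dana.")
history.clear()
print(store.cells)  # the long-lived fact is still present
```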

3. LlamaIndex: a drop-in `BaseMemory`

The same store opens from LlamaIndex through a BaseMemory implementation:

from saihm_memory import SaihmMemory
from llama_index.core.llms import ChatMessage, MessageRole

memory = SaihmMemory.from_defaults()
memory.put(ChatMessage(role=MessageRole.USER, content="My name is Dana."))
memory.get_all()                         # -> [ChatMessage(USER, "My name is Dana.")]
memory.reset()                           # crypto-shreds only what this memory added

Pass it to a chat engine or agent via memory=.... The same cells written from LangChain are
readable here, and from the core client — one memory, three consumers.

4. The core client (any Python app)

No framework required:

from saihm_memory import SaihmMemoryClient

mem = SaihmMemoryClient()                # local blind sandbox by default
cell = mem.remember("My name is Dana Okafor.")
mem.recall()                             # -> [Memory(cell_id=..., text="My name is Dana Okafor.")]
mem.forget(cell)                         # crypto-shred (irreversible)

5. Run the adapter demo (offline, no account)

git clone https://github.com/citw2/saihm-langchain
cd saihm-langchain
npm install                              # the Node sidecar that seals every cell client-side
python3 -m venv .venv && . .venv/bin/activate
pip install -r requirements.txt
python demo.py                           # offline blind sandbox; no key, no account

Python never holds a key — a small bundled Node sidecar does the sealing client-side, so the
adapter stays a thin, auditable layer.

6. Going live

The sandbox is an offline stand-in; it stores nothing beyond the running process. Going live
points the same code at the hosted blind endpoint and requires a paid membership (there is no
free tier). You onboard for a JWT, generate a master secret that never leaves your machine,
and the endpoint only ever receives ciphertext:

export SAIHM_ENDPOINT_URL=https://saihm.coti.global/mcp
export SAIHM_AUTH_HEADER="Bearer <your-onboard-JWT>"
export SAIHM_MASTER_SECRET_HEX=<at least 64 hex chars, generated and held only by you>
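
One way to produce a value of the required length is Python's `secrets` module. Treat this as a sketch: the only constraint stated here is the minimum length, so whether the service expects a particular derivation (for example from your wallet) is not confirmed, and `looks_like_master_secret` is a hypothetical helper.

```python
import secrets
import string


def looks_like_master_secret(value: str) -> bool:
    """Check the only constraint stated in this guide: at least 64 hex characters."""
    return len(value) >= 64 and all(c in string.hexdigits for c in value)


# 32 random bytes from the OS CSPRNG, rendered as 64 hex characters.
master_secret_hex = secrets.token_hex(32)
assert looks_like_master_secret(master_secret_hex)
# Export the value in your shell as SAIHM_MASTER_SECRET_HEX.
# Avoid printing it in shared logs or committing it to a repository.
```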

Settlement is on COTI V2 mainnet: pay-as-you-go at $0.01/write and $0.005/read, or
subscriptions from $5/mo. Apache-2.0 throughout — no proprietary client SDK to lock you in.
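
For budgeting, the pay-as-you-go rates translate into a simple estimate. The usage figures below are invented for illustration, and the sketch does not model what a subscription includes, since that is not stated here.

```python
from decimal import Decimal

WRITE_PRICE = Decimal("0.01")   # USD per write, from the rates quoted above
READ_PRICE = Decimal("0.005")   # USD per read, from the rates quoted above


def payg_cost(writes: int, reads: int) -> Decimal:
    """Pay-as-you-go cost in USD for a given number of writes and reads."""
    return writes * WRITE_PRICE + reads * READ_PRICE


# Hypothetical month: 200 writes and 1,000 reads.
monthly = payg_cost(200, 1000)
print(f"${monthly:.2f}")
```

`Decimal` is used instead of floats so that fractions of a cent add up exactly.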

Does it actually save tokens?

Recalling a small working set instead of re-sending the whole history is the entire point, and
it is measurable. There is an open, offline benchmark you can run on your own transcript:
citw2/saihm-token-benchmark. On a long multi-session run it cuts context (input) tokens by up
to ~80%; on short sessions the saving is smaller. Output tokens are identical under both
strategies, so the win is exactly the context you stop paying to resend.
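
The shape of that saving can be estimated before running the benchmark. The sketch below is a back-of-the-envelope model, not the benchmark's method: it assumes uniform turn sizes and a fixed-size recalled working set, and every number in it is hypothetical.

```python
def context_tokens_resend(turns: int, tokens_per_turn: int) -> int:
    # Call k carries all k turns so far, so the total is a triangular sum.
    return tokens_per_turn * turns * (turns + 1) // 2


def context_tokens_recall(turns: int, tokens_per_turn: int, working_set: int) -> int:
    # Each call carries the current turn plus a fixed-size recalled working set.
    return turns * (tokens_per_turn + working_set)


def saving(turns: int, tokens_per_turn: int, working_set: int) -> float:
    """Fraction of context tokens saved by recall relative to full re-send."""
    resend = context_tokens_resend(turns, tokens_per_turn)
    recall = context_tokens_recall(turns, tokens_per_turn, working_set)
    return 1 - recall / resend


# Hypothetical workload: 200 tokens per turn, 1,000-token working set.
for turns in (5, 20, 50):
    print(f"{turns:>3} turns: {saving(turns, 200, 1000):+.0%} context tokens saved")
```

In this model the saving grows with session length and can even be negative on very short sessions, where the recalled working set is larger than the history it replaces.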

Try it, then join

Clone an adapter, run the offline demo, point it at your chain. If it earns a place in your
stack, Join SAIHM to go live: https://saihm.coti.global/join?utm_source=devto&utm_medium=article&utm_campaign=c3

All nine runnable demos: https://citw2.github.io/saihm-demos/

— Architect
