Open Source · Python · Neuroscience-Inspired

mnemos

μνῆμος — of memory

Biomimetic memory architectures for LLMs.

Five neuroscience-inspired modules that make AI memory work like a brain — not a hard drive. Built for developers who are tired of context stuffing and append-only RAG.

Current LLM memory is broken.

The AI industry relies on context stuffing — cramming millions of tokens into the context window — or standard RAG: an append-only vector database that accumulates contradictions over time. Both treat memory as static, lossless, and disorganized.

Human memory is efficient precisely because it is the opposite: reconstructive, heavily partitioned, state-dependent, and lossy by design. It actively forgets, shifts data structures during sleep, and runs on predictive error.

Hard Drive Approach
Flat, Append-Only Storage
  • Every token has equal weight
  • Contradictions accumulate forever
  • No emotional tagging or state
  • Linear, point-in-space lookup only
  • Raw logs grow until limits explode
  • No associative recall or priming
Brain Approach
Biomimetic Memory
  • Surprisal gates filter the mundane
  • Facts reconsolidate on recall
  • Affective state shapes retrieval
  • Graph activation spreads context
  • Sleep daemon prunes episodics
  • Associative networks prime related memory

Neuroscience meets engineering.

Each module targets a specific failure mode of standard LLM memory, modeled after a known neuroscience mechanism. Use independently or compose into the full pipeline.

MODULE 01

Surprisal Gate

Predictive coding (Active Inference) — the brain permanently encodes only prediction errors, ignoring what it already expects.

Vector databases encode everything equally. A mundane greeting consumes the same storage and retrieval weight as a critical instruction like "My production server is down."

A fast local model runs as a background Prediction Engine, constantly predicting user intent. When input arrives, semantic divergence (cosine distance) is computed between the prediction and actual input. Low divergence is discarded. High divergence is stored with an elevated salience weight — only surprises become long-term memories.
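The gating step described above can be sketched in a few lines. This is an illustrative sketch, not mnemos's internal API; the function name and the threshold value are assumptions:

```python
from math import sqrt

def surprisal_gate(predicted_vec, actual_vec, threshold=0.35):
    """Keep an input only when it diverges enough from the background
    prediction. Returns (keep, salience_weight)."""
    dot = sum(p * a for p, a in zip(predicted_vec, actual_vec))
    norm = sqrt(sum(p * p for p in predicted_vec)) * sqrt(sum(a * a for a in actual_vec))
    divergence = 1.0 - dot / norm  # cosine distance: 0 = fully expected
    if divergence < threshold:
        return False, 0.0                  # expected input: discard
    return True, min(divergence, 1.0)      # surprise: store with salience weight
```

In the full module, the two vectors would come from embedding the prediction engine's guess and the actual user input.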

MODULE 02

Mutable RAG

Memory reconsolidation — each time a human recalls a memory, it enters an unstable, labile state and is physically rewritten with current context before restabilizing.

RAG is append-only. If a user says "I use React" in 2025 and "I'm migrating to Rust" in 2026, standard RAG retrieves both facts, forcing the LLM to waste tokens resolving the contradiction in its context window.

Retrieved memory chunks are flagged as "labile." After each conversational turn, an async background agent evaluates whether the retrieved fact has changed given the new context. If it has, the original vector is overwritten — not appended to — with a synthesized, updated chunk. The AI's beliefs naturally drift and adapt without accumulating contradictory junk.

MODULE 03

Affective Router

State-dependent memory (amygdala filter) — the brain preferentially retrieves memories encoded in an emotional state matching the current one, naturally surfacing contextually relevant experiences.

Standard embedding models retrieve based purely on semantic text similarity. A critical, urgent constraint ("PROD IS DOWN") carries the same retrieval weight as a trivial passing comment.

Every interaction is classified on three axes — Valence (−1.0 to 1.0), Arousal (0.0–1.0), and Complexity (0.0–1.0) — and attached as a CognitiveState metadata vector. The retrieval formula blends semantic similarity (70%) with affective state match (30%). A panicked user surfaces past crisis resolutions, not just semantically similar code snippets.
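The 70/30 blend can be written directly. The state-distance normalization below is an assumption (Euclidean distance over the three axes, scaled by its maximum, sqrt(6)); mnemos may weight the axes differently:

```python
from math import sqrt

def state_match(query_state, memory_state):
    """Similarity of two (valence, arousal, complexity) triples.
    Valence spans [-1, 1] and the other axes [0, 1], so the largest
    possible distance is sqrt(2**2 + 1 + 1) = sqrt(6)."""
    dist = sqrt(sum((q - m) ** 2 for q, m in zip(query_state, memory_state)))
    return 1.0 - dist / sqrt(6.0)

def retrieval_score(semantic_sim, query_state, memory_state):
    # 70% semantic similarity, 30% affective state match
    return 0.7 * semantic_sim + 0.3 * state_match(query_state, memory_state)
```

Under this scoring, a panicked query (valence −0.8, arousal 0.9) can rank a past crisis memory above a calmer chunk with slightly higher semantic similarity.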

MODULE 04

Sleep Daemon

Hippocampal-neocortical transfer — during deep sleep, the brain replays episodic memories, extracts generalized semantic rules into the neocortex, then prunes the raw episodes to reclaim capacity.

AI developers fear data deletion, so raw conversation logs grow without bound. Retrieval latency and storage costs climb, context limits are eventually hit, and nothing is ever abstracted into durable knowledge.

An "awake" hippocampus (Redis/SQLite) stores current-session conversational turns verbatim. On idle, a scheduled sleep process reads the day's episodic logs, extracts permanent facts and user preferences into a dense knowledge graph, then actively deletes the raw episodes. Optionally, it identifies repetitive reasoning patterns and codifies them as executable tool scripts.
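The consolidation pass reduces to three steps: distill, write, prune. A minimal sketch, with `extract_facts` standing in for the LLM call that summarizes episodes (all names here are illustrative, not mnemos's API):

```python
def consolidate(episodic_log, extract_facts, knowledge_graph):
    """One sleep cycle: distill raw turns into durable facts,
    then delete the episodes to reclaim capacity."""
    for subject, fact in extract_facts(episodic_log):
        knowledge_graph.setdefault(subject, set()).add(fact)
    episodic_log.clear()  # raw episodes are actively pruned
    return knowledge_graph

def toy_extractor(turns):
    # Stand-in for a real extractor, which would prompt a local LLM.
    for turn in turns:
        if "prefers" in turn:
            yield ("user", turn)
```

After the cycle, only the dense knowledge-graph entries survive; the verbatim conversational turns are gone.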

MODULE 05

Spreading Activation

Collins & Loftus associative networks — in a human brain, hearing "server" pre-activates "AWS," "downtime," and "Nginx," priming associated concepts before they are consciously needed.

Vector search is a discrete point-in-space lookup. It retrieves exact mathematical matches but completely misses the broader associative "train of thought" — unless explicitly queried by name.

When a node is retrieved via vector search, activation energy (1.0) is injected into it and propagates along graph edges to connected nodes, decaying by 20% per hop. The LLM receives the directly retrieved node plus all adjacent nodes above the activation threshold — creating a fluid, moving spotlight of context that delivers human-like associative intuition.
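The propagation itself is a small breadth-first walk. A sketch under the stated parameters (1.0 seed energy, 20% decay per hop); the adjacency-list graph representation is an assumption:

```python
def spread_activation(graph, seed, decay=0.8, threshold=0.3):
    """Propagate activation outward from the vector-retrieved node.
    `graph` maps node -> list of neighbour nodes."""
    energy = {seed: 1.0}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            hop_energy = energy[node] * decay  # decays 20% per hop
            if hop_energy < threshold:
                continue  # below threshold: activation dies out here
            for neighbour in graph.get(node, []):
                if hop_energy > energy.get(neighbour, 0.0):
                    energy[neighbour] = hop_energy
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return energy  # every node present is primed context for the LLM
```

Retrieving "server" in a graph where it links to "aws" and "aws" links to "downtime" also surfaces both neighbours, at 0.8 and roughly 0.64 activation, without either appearing in the query.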

Encode → Retrieve → Consolidate

The MnemosEngine composes all five modules into a coherent pipeline. Each module is independently usable — or let the engine orchestrate the full sequence.

ENCODE       input interaction → Surprisal Gate filters the mundane → Affective Router tags emotion → Spreading Activation links the graph → memory stored
RETRIEVE     query: user prompt → Spreading Activation traverses the graph → Affective Router re-ranks results → Mutable RAG reconsolidates → results to LLM
CONSOLIDATE  idle: system quiet → Sleep Daemon triggers → facts extracted (prefs + patterns) → episodes pruned → knowledge graph updated

Up in 60 seconds.

A pure-Python library with no required external services. Start with the in-memory store, swap to SQLite for persistence, or scale to Neo4j + Qdrant for production.

install & basic usage
# Install
pip install mnemos-memory

# With MCP support
pip install 'mnemos[mcp]'

# Basic usage
import asyncio
from mnemos import MnemosEngine

async def main():
    engine = await MnemosEngine.create()

    # Store — surprisal gate decides what's kept
    await engine.store("User prefers dark mode, uses Neovim")

    # Retrieve — spreading activation + affective re-rank
    results = await engine.retrieve("editor preferences")
    print(results[0].content)

    # Consolidate — sleep daemon runs
    await engine.consolidate()

asyncio.run(main())
claude code — .claude/claude_desktop_config.json
{
  "mcpServers": {
    "mnemos": {
      "command": "mnemos-mcp",
      "env": {
        "MNEMOS_LLM_PROVIDER": "ollama",
        "MNEMOS_LLM_MODEL": "llama3",
        "MNEMOS_STORE_TYPE": "sqlite",
        "MNEMOS_SQLITE_PATH": "~/.mnemos/memory.db"
      }
    }
  }
}
mcp tools available to your agent
mnemos_store        → store through surprisal + affective
mnemos_retrieve     → spreading activation + re-rank
mnemos_consolidate  → sleep: episodic → semantic
mnemos_forget       → delete specific memory
mnemos_stats        → system-wide statistics
mnemos_inspect      → full details on a memory
mnemos_list         → list all stored memories

How mnemos stacks up.

The only open-source library that implements all five biomimetic memory mechanisms.

Feature                     mnemos   Mem0     Zep      LangMem  MemGPT
Surprisal Gating            ✓        —        —        —        —
Memory Reconsolidation      ✓        partial  —        —        —
Affective State Routing     ✓        —        —        —        —
Sleep Consolidation         ✓        —        partial  —        —
Graph Spreading Activation  ✓        —        partial  —        —
MCP Server Support          ✓        ✓        ✓        —        —
Open Source                 ✓        ✓        partial  ✓        ✓