Open Source · Python · Neuroscience-Inspired

mnemos

μνῆμος — of memory

Biomimetic memory architectures for LLMs.

Five neuroscience-inspired modules that make AI memory work like a brain — not a hard drive. Built for developers who are tired of context stuffing and append-only RAG.

Current LLM memory is broken.

The AI industry relies on context stuffing — cramming millions of tokens into the context window — or standard RAG: an append-only vector database that accumulates contradictions over time. Both treat memory as static, lossless, and disorganized.

Human memory is efficient precisely because it is the opposite: reconstructive, heavily partitioned, state-dependent, and lossy by design. It actively forgets, shifts data structures during sleep, and runs on predictive error.

Hard Drive Approach
Flat, Append-Only Storage
  • Every token has equal weight
  • Contradictions accumulate forever
  • No emotional tagging or state
  • Linear, point-in-space lookup only
  • Raw logs grow until limits explode
  • No associative recall or priming
Brain Approach
Biomimetic Memory
  • Surprisal gates filter the mundane
  • Facts reconsolidate on recall
  • Affective state shapes retrieval
  • Graph activation spreads context
  • Sleep daemon prunes episodics
  • Associative networks prime related memory

Neuroscience meets engineering.

Each module targets a specific failure mode of standard LLM memory, modeled after a known neuroscience mechanism. Use independently or compose into the full pipeline.

MODULE 01

Surprisal Gate

Predictive coding (Active Inference) — the brain permanently encodes only prediction errors, ignoring what it already expects.

Vector databases encode everything equally. A mundane greeting consumes the same storage and retrieval weight as a critical instruction like "My production server is down."

A fast local model runs as a background Prediction Engine, constantly predicting user intent. When input arrives, semantic divergence (cosine distance) is computed between the prediction and actual input. Low divergence is discarded. High divergence is stored with an elevated salience weight — only surprises become long-term memories.
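The gating step described above can be sketched in a few lines. This is an illustrative sketch, not mnemos's internal API; the function name and the threshold value are assumptions:

```python
from math import sqrt

def surprisal_gate(predicted_vec, actual_vec, threshold=0.35):
    """Keep an input only when it diverges enough from the background
    prediction. Returns (keep, salience_weight)."""
    dot = sum(p * a for p, a in zip(predicted_vec, actual_vec))
    norm = sqrt(sum(p * p for p in predicted_vec)) * sqrt(sum(a * a for a in actual_vec))
    divergence = 1.0 - dot / norm  # cosine distance: 0 = fully expected
    if divergence < threshold:
        return False, 0.0                  # expected input: discard
    return True, min(divergence, 1.0)      # surprise: store with salience weight
```

In the full module, the two vectors would come from embedding the prediction engine's guess and the actual user input.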

MODULE 02

Mutable RAG

Memory reconsolidation — each time a human recalls a memory, it enters an unstable, labile state and is physically rewritten with current context before restabilizing.

RAG is append-only. If a user says "I use React" in 2025 and "I'm migrating to Rust" in 2026, standard RAG retrieves both facts, forcing the LLM to waste tokens resolving the contradiction in its context window.

Retrieved memory chunks are flagged as "labile." After each conversational turn, an async background agent evaluates whether the retrieved fact has changed given the new context. If it has, the original vector is overwritten — not appended to — with a synthesized, updated chunk. The AI's beliefs naturally drift and adapt without accumulating contradictory junk.

MODULE 03

Affective Router

State-dependent memory (amygdala filter) — the brain preferentially retrieves memories encoded in an emotional state matching the current one, naturally surfacing contextually relevant experiences.

Standard embedding models retrieve based purely on semantic text similarity. A critical, urgent constraint ("PROD IS DOWN") carries the same retrieval weight as a trivial passing comment.

Every interaction is classified on three axes — Valence (−1.0 to 1.0), Arousal (0.0–1.0), and Complexity (0.0–1.0) — and attached as a CognitiveState metadata vector. The retrieval formula blends semantic similarity (70%) with affective state match (30%). A panicked user surfaces past crisis resolutions, not just semantically similar code snippets.
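The 70/30 blend can be written directly. The state-distance normalization below is an assumption (Euclidean distance over the three axes, scaled by its maximum, sqrt(6)); mnemos may weight the axes differently:

```python
from math import sqrt

def state_match(query_state, memory_state):
    """Similarity of two (valence, arousal, complexity) triples.
    Valence spans [-1, 1] and the other axes [0, 1], so the largest
    possible distance is sqrt(2**2 + 1 + 1) = sqrt(6)."""
    dist = sqrt(sum((q - m) ** 2 for q, m in zip(query_state, memory_state)))
    return 1.0 - dist / sqrt(6.0)

def retrieval_score(semantic_sim, query_state, memory_state):
    # 70% semantic similarity, 30% affective state match
    return 0.7 * semantic_sim + 0.3 * state_match(query_state, memory_state)
```

Under this scoring, a panicked query (valence −0.8, arousal 0.9) can rank a past crisis memory above a calmer chunk with slightly higher semantic similarity.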

MODULE 04

Sleep Daemon

Hippocampal-neocortical transfer — during deep sleep, the brain replays episodic memories, extracts generalized semantic rules into the neocortex, then prunes the raw episodes to reclaim capacity.

AI developers fear data deletion, so raw conversation logs grow without bound. Retrieval latency and storage costs climb, context limits are eventually hit, and nothing is ever abstracted into durable knowledge.

An "awake" hippocampus (Redis/SQLite) stores current-session conversational turns verbatim. On idle, a scheduled sleep process reads the day's episodic logs, extracts permanent facts and user preferences into a dense knowledge graph, then actively deletes the raw episodes. Optionally, it identifies repetitive reasoning patterns and codifies them as executable tool scripts.
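The consolidation pass reduces to three steps: distill, write, prune. A minimal sketch, with `extract_facts` standing in for the LLM call that summarizes episodes (all names here are illustrative, not mnemos's API):

```python
def consolidate(episodic_log, extract_facts, knowledge_graph):
    """One sleep cycle: distill raw turns into durable facts,
    then delete the episodes to reclaim capacity."""
    for subject, fact in extract_facts(episodic_log):
        knowledge_graph.setdefault(subject, set()).add(fact)
    episodic_log.clear()  # raw episodes are actively pruned
    return knowledge_graph

def toy_extractor(turns):
    # Stand-in for a real extractor, which would prompt a local LLM.
    for turn in turns:
        if "prefers" in turn:
            yield ("user", turn)
```

After the cycle, only the dense knowledge-graph entries survive; the verbatim conversational turns are gone.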

MODULE 05

Spreading Activation

Collins & Loftus associative networks — in a human brain, hearing "server" pre-activates "AWS," "downtime," and "Nginx," priming associated concepts before they are consciously needed.

Vector search is a discrete point-in-space lookup. It retrieves exact mathematical matches but completely misses the broader associative "train of thought" — unless explicitly queried by name.

When a node is retrieved via vector search, activation energy (1.0) is injected into it and propagates along graph edges to connected nodes, decaying by 20% per hop. The LLM receives the directly retrieved node plus all adjacent nodes above the activation threshold — creating a fluid, moving spotlight of context that delivers human-like associative intuition.
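The propagation itself is a small breadth-first walk. A sketch under the stated parameters (1.0 seed energy, 20% decay per hop); the adjacency-list graph representation is an assumption:

```python
def spread_activation(graph, seed, decay=0.8, threshold=0.3):
    """Propagate activation outward from the vector-retrieved node.
    `graph` maps node -> list of neighbour nodes."""
    energy = {seed: 1.0}
    frontier = [seed]
    while frontier:
        next_frontier = []
        for node in frontier:
            hop_energy = energy[node] * decay  # decays 20% per hop
            if hop_energy < threshold:
                continue  # below threshold: activation dies out here
            for neighbour in graph.get(node, []):
                if hop_energy > energy.get(neighbour, 0.0):
                    energy[neighbour] = hop_energy
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return energy  # every node present is primed context for the LLM
```

Retrieving "server" in a graph where it links to "aws" and "aws" links to "downtime" also surfaces both neighbours, at 0.8 and roughly 0.64 activation, without either appearing in the query.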

Encode → Retrieve → Consolidate

The MnemosEngine composes all five modules into a coherent pipeline. Each module is independently usable — or let the engine orchestrate the full sequence.

ENCODE       input interaction → Surprisal Gate filters the mundane → Affective Router tags emotion → Spreading Activation links the graph → memory stored
RETRIEVE     query: user prompt → Spreading Activation traverses the graph → Affective Router re-ranks results → Mutable RAG reconsolidates → results to LLM
CONSOLIDATE  idle: system quiet → Sleep Daemon triggers → facts extracted (prefs + patterns) → episodes pruned → knowledge graph updated

Up in 60 seconds.

A pure-Python library with no required external services. Start with the in-memory store, swap to SQLite for persistence, or scale to Neo4j + Qdrant for production.

install & basic usage
# Install
pip install mnemos-memory

# With MCP support
pip install 'mnemos[mcp]'

# Basic usage
import asyncio
from mnemos import MnemosEngine

async def main():
    engine = await MnemosEngine.create()

    # Store — surprisal gate decides what's kept
    await engine.store("User prefers dark mode, uses Neovim")

    # Retrieve — spreading activation + affective re-rank
    results = await engine.retrieve("editor preferences")
    print(results[0].content)

    # Consolidate — sleep daemon runs
    await engine.consolidate()

asyncio.run(main())
claude code — .claude/claude_desktop_config.json
{
  "mcpServers": {
    "mnemos": {
      "command": "mnemos-mcp",
      "env": {
        "MNEMOS_LLM_PROVIDER": "ollama",
        "MNEMOS_LLM_MODEL": "llama3",
        "MNEMOS_STORE_TYPE": "sqlite",
        "MNEMOS_SQLITE_PATH": "~/.mnemos/memory.db"
      }
    }
  }
}
mcp tools available to your agent
mnemos_store        → store through surprisal + affective
mnemos_retrieve     → spreading activation + re-rank
mnemos_consolidate  → sleep: episodic → semantic
mnemos_forget       → delete specific memory
mnemos_stats        → system-wide statistics
mnemos_inspect      → full details on a memory
mnemos_list         → list all stored memories

How mnemos stacks up.

The only open-source library that implements all five biomimetic memory mechanisms.

Feature                     mnemos   Mem0     Zep      LangMem  MemGPT
Surprisal Gating            ✓        —        —        —        —
Memory Reconsolidation      ✓        partial  —        —        —
Affective State Routing     ✓        —        —        —        —
Sleep Consolidation         ✓        —        partial  —        —
Graph Spreading Activation  ✓        —        partial  —        —
MCP Server Support          ✓        ✓        ✓        —        —
Open Source                 ✓        ✓        partial  ✓        ✓