Hermes Agent already keeps a basic memory across sessions, but many users find it too shallow for serious work. They need a system that can store facts, retrieve relevant information quickly, and grow without blowing up the token budget. The new community project Memory OS addresses exactly these pain points by stacking six memory layers on top of Hermes Agent.
The first layer is the workspace: files such as MEMORY.md, USER.md and CREATIVE.md that are injected into every system prompt. The second layer uses the existing SQLite session database, with full-text search keeping conversation history accessible. Layer three adds structured facts in a separate SQLite database, with trust scoring, entity resolution and a feedback loop that updates confidence over time. Layer four is Fabric, a fork of the Icarus Plugin that provides 16 tools for cross-session recall, writing and briefing. Layer five introduces a Qdrant vector database that pairs 4096-dimensional dense vectors, compared by cosine similarity, with BM25 sparse search for hybrid retrieval. The sixth layer is an auto-curated LLM wiki that continuously ingests concepts and entities back into the vector store.
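The post does not say how layer five combines its dense and BM25 rankings. One common option for hybrid retrieval is reciprocal rank fusion (RRF), sketched below in plain Python. The function name, the memory ids and the constant k=60 are illustrative assumptions, not part of Memory OS or Qdrant.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one ranking.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default for RRF.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical memory ids returned by the two searches.
dense_hits = ["m3", "m1", "m7"]   # from the dense vector search
sparse_hits = ["m1", "m9", "m3"]  # from the BM25 sparse search

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Ids that rank well in both lists rise to the top, while an id found by only one search still survives lower down. That is the usual reason to pair lexical and semantic search.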
During each interaction, Memory OS runs surgical recall before the LLM call, pulling from Fabric, Qdrant, the session store and the fact store. Each source is gated by a relevance threshold, duplicates are removed per session and trivial messages are filtered out. After the model responds and at session end, the system extracts new learnings and stores them automatically, aiming for token efficiency rather than simply stuffing the context window.
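The recall step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Hit` type, the threshold values, the list of trivial phrases and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    source: str   # e.g. "fabric", "qdrant", "sessions", "facts"
    text: str
    score: float  # relevance in [0, 1], higher is better

# Hypothetical per-source thresholds; the post does not give the real values.
THRESHOLDS = {"fabric": 0.5, "qdrant": 0.6, "sessions": 0.4, "facts": 0.7}
TRIVIAL = {"ok", "thanks", "thank you", "hi", "hello", "yes", "no"}

def is_trivial(message: str) -> bool:
    """Treat very short or purely conversational messages as not worth a recall."""
    cleaned = message.strip().lower().rstrip("!.?")
    return len(cleaned) < 3 or cleaned in TRIVIAL

def gated_recall(message, hits, seen):
    """Return hits that pass their source's threshold and are new to this session.

    `seen` holds normalized texts already injected this session and is updated in place.
    """
    if is_trivial(message):
        return []
    selected = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if hit.score < THRESHOLDS.get(hit.source, 1.0):
            continue  # below the relevance gate for this source
        key = " ".join(hit.text.lower().split())
        if key in seen:
            continue  # already recalled earlier in the session
        seen.add(key)
        selected.append(hit)
    return selected

seen = set()
hits = [
    Hit("qdrant", "User prefers dark mode", 0.82),
    Hit("facts", "User prefers  dark mode", 0.91),
    Hit("sessions", "Talked about lunch", 0.10),
]
first = gated_recall("What theme should I use?", hits, seen)
second = gated_recall("What theme should I use?", hits, seen)
```

In this run the first call keeps only the highest-scoring copy of the duplicated memory, and the second call returns nothing because that memory was already injected. Skipping the repeat is where the token savings come from.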
Retrieval uses a four‑level fallback: hybrid search, dense vectors, lexical search, then SQLite, ensuring recall continues even if one method falters. A weekly decay scanner ages out stale entries, and semantic deduplication merges near‑identical memories when cosine similarity exceeds 0.92, preventing unbounded growth.
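A fallback chain and a similarity-based deduplication pass could look something like the sketch below. The retriever functions are stand-ins rather than real Qdrant or SQLite calls, and "merging" is simplified here to keeping the earlier of two near-identical entries. Only the 0.92 threshold and the order of the four levels come from the description above.

```python
import math

def recall_with_fallback(query, retrievers):
    """Try each (name, function) pair in order; return the first non-empty result.

    A level that raises or returns nothing hands over to the next one.
    """
    for name, retrieve in retrievers:
        try:
            results = retrieve(query)
        except Exception:
            continue  # this level failed, fall through to the next
        if results:
            return name, results
    return None, []

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def dedupe(memories, threshold=0.92):
    """Keep a (text, vector) memory only if no kept memory exceeds the similarity threshold."""
    kept = []
    for text, vec in memories:
        if all(cosine(vec, kept_vec) <= threshold for _, kept_vec in kept):
            kept.append((text, vec))
    return kept

def broken_hybrid(query):
    raise ConnectionError("vector store unreachable")  # simulated outage

# Hypothetical stand-ins for the four retrieval levels.
levels = [
    ("hybrid", broken_hybrid),
    ("dense", lambda q: []),
    ("lexical", lambda q: ["note about " + q]),
    ("sqlite", lambda q: ["row about " + q]),
]
level, results = recall_with_fallback("deploy steps", levels)

# Toy 2-dimensional vectors; real embeddings would be far larger.
memories = [
    ("likes tea", [1.0, 0.0]),
    ("enjoys tea", [0.99, 0.05]),
    ("works in Berlin", [0.0, 1.0]),
]
unique = dedupe(memories)
```

Here the hybrid level fails and the dense level comes back empty, so the lexical level answers the query. The two tea memories are nearly parallel vectors, so only the first is kept.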
All of this runs locally with Docker, Qdrant, Redis and Python 3.11+, keeping data on the user’s machine and avoiding any cloud memory subscription. It works with any LLM provider Hermes supports, including OpenRouter, OpenAI, Anthropic and Ollama. For teams with strict data‑residency rules, this local‑first approach offers a practical, provider‑agnostic way to give Hermes Agent a deep, reliable memory without sacrificing performance.
#AI #MachineLearning #LLM #AgentMemory #OpenSource #DevTools