Stop Retraining LLMs: MEMO Adds Memory Without Changing the LLM's Parameters

Large language models become static after pretraining, so their knowledge quickly falls behind the evolving world. Retraining a full model is prohibitively expensive, and fine‑tuning risks catastrophic forgetting, in which previously learned abilities are erased. Retrieval‑augmented generation (RAG) tries to fetch up‑to‑date information at inference time, but it is noisy, becomes costly as the corpus grows, and struggles when answers require reasoning across many documents.

A new framework called MEMO (Memory as a Model) aims to address these problems by separating memory from reasoning. A small, dedicated MEMORY model is trained on a target corpus to internalize facts and cross‑document relationships. The main LLM, called the EXECUTIVE model, stays frozen and is only queried through a standard input‑output interface, so it can be a proprietary API or any other black‑box model.

Training the MEMORY model follows a five‑step data synthesis pipeline: fact extraction, consolidation, verification and rewriting, entity surfacing, and, crucially, cross‑document synthesis, which builds question‑answer pairs that span multiple documents. The MEMORY model is then fine‑tuned on the resulting "reflection QA" dataset, learning to answer from its internal parameters without ever seeing the source documents at inference.
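To make the five steps concrete, here is a minimal Python sketch that runs them over toy data. It is an illustration under heavy assumptions, not the authors' pipeline: the `Fact` structure, every function name, and the rule‑based step bodies are hypothetical stand‑ins for what would presumably be LLM‑driven stages, and the toy documents are already given as (subject, relation, object) triples.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Fact:
    doc_id: str
    subject: str
    relation: str
    obj: str


def extract_facts(docs: Dict[str, List[Tuple[str, str, str]]]) -> List[Fact]:
    # Step 1 (placeholder): the toy documents already hold triples.
    return [Fact(doc_id, s, r, o) for doc_id, triples in docs.items() for s, r, o in triples]


def consolidate(facts: List[Fact]) -> List[Fact]:
    # Step 2: drop repeated statements, keeping the first occurrence.
    seen, unique = set(), []
    for fact in facts:
        key = (fact.subject, fact.relation, fact.obj)
        if key not in seen:
            seen.add(key)
            unique.append(fact)
    return unique


def verify_and_rewrite(facts: List[Fact]) -> List[Fact]:
    # Step 3 (placeholder): discard incomplete facts and normalise whitespace.
    cleaned = []
    for f in facts:
        fields = [f.subject.strip(), f.relation.strip(), f.obj.strip()]
        if all(fields):
            cleaned.append(Fact(f.doc_id, *fields))
    return cleaned


def surface_entities(facts: List[Fact]) -> Dict[str, List[Fact]]:
    # Step 4: index facts by the entity they describe.
    index: Dict[str, List[Fact]] = {}
    for f in facts:
        index.setdefault(f.subject, []).append(f)
    return index


def synthesize_cross_document(facts: List[Fact], index: Dict[str, List[Fact]]) -> List[Tuple[str, str]]:
    # Step 5: chain a fact to a fact from a different document about its object,
    # which yields a two-hop question that no single document can answer.
    pairs = []
    for first in facts:
        for second in index.get(first.obj, []):
            if second.doc_id != first.doc_id:
                question = f"What is the {second.relation} of the {first.relation} of {first.subject}?"
                pairs.append((question, second.obj))
    return pairs


# Toy corpus (illustrative only): one duplicate and one incomplete fact.
docs = {
    "doc_a": [("Ada Lovelace", "collaborator", "Charles Babbage"),
              ("Ada Lovelace", "collaborator", "Charles Babbage")],
    "doc_b": [("Charles Babbage", "birthplace", "London"),
              ("Charles Babbage", " ", "unknown")],
}

facts = verify_and_rewrite(consolidate(extract_facts(docs)))
qa_pairs = synthesize_cross_document(facts, surface_entities(facts))
print(qa_pairs)
```

The resulting `qa_pairs` list is the kind of data the MEMORY model would be fine‑tuned on; the fine‑tuning step itself is outside this sketch.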

At inference, the EXECUTIVE model queries the MEMORY model through a structured three‑turn protocol: it first grounds the query by breaking it into atomic sub‑questions, then iteratively narrows down candidate entities, and finally seeks supporting facts and synthesizes a final answer. Because the MEMORY model’s responses are compact natural‑language snippets, inference cost does not increase with corpus size, unlike RAG.
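The three turns can be sketched as a small control loop in which both models are plain text‑in, text‑out functions. This is a rough illustration only: the prompt tags (`DECOMPOSE`, `CANDIDATES`, `ANSWER`), the line‑based reply format, the `max_rounds` limit, and the two toy models are all assumptions made for the example, not the protocol's actual prompts.

```python
from typing import Callable, List

TextModel = Callable[[str], str]  # black-box interface: text in, text out


def ask_lines(model: TextModel, prompt: str) -> List[str]:
    """Call a model and split its reply into non-empty lines."""
    return [line.strip() for line in model(prompt).splitlines() if line.strip()]


def memo_answer(question: str, executive: TextModel, memory: TextModel, max_rounds: int = 3) -> str:
    # Turn 1: ground the query into atomic sub-questions and ask MEMORY each one.
    sub_questions = ask_lines(executive, f"DECOMPOSE\n{question}")
    notes = [f"{sq} -> {memory(sq)}" for sq in sub_questions]

    # Turn 2: iteratively narrow down the candidate entities.
    candidates = ask_lines(executive, "CANDIDATES\n" + "\n".join(notes))
    for _ in range(max_rounds):
        if len(candidates) <= 1:
            break
        check = memory(f"Which of these fits '{question}': {', '.join(candidates)}?")
        notes.append(check)
        candidates = [c for c in candidates if c in check]

    # Turn 3: seek supporting facts, then let the EXECUTIVE synthesize the answer.
    entity = candidates[0] if candidates else "unknown"
    support = memory(f"What facts support {entity} as the answer to: {question}")
    return executive(f"ANSWER\n{question}\n" + "\n".join(notes + [support]))


# Toy stand-ins for the two models (hypothetical, rule-based).
def toy_executive(prompt: str) -> str:
    if prompt.startswith("DECOMPOSE"):
        return "Who collaborated with Ada Lovelace?\nWhere was that person born?"
    if prompt.startswith("CANDIDATES"):
        return "Charles Babbage\nMary Somerville"
    if prompt.startswith("ANSWER"):
        return "London" if "London" in prompt else "unknown"
    return ""


def toy_memory(prompt: str) -> str:
    if prompt.startswith("Which of these"):
        return "Charles Babbage"
    if prompt.startswith("What facts support Charles Babbage"):
        return "Charles Babbage was born in London."
    if "Who collaborated" in prompt:
        return "Charles Babbage worked with Ada Lovelace on the Analytical Engine."
    return "No stored fact."


answer = memo_answer("Where was Ada Lovelace's collaborator born?", toy_executive, toy_memory)
print(answer)
```

Note that the EXECUTIVE only ever sees short text replies from the MEMORY model, never the documents themselves, which is why the prompt size in this loop depends on the number of turns rather than on how large the corpus is.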

In the reported experiments, MEMO outperforms baselines on multi‑hop reasoning benchmarks such as NarrativeQA, MuSiQue, and BrowseComp‑Plus, even when the EXECUTIVE model is switched from an open‑source LLM to a closed‑source flash model without retraining the MEMORY component. Adding distractor documents barely affects performance, demonstrating robustness to retrieval noise. Moreover, when new knowledge arrives, a fresh MEMORY model can be merged with the existing one using techniques like TIES merging, cutting compute requirements by up to 5.5× compared with full retraining while maintaining strong accuracy.
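For readers unfamiliar with TIES merging, the sketch below shows its three steps (trim, elect sign, disjoint merge) on flat lists of weights in pure Python. It assumes both MEMORY models were fine‑tuned from the same base checkpoint; the function names, the `keep_fraction` and `scale` parameters, and the four‑weight "models" are illustrative choices, and a real merge would operate on full tensors with a dedicated library.

```python
from typing import List


def trim(task_vector: List[float], keep_fraction: float) -> List[float]:
    """Keep only the largest-magnitude entries; zero out the rest."""
    k = max(1, round(len(task_vector) * keep_fraction))
    threshold = sorted((abs(v) for v in task_vector), reverse=True)[k - 1]
    # Entries tied with the threshold are all kept, so slightly more than k may survive.
    return [v if abs(v) >= threshold else 0.0 for v in task_vector]


def ties_merge(base: List[float], finetuned: List[List[float]],
               keep_fraction: float = 0.75, scale: float = 1.0) -> List[float]:
    # Task vector = fine-tuned weights minus the shared base weights.
    task_vectors = [[w - b for w, b in zip(model, base)] for model in finetuned]

    # Step 1: trim low-magnitude changes in each task vector.
    trimmed = [trim(tv, keep_fraction) for tv in task_vectors]

    merged_delta = []
    for values in zip(*trimmed):
        # Step 2: elect the sign carrying more total magnitude for this parameter.
        sign = 1.0 if sum(values) >= 0 else -1.0
        # Step 3: disjoint merge, averaging only the values that agree with that sign.
        agreeing = [v for v in values if v * sign > 0]
        merged_delta.append(sum(agreeing) / len(agreeing) if agreeing else 0.0)

    return [b + scale * d for b, d in zip(base, merged_delta)]


# Toy weights (illustrative only): a shared base and two fine-tuned MEMORY models.
base = [1.0, 1.0, 1.0, 1.0]
old_memory = [1.5, 0.75, 1.0, 2.0]    # task vector: [0.5, -0.25, 0.0, 1.0]
new_memory = [2.0, 1.75, 0.875, 0.5]  # task vector: [1.0, 0.75, -0.125, -0.5]

merged = ties_merge(base, [old_memory, new_memory])
print(merged)  # [1.75, 1.75, 1.0, 2.0]
```

In the second and fourth weights the two models pull in opposite directions; instead of averaging the conflicting updates toward zero, the merge keeps the update whose sign carries more magnitude, which is the main idea behind the technique.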

MEMO thus offers a practical, cost‑effective way to keep language models up to date, preserve prior knowledge, handle complex cross‑document reasoning, and work with any LLM, including proprietary APIs.

#AI #MachineLearning #LLM #RAG #KnowledgeUpdate #ModelMerging

Vladimir Dyachkov, Ph.D
Editor-in-Chief itinai.com

I believe that AI is only as powerful as the human insight guiding it.