This post presents some ideas I worked on over a few months while building a biologically inspired memory system for my AI agent system. It's designed to emulate characteristics of human cognitive processes, with the hope that it contributes to the character and contextual awareness of the agents powered by it.
We model dynamic memory networks that facilitate inter-memory interactions, together with an entity management system for organizing knowledge about things, all with connected memories, cues, and decay over time. I also recently added a "sleep phase" for memory consolidation, a time-aware retrieval step for foundational memories, a meta-memory system for higher-level memory management, and memory evolution processes that enable modifications over time.
Drawing from neuroscience, the system integrates multiple types of memory—including episodic, semantic, procedural, working, and several others—each modeled with properties and behaviors akin to their biological counterparts.
None of this is strictly required; it was an exploration of the qualitative effects of these ideas. The system makes room for a wide range of mechanisms, but you only need a few pieces for practical use with different types of agents. An important consideration is that agents themselves are more like cogs in the system than the system itself. An organisation of agents, each specialized and each with the right shared context, is bound to be easier to configure, reason about, and validate.
It started as a thought exercise, tracing the path new memories take from our senses, to how they're encoded in the hippocampus, and everything that we know happens after.
How our brain makes memories
When memories are first formed, the hippocampus is heavily involved in both encoding and retrieving them. It "indexes" or binds together different elements of a memory stored across the cortex (e.g., sights, sounds, emotions) and helps retrieve these components by reactivating the relevant cortical areas.
Over time, through a process called consolidation, memories become increasingly integrated into the cortex itself. This means that as memories age, the cortex takes on more responsibility for their storage and retrieval. The hippocampus becomes less critical for retrieving well-established, older memories, as these memories have stronger direct connections within cortical regions.
Once memories are fully consolidated, they can often be retrieved directly from the cortex with little or no hippocampal involvement, especially for semantic (factual) memories. However, episodic memories—those with rich context and detail—often continue to rely on the hippocampus even after being partially consolidated in the cortex.
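To make the consolidation analogy concrete, here's a minimal sketch of what a nightly "sleep phase" pass could look like. The Memory class, thresholds, and promotion rule are illustrative assumptions, not the system's actual implementation.

```python
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    content: str
    created_at: float = field(default_factory=time.time)
    retrieval_count: int = 0    # how often this memory has been recalled
    consolidated: bool = False  # cortex-like direct store vs. hippocampus-like index

def consolidate(memories: list[Memory],
                min_age_days: float = 7.0,  # hypothetical thresholds
                min_retrievals: int = 3) -> None:
    """Promote old, frequently recalled memories so they can be
    retrieved directly, without going through the central index."""
    now = time.time()
    for m in memories:
        age_days = (now - m.created_at) / 86400
        if age_days >= min_age_days and m.retrieval_count >= min_retrievals:
            m.consolidated = True
```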
While my initial curiosity led me to explore a fine-tuned embedding model focused on a dense retrieval pipeline, the most effective solution proved to be more comprehensive. In practice, a hybrid approach works best: one that combines dense retrieval with a retrieval system that accounts for factors like time relevance, popularity, and other key variables related to the use case (more akin to sparse retrieval).
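As a sketch of what such a hybrid scorer might look like, the function below blends dense similarity with recency and popularity signals; the weights, half-life, and saturation constant are assumptions to be tuned per use case.

```python
import math
import time

def hybrid_score(dense_sim: float, created_at: float, retrieval_count: int,
                 half_life_days: float = 30.0,
                 w_sim: float = 0.7, w_time: float = 0.2, w_pop: float = 0.1) -> float:
    """Combine dense-retrieval similarity with time relevance and popularity."""
    age_days = (time.time() - created_at) / 86400
    recency = math.exp(-math.log(2) * age_days / half_life_days)  # halves every half_life_days
    popularity = 1 - math.exp(-retrieval_count / 5)               # saturates with repeated recall
    return w_sim * dense_sim + w_time * recency + w_pop * popularity
```

Exponential decay gives recent memories an edge without ever zeroing out older ones, which is what the time-aware retrieval step for foundational memories is after.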
Designing a neural-inspired memory system
Replicating the intricacies of human memory is a formidable challenge. AI architectures often rely on static databases or vector-space representations, which capture semantic relationships but lack the depth and dynamism inherent in human memory systems.
The motivation is to enhance an agent's ability to learn in a manner resembling human cognition, and to create AI systems capable of forming, retrieving, and modifying memories in a way that enriches their learning and decision-making capabilities.
Memory system
├── Meta-memory system
├── Memory processes
│   ├── Memory consolidation (daily sleep phase)
│   ├── Time and popularity awareness
│   └── Memory evolution
├── Sensory memory (real-time world state)
├── Working memory
├── Long-term memory
│   ├── Declarative memory (explicit memory)
│   │   ├── Episodic memory
│   │   └── Semantic memory
│   └── Non-declarative memory (implicit memory)
│       └── Procedural memory
└── Entity system
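To ground the taxonomy above, here's one way those memory types might be represented as data; the enum values and record fields are illustrative assumptions rather than the system's actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

class MemoryType(Enum):
    SENSORY = auto()     # real-time world state, short-lived
    WORKING = auto()     # active task context
    EPISODIC = auto()    # events with rich context (declarative)
    SEMANTIC = auto()    # facts and general knowledge (declarative)
    PROCEDURAL = auto()  # skills and how-to knowledge (non-declarative)

@dataclass
class MemoryRecord:
    kind: MemoryType
    content: str
    cues: list[str] = field(default_factory=list)        # retrieval cues
    linked_ids: list[str] = field(default_factory=list)  # connections to other memories
    entity_ids: list[str] = field(default_factory=list)  # references into the entity system
    strength: float = 1.0                                # decays over time, boosted on recall
```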
Fine-tuning embedding models
Embedding models, which convert high-dimensional data into lower-dimensional vector spaces, are instrumental in enabling agents to understand and relate different pieces of information.
By fine-tuning an embedding model, we can move the model towards simulating how humans recall memories based on learned associations and contextual cues. This approach allows the system to retrieve related memories more efficiently, leading to more coherent and contextually appropriate responses. Practically speaking, this can be done using a sparse embedding model alongside an LLM that reshapes new inputs within the context of any related memories, before embedding the result and using it for retrieval. The combination gives us more control over shaping the final result.
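A minimal sketch of that reshape-then-embed step, assuming placeholder llm and embed callables standing in for whatever completion and embedding backends are in use:

```python
from typing import Callable

def encode_with_context(new_input: str, related: list[str],
                        llm: Callable[[str], str],
                        embed: Callable[[str], list[float]]) -> list[float]:
    """Reshape a new input in the context of related memories, then embed it."""
    context = "\n".join(related)
    prompt = (
        "Rewrite the following input as a self-contained memory, "
        "resolving references using the related memories.\n"
        f"Related memories:\n{context}\n\nInput: {new_input}"
    )
    reshaped = llm(prompt)  # placeholder completion call
    return embed(reshaped)  # dense vector stored for later retrieval
```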
Fine-tuning an embedding model isn't the only way to build this sort of thing, but it's a particularly powerful approach if you have strong opinions about what the embedding space should capture. Between the various RAG techniques and the setup of the overall retrieval system, you can configure essentially every aspect of the memory system.
The effects of an expressive memory system on AI behavior have been fascinating to observe. The conversational agents demonstrate enhanced learning capabilities, forming connections between new information and existing memories in ways that feel more organic and less mechanical. Their responses become more adaptive, drawing from a rich tapestry of interconnected memories rather than isolated data points.
Perhaps most interestingly, it leads to more consistent behaviour over time. As the AI draws from its well-structured memory system, its decisions and actions maintain a coherent thread that reflects its configuration and accumulated experiences.