Memory
A chat that forgets everything between sessions isn’t very useful. Users expect the AI to remember what they’ve talked about before — their preferences, past questions, and ongoing projects. The SDK provides on-demand memory that searches past conversations using semantic similarity, so the model can recall relevant context without you building a separate memory system.
How It Works
Memory in the SDK isn’t a separate database or extraction step. Your conversation messages are the memory. When useChatStorage saves a message, it automatically generates an embedding vector for that message and stores it alongside the text. Later, when the model needs to recall something, it searches across all stored messages by comparing embedding vectors using cosine similarity — finding past messages that are semantically related to the current question, even if they use different words.
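Cosine similarity scores two embedding vectors by the angle between them, ignoring their magnitudes. A minimal sketch of the math (this is the standard formula, not the SDK's internal code):

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Returns 1 for vectors pointing the same direction, 0 for orthogonal
// (unrelated) vectors, regardless of their lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because only direction matters, a short message and a long message about the same topic can still score highly against each other.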
This happens through createMemoryRetrievalTool, a client-side tool that the model can call during a conversation. When the model decides it needs past context (for example, “what did the user say about their budget?”), it calls the tool with a search query. The SDK embeds that query, compares it against stored message embeddings, and returns the most relevant matches. The model then incorporates those memories into its response.
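The retrieval step is, in essence, a ranked nearest-neighbor search over stored messages. A simplified, self-contained sketch of that idea (the `StoredMessage` shape and `searchMemories` helper are illustrative, not the SDK's actual internals):

```typescript
interface StoredMessage {
  text: string;
  embedding: number[]; // vector generated when the message was saved
}

// Score every stored message against the query embedding and return
// the top `limit` matches, most similar first.
function searchMemories(
  queryEmbedding: number[],
  messages: StoredMessage[],
  limit: number
): StoredMessage[] {
  // Cosine similarity between two vectors.
  const score = (a: number[], b: number[]): number => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    return dot / (Math.sqrt(na) * Math.sqrt(nb));
  };
  return [...messages]
    .sort(
      (m1, m2) =>
        score(queryEmbedding, m2.embedding) -
        score(queryEmbedding, m1.embedding)
    )
    .slice(0, limit);
}
```

A real implementation would cache scores and prune below a similarity threshold rather than sorting the full set, but the ranking principle is the same.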
Setup
Embedding generation is enabled by default in useChatStorage. Messages are embedded automatically after saving, as long as they meet a minimum content length (short messages like “ok” or “thanks” are skipped since they carry little semantic value).
```typescript
const { sendMessage, createMemoryRetrievalTool } = useChatStorage({
  database,
  getToken,
  autoEmbedMessages: true, // default
});
```

To give the model access to memory, create the retrieval tool and pass it as a client tool when sending messages:
```typescript
const memoryTool = createMemoryRetrievalTool({ limit: 5 });

await sendMessage({
  content: "What were we discussing last week?",
  clientTools: [memoryTool],
});
```

The model decides when to use the tool. If the user asks something that might benefit from past context, the model calls the memory tool, gets relevant messages back, and weaves that information into its response. If the question is self-contained, it skips the tool entirely.
Search Options
You can configure how memory search behaves through MemoryRetrievalSearchOptions. The limit parameter controls how many results come back (default 8), and minSimilarity sets a threshold between 0 and 1 for how closely a past message must match the query (default 0.3). Setting excludeConversationId to the current conversation’s ID prevents the model from “remembering” things that were just said moments ago — it’s already aware of those through the conversation history.
```typescript
const memoryTool = createMemoryRetrievalTool({
  limit: 5,
  minSimilarity: 0.4,
  excludeConversationId: conversationId,
});
```

Results can be sorted by similarity (most relevant first, the default) or chronological (oldest first), depending on whether recency or relevance matters more for your use case.
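To make the two orderings concrete, here is a sketch assuming each result carries a similarity score and a creation timestamp (a hypothetical result shape, not the SDK's actual type):

```typescript
interface MemoryResult {
  text: string;
  similarity: number; // cosine score against the query
  createdAt: number;  // Unix timestamp (ms)
}

// Default ordering: most relevant match first.
const bySimilarity = (results: MemoryResult[]): MemoryResult[] =>
  [...results].sort((a, b) => b.similarity - a.similarity);

// Chronological ordering: oldest first, useful when the model needs
// to reconstruct how a topic evolved over time.
const chronological = (results: MemoryResult[]): MemoryResult[] =>
  [...results].sort((a, b) => a.createdAt - b.createdAt);
```

Similarity ordering suits point lookups ("what was the budget?"); chronological ordering suits narrative recall ("how did this plan change?").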