Add Persistent Memory to a Python AI Agent
Most AI agents have the memory of a goldfish. They process a task, return a result, and forget everything the moment the session ends. In this tutorial, you will build a Python AI agent that remembers past interactions, learns from previous tasks, and retrieves relevant context automatically using PersistMemory's REST API.
By the end of this guide, you will have a fully working agent that stores observations as memories, searches for relevant context before responding, and organizes knowledge into isolated namespaces. The entire integration adds fewer than 80 lines of Python.
Prerequisites
Before you start, make sure you have the following ready:
- Python 3.9+ installed on your system (the examples use built-in generic type hints like list[dict])
- requests library (we will install it in Step 1)
- PersistMemory API key — sign up free at persistmemory.com
- An OpenAI API key (or any LLM provider) for the agent's reasoning
Step 1: Install Dependencies
We only need two packages: requests for HTTP calls to the PersistMemory API and openai for the agent's LLM backbone. Install both in a virtual environment:
python -m venv agent-env
source agent-env/bin/activate   # Windows: agent-env\Scripts\activate
pip install requests openai
Step 2: Configure the Memory Client
Create a lightweight wrapper around the PersistMemory REST API. This class handles authentication, storing memories, and searching with semantic queries. Save it as memory_client.py:
import requests
from typing import Optional


class MemoryClient:
    """Thin wrapper around the PersistMemory REST API."""

    BASE_URL = "https://backend.persistmemory.com"

    def __init__(self, api_key: str, space: str = "default"):
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
        self.space = space

    def store(self, content: str, metadata: Optional[dict] = None) -> dict:
        """Store a new memory. Returns the created memory object."""
        payload = {
            "space": self.space,
            "title": (content[:60] + "...") if len(content) > 60 else content,
            "text": content,
        }
        if metadata:
            payload["metadata"] = metadata
        resp = requests.post(
            f"{self.BASE_URL}/mcp/addMemory",
            json=payload,
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json()

    def search(self, query: str, limit: int = 5) -> list[dict]:
        """Semantic search across stored memories."""
        resp = requests.post(
            f"{self.BASE_URL}/mcp/search",
            json={"space": self.space, "q": query, "top_k": limit},
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json().get("memories", [])

    def list_all(self, limit: int = 20) -> list[dict]:
        """List recent memories in this space."""
        resp = requests.get(
            f"{self.BASE_URL}/mcp/fetchMessages",
            params={"space": self.space, "limit": limit},
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json().get("memories", [])

    def delete(self, space_id: str) -> None:
        """Delete a space and all of its memories by ID."""
        resp = requests.delete(
            f"{self.BASE_URL}/spaces/{space_id}",
            headers=self.headers,
        )
        resp.raise_for_status()

The client sends your API key as a Bearer token in every request. All memories are scoped to a space, which acts as a namespace: different agents or projects can use separate spaces to keep their knowledge isolated.
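Because raise_for_status() surfaces HTTP errors immediately, a transient network hiccup will abort your agent mid-task. A minimal retry sketch you could wrap around any client call — this decorator is a hypothetical helper, not part of PersistMemory:

```python
import time
from functools import wraps


def with_retries(max_attempts=3, base_delay=0.5, retry_on=Exception):
    """Retry a function with exponential backoff when it raises retry_on."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts; propagate the error
                    # back off: base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In practice you would narrow retry_on to requests.RequestException, e.g. safe_store = with_retries(retry_on=requests.RequestException)(memory.store), so programming errors still fail fast.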
Step 3: Store Memories
Memories are plain-text strings that PersistMemory automatically embeds into vectors for semantic search. You can attach optional metadata (tags, source, timestamps) for filtering later. Here is how to store an agent observation:
import os

from memory_client import MemoryClient

memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="research-agent",
)

# Store an observation with metadata
memory.store(
    content="User prefers concise answers with code examples. "
            "They work primarily in Python and deploy to AWS Lambda.",
    metadata={
        "type": "user_preference",
        "source": "onboarding_conversation",
        "confidence": 0.95,
    },
)

# Store a factual finding
memory.store(
    content="The company's production database is PostgreSQL 16 "
            "running on RDS in us-east-1. Connection pooling uses PgBouncer.",
    metadata={
        "type": "infrastructure",
        "project": "backend-api",
    },
)

Each call to memory.store() creates a new memory entry. PersistMemory handles embedding generation, vector indexing, and deduplication server-side, so you do not need to manage embeddings yourself.
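When an agent produces several observations at once, storing them one by one gets repetitive. A small batch helper is enough — store_batch is a hypothetical convenience function written against the client's store() signature, not a PersistMemory endpoint:

```python
def store_batch(client, items: list[dict]) -> list[dict]:
    """Store a sequence of {"content": ..., "metadata": ...} items
    via client.store(), returning the created memory objects in order."""
    return [
        client.store(content=item["content"], metadata=item.get("metadata"))
        for item in items
    ]
```

Each item still results in one HTTP request; if the API later grows a bulk endpoint, this is the single place you would swap it in.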
Step 4: Search Memories
The search endpoint uses vector similarity to find memories that are semantically related to your query, not just keyword matches. This means a search for "database setup" will find memories about PostgreSQL, RDS, and connection pooling even if those exact words are not in the query.
# Search for relevant context before the agent responds
results = memory.search("What database does the team use?", limit=3)

for mem in results:
    print(f"Score: {mem['score']:.3f}")
    print(f"Content: {mem['content']}")
    print(f"Metadata: {mem.get('metadata', {})}")
    print("---")

# Example output:
# Score: 0.924
# Content: The company's production database is PostgreSQL 16 ...
# Metadata: {'type': 'infrastructure', 'project': 'backend-api'}
# ---
# Score: 0.712
# Content: User prefers concise answers with code examples ...
# Metadata: {'type': 'user_preference', 'source': 'onboarding_conversation'}

Each result includes a score between 0 and 1 indicating semantic similarity; scores above 0.8 are typically strong matches. You can use the score to filter out low-relevance results before injecting them into the agent's prompt.
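That filtering step is a one-liner worth pulling into a helper. A minimal sketch — the 0.8 default is a heuristic cutoff, not an API constant, so tune it against your own data:

```python
def filter_by_score(results: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep only memories whose similarity score meets the threshold,
    preserving the original highest-first ordering from the search API."""
    return [mem for mem in results if mem.get("score", 0.0) >= threshold]
```

Calling filter_by_score(results) on the example output above would keep the PostgreSQL memory (0.924) and drop the user-preference one (0.712).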
Step 5: Build the Agent Loop
Now let's wire everything together into a complete agent. The pattern is simple: before responding to the user, the agent searches memory for relevant context. After responding, it stores any new information worth remembering.
import os

from openai import OpenAI

from memory_client import MemoryClient

llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="assistant-agent",
)

SYSTEM_PROMPT = """You are a helpful assistant with persistent memory.
You have access to memories from previous conversations.
When you learn something new about the user or their projects,
indicate it with [REMEMBER: ...] so it can be stored.
"""


def build_context(user_message: str) -> str:
    """Search memory and format results as context."""
    results = memory.search(user_message, limit=5)
    if not results:
        return "No relevant memories found."
    lines = ["Relevant memories from previous sessions:"]
    for mem in results:
        if mem["score"] > 0.6:
            lines.append(f"- {mem['content']}")
    return "\n".join(lines)


def extract_memories(response_text: str) -> list[str]:
    """Pull out [REMEMBER: ...] tags from the agent response."""
    memories = []
    for line in response_text.split("\n"):
        start = line.find("[REMEMBER:")
        if start == -1:
            continue
        start += len("[REMEMBER:")
        end = line.find("]", start)
        if end == -1:
            continue  # malformed tag with no closing bracket; skip it
        memories.append(line[start:end].strip())
    return memories


def chat(user_message: str) -> str:
    # 1. Retrieve relevant context
    context = build_context(user_message)

    # 2. Call the LLM with a memory-augmented prompt
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "system", "content": context},
            {"role": "user", "content": user_message},
        ],
        temperature=0.7,
    )
    reply = response.choices[0].message.content

    # 3. Store any new memories the agent flagged
    for mem_content in extract_memories(reply):
        memory.store(
            content=mem_content,
            metadata={"type": "agent_observation", "source": "chat"},
        )

    # 4. Always store the interaction summary
    memory.store(
        content=f"User asked: {user_message[:200]}",
        metadata={"type": "interaction_log"},
    )
    return reply


# Run the agent in a loop
if __name__ == "__main__":
    print("Agent ready. Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("quit", "exit"):
            break
        response = chat(user_input)
        print(f"\nAgent: {response}\n")

This agent follows a three-phase loop: retrieve relevant memories, reason with the LLM using those memories as context, and store new observations for future sessions. Each conversation makes the agent more capable because it accumulates knowledge over time.
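One practical refinement: with limit=5 and verbose memories, the injected context can crowd out the rest of the prompt. A character-budget trimmer is a simple guard you could apply to the lines build_context produces — this is a hypothetical helper, and token-based budgeting would be more precise than counting characters:

```python
def trim_context(lines: list[str], max_chars: int = 2000) -> str:
    """Join memory lines until a character budget is exhausted.
    Assumes lines are already ordered by relevance (highest first)."""
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:
            break  # budget exhausted; drop the remaining lower-ranked lines
        kept.append(line)
        used += len(line) + 1  # +1 accounts for the joining newline
    return "\n".join(kept)
```

Because search results come back highest-score first, truncating from the tail always discards the least relevant memories.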
Step 6: Namespace Isolation
If you run multiple agents or work on different projects, you can isolate memories using spaces. Each space is a completely separate namespace with its own vector index. Memories stored in one space are invisible to searches in another.
# Create separate memory clients per project
work_memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="work-projects",
)
personal_memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="personal-assistant",
)

# Memories are completely isolated
work_memory.store("Sprint 14 goal: migrate auth service to OAuth 2.1")
personal_memory.store("User's favorite restaurant is Sushi Nakazawa")

# This search only returns work-related memories
results = work_memory.search("authentication migration")

# This search only returns personal memories
results = personal_memory.search("dinner recommendations")

Spaces are created automatically the first time you store a memory with a new space name, and there is no limit on the number of spaces you can create. Use descriptive names like project-backend or agent-customer-support to keep things organized.
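Since space names double as identifiers, it helps to derive them consistently from human-readable labels. A minimal slug helper — an illustrative convention, not something the API requires:

```python
import re


def space_slug(name: str) -> str:
    """Normalize an arbitrary label into a lowercase, hyphen-separated
    space name, e.g. "Customer Support Agent" -> "customer-support-agent"."""
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then strip any hyphens left at the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")
```

Using one helper everywhere prevents near-duplicate spaces like "Backend API" and "backend-api" from silently splitting your agent's memory.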
Next Steps
You now have a Python agent with persistent memory. Here are some ways to take it further:
MCP Server for Claude Desktop
Connect the same memory backend to Claude Desktop for a GUI experience.
API Quickstart
Full API reference with curl, Python, and TypeScript examples.
AI Agent Memory
Learn about memory architectures and patterns for production agents.
Deep Dive: Agent Memory
Comprehensive guide covering vector search, embeddings, and advanced patterns.
Start Building with PersistMemory
Create a free account, grab your API key, and give your Python agent the memory it deserves.