Add Persistent Memory to a Python AI Agent
Most AI agents have the memory of a goldfish. They process a task, return a result, and forget everything the moment the session ends. In this tutorial, you will build a Python AI agent that remembers past interactions, learns from previous tasks, and retrieves relevant context automatically using PersistMemory's REST API.
By the end of this guide, you will have a fully working agent that stores observations as memories, searches for relevant context before responding, and organizes knowledge into isolated namespaces. The entire integration adds fewer than 80 lines of Python.
Prerequisites
Before you start, make sure you have the following ready:
- Python 3.9+ installed on your system (the examples use built-in generic type hints like list[dict])
- requests library (we will install it in Step 1)
- PersistMemory API key — sign up free at persistmemory.com
- An OpenAI API key (or any LLM provider) for the agent's reasoning
Step 1: Install Dependencies
We only need two packages: requests for HTTP calls to the PersistMemory API and openai for the agent's LLM backbone. Install both in a virtual environment:
python -m venv agent-env
source agent-env/bin/activate   # Windows: agent-env\Scripts\activate
pip install requests openai
Step 2: Configure the Memory Client
Create a lightweight wrapper around the PersistMemory REST API. This class handles authentication, storing memories, and searching with semantic queries. Save it as memory_client.py:
import requests
from typing import Optional


class MemoryClient:
    """Thin wrapper around the PersistMemory REST API."""

    BASE_URL = "https://backend.persistmemory.com"

    def __init__(self, api_key: str, space: str = "default"):
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }
        self.space = space

    def store(self, content: str, metadata: Optional[dict] = None) -> dict:
        """Store a new memory. Returns the created memory object."""
        payload = {
            "space": self.space,
            "title": (content[:60] + "...") if len(content) > 60 else content,
            "text": content,
        }
        if metadata:
            payload["metadata"] = metadata
        resp = requests.post(
            f"{self.BASE_URL}/mcp/addMemory",
            json=payload,
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json()

    def search(self, query: str, limit: int = 5) -> list[dict]:
        """Semantic search across stored memories."""
        resp = requests.post(
            f"{self.BASE_URL}/mcp/search",
            json={"space": self.space, "q": query, "top_k": limit},
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json().get("memories", [])

    def list_all(self, limit: int = 20) -> list[dict]:
        """List recent memories in this space."""
        resp = requests.get(
            f"{self.BASE_URL}/mcp/fetchMessages",
            params={"space": self.space, "limit": limit},
            headers=self.headers,
        )
        resp.raise_for_status()
        return resp.json().get("memories", [])

    def delete(self, space_id: str) -> None:
        """Delete a space and all of its memories by ID."""
        resp = requests.delete(
            f"{self.BASE_URL}/spaces/{space_id}",
            headers=self.headers,
        )
        resp.raise_for_status()

The client sends your API key as a Bearer token in every request. All memories are scoped to a space, which acts as a namespace: different agents or projects can use separate spaces to keep their knowledge isolated.
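Because raise_for_status() surfaces HTTP errors immediately, a transient network hiccup will abort your agent mid-task. A minimal retry sketch you could wrap around any client call — this decorator is a hypothetical helper, not part of PersistMemory:

```python
import time
from functools import wraps


def with_retries(max_attempts=3, base_delay=0.5, retry_on=Exception):
    """Retry a function with exponential backoff when it raises retry_on."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except retry_on:
                    if attempt == max_attempts:
                        raise  # out of attempts; propagate the error
                    # back off: base_delay, 2*base_delay, 4*base_delay, ...
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator
```

In practice you would narrow retry_on to requests.RequestException, e.g. safe_store = with_retries(retry_on=requests.RequestException)(memory.store), so programming errors still fail fast.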
Step 3: Store Memories
Memories are plain-text strings that PersistMemory automatically embeds into vectors for semantic search. You can attach optional metadata (tags, source, timestamps) for filtering later. Here is how to store an agent observation:
import os

from memory_client import MemoryClient

memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="research-agent",
)

# Store an observation with metadata
memory.store(
    content="User prefers concise answers with code examples. "
            "They work primarily in Python and deploy to AWS Lambda.",
    metadata={
        "type": "user_preference",
        "source": "onboarding_conversation",
        "confidence": 0.95,
    },
)

# Store a factual finding
memory.store(
    content="The company's production database is PostgreSQL 16 "
            "running on RDS in us-east-1. Connection pooling uses PgBouncer.",
    metadata={
        "type": "infrastructure",
        "project": "backend-api",
    },
)

Each call to memory.store() creates a new memory entry. PersistMemory handles embedding generation, vector indexing, and deduplication server-side, so you do not need to manage embeddings yourself.
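When an agent produces several observations at once, storing them one by one gets repetitive. A small batch helper is enough — store_batch is a hypothetical convenience function written against the client's store() signature, not a PersistMemory endpoint:

```python
def store_batch(client, items: list[dict]) -> list[dict]:
    """Store a sequence of {"content": ..., "metadata": ...} items
    via client.store(), returning the created memory objects in order."""
    return [
        client.store(content=item["content"], metadata=item.get("metadata"))
        for item in items
    ]
```

Each item still results in one HTTP request; if the API later grows a bulk endpoint, this is the single place you would swap it in.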
Step 4: Search Memories
The search endpoint uses vector similarity to find memories that are semantically related to your query, not just keyword matches. This means a search for "database setup" will find memories about PostgreSQL, RDS, and connection pooling even if those exact words are not in the query.
# Search for relevant context before the agent responds
results = memory.search("What database does the team use?", limit=3)

for mem in results:
    print(f"Score: {mem['score']:.3f}")
    print(f"Content: {mem['content']}")
    print(f"Metadata: {mem.get('metadata', {})}")
    print("---")

# Example output:
# Score: 0.924
# Content: The company's production database is PostgreSQL 16 ...
# Metadata: {'type': 'infrastructure', 'project': 'backend-api'}
# ---
# Score: 0.712
# Content: User prefers concise answers with code examples ...
# Metadata: {'type': 'user_preference', 'source': 'onboarding_conversation'}

Each result includes a score between 0 and 1 indicating semantic similarity; scores above 0.8 are typically strong matches. You can use the score to filter out low-relevance results before injecting them into the agent's prompt.
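That filtering step is a one-liner worth pulling into a helper. A minimal sketch — the 0.8 default is a heuristic cutoff, not an API constant, so tune it against your own data:

```python
def filter_by_score(results: list[dict], threshold: float = 0.8) -> list[dict]:
    """Keep only memories whose similarity score meets the threshold,
    preserving the original highest-first ordering from the search API."""
    return [mem for mem in results if mem.get("score", 0.0) >= threshold]
```

Calling filter_by_score(results) on the example output above would keep the PostgreSQL memory (0.924) and drop the user-preference one (0.712).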
Step 5: Build the Agent Loop
Now let's wire everything together into a complete agent. The pattern is simple: before responding to the user, the agent searches memory for relevant context. After responding, it stores any new information worth remembering.
import os

from openai import OpenAI

from memory_client import MemoryClient

llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="assistant-agent",
)

SYSTEM_PROMPT = """You are a helpful assistant with persistent memory.
You have access to memories from previous conversations.
When you learn something new about the user or their projects,
indicate it with [REMEMBER: ...] so it can be stored.
"""


def build_context(user_message: str) -> str:
    """Search memory and format results as context."""
    results = memory.search(user_message, limit=5)
    if not results:
        return "No relevant memories found."
    lines = ["Relevant memories from previous sessions:"]
    for mem in results:
        if mem["score"] > 0.6:
            lines.append(f"- {mem['content']}")
    return "\n".join(lines)


def extract_memories(response_text: str) -> list[str]:
    """Pull out [REMEMBER: ...] tags from the agent response."""
    memories = []
    for line in response_text.split("\n"):
        start = line.find("[REMEMBER:")
        if start == -1:
            continue
        start += len("[REMEMBER:")
        end = line.find("]", start)
        if end == -1:
            continue  # malformed tag with no closing bracket; skip it
        memories.append(line[start:end].strip())
    return memories


def chat(user_message: str) -> str:
    # 1. Retrieve relevant context
    context = build_context(user_message)

    # 2. Call the LLM with a memory-augmented prompt
    response = llm.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "system", "content": context},
            {"role": "user", "content": user_message},
        ],
        temperature=0.7,
    )
    reply = response.choices[0].message.content

    # 3. Store any new memories the agent flagged
    for mem_content in extract_memories(reply):
        memory.store(
            content=mem_content,
            metadata={"type": "agent_observation", "source": "chat"},
        )

    # 4. Always store the interaction summary
    memory.store(
        content=f"User asked: {user_message[:200]}",
        metadata={"type": "interaction_log"},
    )
    return reply


# Run the agent in a loop
if __name__ == "__main__":
    print("Agent ready. Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ("quit", "exit"):
            break
        response = chat(user_input)
        print(f"\nAgent: {response}\n")

This agent follows a three-phase loop: retrieve relevant memories, reason with the LLM using those memories as context, and store new observations for future sessions. Each conversation makes the agent more capable because it accumulates knowledge over time.
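One practical refinement: with limit=5 and verbose memories, the injected context can crowd out the rest of the prompt. A character-budget trimmer is a simple guard you could apply to the lines build_context produces — this is a hypothetical helper, and token-based budgeting would be more precise than counting characters:

```python
def trim_context(lines: list[str], max_chars: int = 2000) -> str:
    """Join memory lines until a character budget is exhausted.
    Assumes lines are already ordered by relevance (highest first)."""
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:
            break  # budget exhausted; drop the remaining lower-ranked lines
        kept.append(line)
        used += len(line) + 1  # +1 accounts for the joining newline
    return "\n".join(kept)
```

Because search results come back highest-score first, truncating from the tail always discards the least relevant memories.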
Step 6: Namespace Isolation
If you run multiple agents or work on different projects, you can isolate memories using spaces. Each space is a completely separate namespace with its own vector index. Memories stored in one space are invisible to searches in another.
# Create separate memory clients per project
work_memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="work-projects",
)
personal_memory = MemoryClient(
    api_key=os.environ["PERSISTMEMORY_API_KEY"],
    space="personal-assistant",
)

# Memories are completely isolated
work_memory.store("Sprint 14 goal: migrate auth service to OAuth 2.1")
personal_memory.store("User's favorite restaurant is Sushi Nakazawa")

# This search only returns work-related memories
results = work_memory.search("authentication migration")

# This search only returns personal memories
results = personal_memory.search("dinner recommendations")

Spaces are created automatically the first time you store a memory with a new space name, and there is no limit on the number of spaces you can create. Use descriptive names like project-backend or agent-customer-support to keep things organized.
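Since space names double as identifiers, it helps to derive them consistently from human-readable labels. A minimal slug helper — an illustrative convention, not something the API requires:

```python
import re


def space_slug(name: str) -> str:
    """Normalize an arbitrary label into a lowercase, hyphen-separated
    space name, e.g. "Customer Support Agent" -> "customer-support-agent"."""
    # Collapse every run of non-alphanumeric characters into one hyphen,
    # then strip any hyphens left at the ends.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower())
    return slug.strip("-")
```

Using one helper everywhere prevents near-duplicate spaces like "Backend API" and "backend-api" from silently splitting your agent's memory.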
Next Steps
You now have a Python agent with persistent memory. Here are some ways to take it further:
MCP Server for Claude Desktop
Connect the same memory backend to Claude Desktop for a GUI experience.
API Quickstart
Full API reference with curl, Python, and TypeScript examples.
AI Agent Memory
Learn about memory architectures and patterns for production agents.
Deep Dive: Agent Memory
Comprehensive guide covering vector search, embeddings, and advanced patterns.
Start Building with PersistMemory
Create a free account, grab your API key, and give your Python agent the memory it deserves.