Memory is the cornerstone of intelligent agents. Without it, every interaction starts from zero. This skill covers the architecture of agent memory: short-term (context window), long-term (vector stores), and the cognitive architectures that organize them.
Key insight: Memory isn't just storage - it's retrieval. A million stored facts mean nothing if you can't find the right one. Chunking, embedding, and retrieval strategies determine whether your agent remembers or forgets.
The field is fragmented with inconsistent terminology. We use the CoALA cognitive architecture framework: semantic memory (facts), episodic memory (experiences), and procedural memory (how-to knowledge).
## Choosing the right memory type for different information

When to use: Designing an agent memory system
```python
"""Three memory types for different purposes:

- Semantic memory: facts and knowledge
- Episodic memory: experiences and events
- Procedural memory: how to do things
"""
import os
from datetime import datetime

from langmem import MemoryStore
from langgraph.graph import StateGraph

memory = MemoryStore(connection_string=os.environ["POSTGRES_URL"])

# Semantic: stable facts about the user
await memory.semantic.upsert(
    namespace="user_profile",
    key=user_id,
    content={
        "name": "Alice",
        "preferences": ["dark mode", "concise responses"],
        "expertise_level": "developer",
    },
)

# Episodic: what happened in a past interaction
await memory.episodic.add(
    namespace="conversations",
    content={
        "timestamp": datetime.now(),
        "summary": "Helped debug authentication issue",
        "outcome": "resolved",
        "key_insights": ["Token expiry was root cause"],
    },
    metadata={"user_id": user_id, "topic": "debugging"},
)

# Procedural: how to perform a recurring task
await memory.procedural.add(
    namespace="skills",
    content={
        "task_type": "debug_auth",
        "steps": ["Check token expiry", "Verify refresh flow"],
        "example_interaction": few_shot_example,
    },
)
```
```python
async def prepare_context(user_id, query):
    # Get user profile (semantic)
    profile = await memory.semantic.get(namespace="user_profile", key=user_id)

    # Find relevant past experiences (episodic)
    similar_experiences = await memory.episodic.search(
        namespace="conversations",
        query=query,
        filter={"user_id": user_id},
        limit=3,
    )

    # Find relevant skills (procedural)
    relevant_skills = await memory.procedural.search(
        namespace="skills",
        query=query,
        limit=2,
    )

    return {
        "profile": profile,
        "past_experiences": similar_experiences,
        "relevant_skills": relevant_skills,
    }
```
## Choosing the right vector database for your use case

When to use: Setting up persistent memory storage
Decision matrix:

| | Pinecone | Qdrant | Weaviate | ChromaDB | pgvector |
|---|---|---|---|---|---|
| Scale | Billions | 100M+ | 100M+ | 1M | 1M |
| Managed | Yes | Both | Both | Self-hosted | Self-hosted |
| Filtering | Basic | Best | Good | Basic | SQL |
| Hybrid search | No | Yes | Best | No | Yes |
| Cost | High | Medium | Medium | Free | Free |
| Typical latency | ~5 ms | ~7 ms | ~10 ms | ~20 ms | ~15 ms |
```python
import os
from datetime import datetime
from uuid import uuid4

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
index = pc.Index("agent-memory")

index.upsert(
    vectors=[
        {
            "id": f"memory-{uuid4()}",
            "values": embedding,
            "metadata": {
                "user_id": user_id,
                "timestamp": datetime.now().isoformat(),
                "type": "episodic",
                "content": memory_text,
            },
        }
    ],
    namespace=namespace,
)

results = index.query(
    vector=query_embedding,
    filter={"user_id": user_id, "type": "episodic"},
    top_k=5,
    include_metadata=True,
)
```
```python
from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct, Filter, FieldCondition

client = QdrantClient(url="http://localhost:6333")

results = client.search(
    collection_name="agent_memory",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="user_id", match={"value": user_id}),
            FieldCondition(key="type", match={"value": "semantic"}),
        ],
        should=[
            FieldCondition(key="topic", match={"any": ["auth", "security"]}),
        ],
    ),
    limit=5,
)
```
```python
from uuid import uuid4

import chromadb

client = chromadb.PersistentClient(path="./memory_db")
collection = client.get_or_create_collection("agent_memory")

collection.add(
    ids=[str(uuid4())],
    embeddings=[embedding],
    documents=[memory_text],
    metadatas=[{"user_id": user_id, "type": "episodic"}],
)

results = collection.query(
    query_embeddings=[query_embedding],
    n_results=5,
    where={"user_id": user_id},
)
```
## Breaking documents into retrievable chunks

When to use: Processing documents for memory storage
The chunking dilemma: optimal chunk size depends on document type, query patterns, and embedding model.

General guidance: 256-512 tokens for most use cases.
```python
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,    # Characters
    chunk_overlap=50,  # Overlap prevents cutting sentences
    separators=["\n\n", "\n", ". ", " ", ""],  # Priority order
)

chunks = splitter.split_text(document)
```
```python
from langchain_experimental.text_splitter import SemanticChunker
from langchain_openai import OpenAIEmbeddings

splitter = SemanticChunker(
    embeddings=OpenAIEmbeddings(),
    breakpoint_threshold_type="percentile",
    breakpoint_threshold_amount=95,
)

chunks = splitter.split_text(document)
```
```python
from langchain.text_splitter import MarkdownHeaderTextSplitter

splitter = MarkdownHeaderTextSplitter(
    headers_to_split_on=[
        ("#", "Header 1"),
        ("##", "Header 2"),
        ("###", "Header 3"),
    ]
)

chunks = splitter.split_text(markdown_doc)
```
```python
def add_context_to_chunk(chunk, document_summary):
    context_prompt = f"""Document summary: {document_summary}

The following is a chunk from this document:

{chunk}
"""
    return context_prompt

for chunk in chunks:
    contextualized = add_context_to_chunk(chunk, summary)
    embedding = embed(contextualized)
    store(chunk, embedding)  # Store original, embed contextualized
```
```python
from langchain.text_splitter import Language, RecursiveCharacterTextSplitter

python_splitter = RecursiveCharacterTextSplitter.from_language(
    language=Language.PYTHON,
    chunk_size=1000,
    chunk_overlap=200,
)

chunks = python_splitter.split_text(python_code)
```
## Processing memories asynchronously for better quality

When to use: You want higher recall without slowing interactions
Real-time memory extraction slows conversations and adds complexity to agent tool calls. Background processing after conversations yields higher-quality memories.

Pattern: subconscious memory formation.
```python
from langgraph.graph import StateGraph
from langgraph.checkpoint.postgres import PostgresSaver

async def background_memory_processor(thread_id: str):
    # Run after conversation ends or goes idle
    conversation = await load_conversation(thread_id)

    # Extract insights without time pressure
    insights = await llm.invoke(f"""
    Analyze this conversation and extract:
    1. Key facts learned about the user
    2. User preferences revealed
    3. Tasks completed or pending
    4. Patterns in user behavior

    Be thorough - this runs in the background.

    Conversation:
    {conversation}
    """)

    # Store to long-term memory
    for insight in insights:
        await memory.semantic.upsert(
            namespace="user_insights",
            key=generate_key(insight),
            content=insight,
            metadata={"source_thread": thread_id},
        )

@on_conversation_idle(timeout_minutes=5)
async def process_conversation(thread_id):
    await background_memory_processor(thread_id)
```
```python
async def consolidate_memories(user_id: str):
    # Get all memories for user
    memories = await memory.semantic.list(
        namespace="user_insights",
        filter={"user_id": user_id},
    )

    # Find similar memories (potential duplicates)
    clusters = cluster_by_similarity(memories, threshold=0.9)

    # Merge similar memories
    for cluster in clusters:
        if len(cluster) > 1:
            merged = await llm.invoke(f"""
            Consolidate these related memories into one:

            {cluster}

            Preserve all important information.
            """)
            await memory.semantic.upsert(
                namespace="user_insights",
                key=generate_key(merged),
                content=merged,
            )
            # Delete originals
            for old in cluster:
                await memory.semantic.delete(old.id)
```
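The `cluster_by_similarity` helper used in consolidation is not defined in this skill. A minimal sketch, assuming each memory object exposes an `embedding` list (an assumption, not part of any library API), is greedy single-link clustering on cosine similarity:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_by_similarity(memories, threshold=0.9):
    # Greedy: attach each memory to the first cluster whose representative
    # is similar enough; otherwise start a new cluster
    clusters = []
    for mem in memories:
        for cluster in clusters:
            if cosine(mem.embedding, cluster[0].embedding) >= threshold:
                cluster.append(mem)
                break
        else:
            clusters.append([mem])
    return clusters
```

Greedy clustering is O(n * clusters) and order-dependent; for large memory sets a proper nearest-neighbor index would be preferable.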
## Forgetting old, irrelevant memories

When to use: Memory grows large and retrieval slows down
Not all memories should live forever. Implement intelligent decay based on recency of access, frequency of use, and importance.
```python
from datetime import datetime, timedelta

async def decay_old_memories(namespace: str, max_age_days: int):
    cutoff = datetime.now() - timedelta(days=max_age_days)

    old_memories = await memory.episodic.list(
        namespace=namespace,
        filter={"last_accessed": {"$lt": cutoff.isoformat()}},
    )

    for mem in old_memories:
        # Soft delete (mark as archived)
        await memory.episodic.update(
            id=mem.id,
            metadata={"archived": True, "archived_at": datetime.now()},
        )
```
```python
def calculate_memory_utility(memory):
    """Composite utility score inspired by cognitive science:
    - Recency: when was it last accessed?
    - Frequency: how often is it accessed?
    - Importance: how critical is this information?
    """
    now = datetime.now()

    # Recency score (exponential decay with 72h half-life)
    hours_since_access = (now - memory.last_accessed).total_seconds() / 3600
    recency_score = 0.5 ** (hours_since_access / 72)

    # Frequency score
    frequency_score = min(memory.access_count / 10, 1.0)

    # Importance (from metadata or heuristic)
    importance = memory.metadata.get("importance", 0.5)

    # Weighted combination
    utility = (
        0.4 * recency_score
        + 0.3 * frequency_score
        + 0.3 * importance
    )
    return utility

async def prune_low_utility_memories(threshold=0.2):
    all_memories = await memory.list_all()
    for mem in all_memories:
        if calculate_memory_utility(mem) < threshold:
            await memory.archive(mem.id)
```
Severity: CRITICAL
Situation: Processing documents for vector storage
Symptoms: Retrieval finds chunks but they don't make sense alone. Agent answers miss the big picture. "The function returns X" retrieved without knowing which function. References to "this" without knowing what "this" refers to.
Why this breaks: When we chunk for AI processing, we're breaking connections, reducing a holistic narrative to isolated fragments that often miss the big picture. A chunk about "the configuration" without context about what system is being configured is nearly useless.
Recommended fix:
```python
def contextualize_chunk(chunk, document):
    summary = summarize(document)

    # LLM generates context for chunk
    context = llm.invoke(f"""
    Document summary: {summary}

    Generate a brief context statement for this chunk
    that would help someone understand what it refers to:

    {chunk}
    """)
    return f"{context}\n\n{chunk}"

for chunk in chunks:
    contextualized = contextualize_chunk(chunk, full_doc)
    embedding = embed(contextualized)
    # Store original chunk, embed contextualized
    store(original=chunk, embedding=embedding)

# Alternative: index at multiple granularities
chunks_small = split(doc, size=256)
chunks_medium = split(doc, size=512)
chunks_large = split(doc, size=1024)
```
Severity: HIGH
Situation: Configuring chunking for memory storage
Symptoms: High-quality documents produce low-quality retrievals. Simple questions miss relevant information. Complex questions get fragments instead of complete answers.
Why this breaks: Optimal chunk size depends on query patterns. Narrow factual queries are best served by small, precise chunks; broad or complex questions need larger chunks that keep complete answers together. The sweet spot varies by document type and embedding model; a hard-coded default of 1,000 characters is optimal for nothing in particular.
Recommended fix:
```python
def evaluate_chunk_size(documents, test_queries, chunk_size):
    chunks = split_documents(documents, size=chunk_size)
    index = build_index(chunks)

    correct_retrievals = 0
    for query, expected_chunk in test_queries:
        results = index.search(query, k=5)
        if expected_chunk in results:
            correct_retrievals += 1
    return correct_retrievals / len(test_queries)

for size in [256, 512, 768, 1024]:
    recall = evaluate_chunk_size(docs, test_queries, size)
    print(f"Size {size}: Recall@5 = {recall:.2%}")

# Reasonable starting points by content type
CHUNK_SIZES = {
    "documentation": 512,   # Complete concepts
    "code": 1000,           # Function-level
    "conversation": 256,    # Turn-level
    "articles": 768,        # Paragraph-level
}

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,  # ~10% overlap
)
```
Severity: HIGH
Situation: Querying memory for context
Symptoms: Agent retrieves memories that seem related but aren't useful. "Tell me about the user's preferences" returns conversation about preferences in general, not this user's. High similarity scores for wrong content.
Why this breaks: Semantic similarity isn't the same as relevance. "The user likes Python" and "Python is a programming language" are semantically similar but very different types of information. Without metadata filtering, retrieval is just word matching.
Recommended fix:
```python
# Bad: pure semantic search, no scoping
results = index.query(vector=query_embedding, top_k=5)

# Good: scope retrieval with metadata filters
results = index.query(
    vector=query_embedding,
    filter={
        "user_id": current_user.id,
        "type": "preference",
        "created_after": cutoff_date,
    },
    top_k=5,
)

# Hybrid search (Qdrant): combine semantic and keyword matching
from qdrant_client import QdrantClient

client = QdrantClient(...)
results = client.search(
    collection_name="memories",
    query_vector=semantic_embedding,
    query_text=query,          # Also keyword match
    fusion={"method": "rrf"},  # Reciprocal Rank Fusion
)

# Rerank candidates with a cross-encoder
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

candidates = index.query(query_embedding, top_k=20)
pairs = [(query, c.text) for c in candidates]
scores = reranker.predict(pairs)
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
```
Severity: HIGH
Situation: User preferences or facts change over time
Symptoms: Agent uses outdated preferences. "User prefers dark mode" from 6 months ago overrides recent "switch to light mode" request. Agent confidently uses stale data.
Why this breaks: Vector stores don't have temporal awareness by default. A memory from a year ago has the same retrieval weight as one from today. Recent information should generally override old information for preferences and mutable facts.
Recommended fix:
```python
from datetime import datetime, timedelta

def time_decay_score(memory, half_life_days=30):
    age = (datetime.now() - memory.created_at).days
    return 0.5 ** (age / half_life_days)

def retrieve_with_recency(query, user_id):
    # Get candidates
    candidates = index.query(
        vector=embed(query),
        filter={"user_id": user_id},
        top_k=20,
    )

    # Apply time decay
    for candidate in candidates:
        time_score = time_decay_score(candidate)
        candidate.final_score = candidate.similarity * 0.7 + time_score * 0.3

    # Re-sort by final score
    return sorted(candidates, key=lambda x: x.final_score, reverse=True)[:5]

async def update_preference(user_id, category, value):
    # Delete old preference
    await memory.delete(
        filter={"user_id": user_id, "type": "preference", "category": category}
    )

    # Store new preference
    await memory.upsert(
        id=f"pref-{user_id}-{category}",
        content={"category": category, "value": value},
        metadata={"updated_at": datetime.now()},
    )

# Alternative: version facts instead of deleting them
await memory.upsert(
    id=f"fact-{fact_id}-v{version}",
    content=new_fact,
    metadata={
        "version": version,
        "supersedes": previous_id,
        "valid_from": datetime.now(),
    },
)
```
Severity: MEDIUM
Situation: User has changed preferences or provided conflicting info
Symptoms: Agent retrieves "user prefers dark mode" and "user prefers light mode" in same context. Gives inconsistent answers. Seems confused or forgetful to user.
Why this breaks: Without conflict resolution, both old and new information coexist. Semantic search might return both because they're both about the same topic (preferences). Agent has no way to know which is current.
Recommended fix:
```python
async def store_with_conflict_check(memory, user_id):
    # Find potentially conflicting memories
    similar = await index.query(
        vector=embed(memory.content),
        filter={"user_id": user_id, "type": memory.type},
        threshold=0.9,  # Very similar
        top_k=5,
    )

    for existing in similar:
        if is_contradictory(memory.content, existing.content):
            # Ask for resolution
            resolution = await resolve_conflict(memory, existing)
            if resolution == "replace":
                await index.delete(existing.id)
            elif resolution == "version":
                await mark_superseded(existing.id, memory.id)

    await index.upsert(memory)

def is_contradictory(new_content, old_content):
    # Use LLM to detect contradiction
    result = llm.invoke(f"""
    Do these two statements contradict each other?

    Statement 1: {old_content}
    Statement 2: {new_content}

    Respond with just YES or NO.
    """)
    return result.strip().upper() == "YES"

async def consolidate_memories(user_id):
    all_memories = await index.list(filter={"user_id": user_id})
    clusters = cluster_by_topic(all_memories)

    for cluster in clusters:
        if has_conflicts(cluster):
            resolved = await llm.invoke(f"""
            These memories may conflict. Create one consolidated
            memory that represents the current truth:

            {cluster}
            """)
            await replace_cluster(cluster, resolved)
```

Severity: MEDIUM
Situation: Retrieving too many memories at once
Symptoms: Token limit errors. Agent truncates important information. System prompt gets cut off. Retrieved memories compete with user query for space.
Why this breaks: Retrieval typically returns top-k results. If k is too high or chunks are too large, retrieved context overwhelms the window. Critical information (system prompt, recent messages) gets pushed out.
Recommended fix:
```python
TOKEN_BUDGET = {
    "system_prompt": 500,
    "user_profile": 200,
    "recent_messages": 2000,
    "retrieved_memories": 1000,
    "current_query": 500,
    "buffer": 300,  # Safety margin
}

def budget_aware_retrieval(query, context_limit=4000):
    remaining = (
        context_limit - TOKEN_BUDGET["system_prompt"] - TOKEN_BUDGET["buffer"]
    )

    # Prioritize recent messages
    recent = get_recent_messages(limit=TOKEN_BUDGET["recent_messages"])
    remaining -= count_tokens(recent)

    # Then user profile
    profile = get_user_profile(limit=TOKEN_BUDGET["user_profile"])
    remaining -= count_tokens(profile)

    # Finally retrieved memories with remaining budget
    memories = retrieve_memories(query, max_tokens=remaining)

    return build_context(profile, recent, memories)

def retrieve_with_budget(query, max_tokens=1000):
    avg_chunk_tokens = 150  # From your data
    max_k = max_tokens // avg_chunk_tokens

    results = index.query(query, top_k=max_k)

    # Trim if still over budget
    total_tokens = 0
    filtered = []
    for result in results:
        tokens = count_tokens(result.text)
        if total_tokens + tokens <= max_tokens:
            filtered.append(result)
            total_tokens += tokens
        else:
            break
    return filtered
```

Severity: MEDIUM
Situation: Upgrading embedding model or mixing providers
Symptoms: Retrieval quality suddenly drops. Relevant documents not found. Random results returned. Works for new documents, fails for old.
Why this breaks: Embedding models produce different vector spaces. A query embedded with text-embedding-3 won't match documents embedded with text-ada-002. Mixing models creates garbage similarity scores.
Recommended fix:
```python
# Tag every vector with the model that produced it
await index.upsert(
    id=doc_id,
    vector=embedding,
    metadata={
        "embedding_model": "text-embedding-3-small",
        "embedding_version": "2024-01",
        "content": content,
    },
)

# Only query against vectors from the same model
results = index.query(
    vector=query_embedding,
    filter={"embedding_model": current_model},
    top_k=10,
)

async def migrate_embeddings(old_model, new_model):
    # Get all documents with old model
    old_docs = await index.list(filter={"embedding_model": old_model})

    for doc in old_docs:
        # Re-embed with new model
        new_embedding = await embed(doc.content, model=new_model)

        # Update in place
        await index.update(
            id=doc.id,
            vector=new_embedding,
            metadata={"embedding_model": new_model},
        )
```

Severity: ERROR
In-memory stores lose data on restart
Message: In-memory store detected. Use persistent storage (Postgres, Qdrant, Pinecone) for production.
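The failure mode is easy to demonstrate without any vector database: an in-memory dict vanishes with the process, while even a trivial file-backed store survives a restart. The `PersistentKV` class below is a toy sketch to illustrate the point; production systems should use Postgres, Qdrant, or Pinecone as the message says.

```python
import json
import os

class PersistentKV:
    """Toy file-backed store: unlike a plain dict, data survives restarts."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        # Reload whatever a previous process wrote
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value
        # Write through on every put, so a crash loses at most nothing
        with open(self.path, "w") as f:
            json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)
```

Constructing a second `PersistentKV` on the same path simulates a restart: the data written by the first instance is still there.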
Severity: WARNING
Vectors should have metadata for filtering
Message: Vector upsert without metadata. Add user_id, type, timestamp for proper filtering.
Severity: ERROR
Queries should filter by user to prevent data leakage
Message: Vector query without user filtering. Always filter by user_id to prevent data leakage.
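One way to enforce this rule structurally is a query wrapper that makes `user_id` a required argument and merges it into every filter, so no unscoped code path exists. The `scoped_filter` and `search_memories` names below are illustrative, not part of any library:

```python
def scoped_filter(user_id, extra=None):
    # The enforced user_id always wins, even over a conflicting caller value
    merged = dict(extra or {})
    merged["user_id"] = user_id
    return merged

def search_memories(index, query_embedding, user_id, filter=None, top_k=5):
    # There is deliberately no way to call this without a user_id
    return index.query(
        vector=query_embedding,
        filter=scoped_filter(user_id, filter),
        top_k=top_k,
    )
```

Routing all retrieval through a wrapper like this turns "remember to filter" into a property of the code path rather than a convention.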
Severity: INFO
Chunk size should be tested and justified
Message: Hardcoded chunk size. Test different sizes for your content type and measure retrieval accuracy.
Severity: WARNING
Chunk overlap prevents boundary issues
Message: Text splitting without overlap. Add chunk_overlap (10-20%) to prevent boundary issues.
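The effect of overlap can be seen with a bare sliding-window chunker (a simplified sketch; real splitters such as `RecursiveCharacterTextSplitter` also respect separator boundaries): consecutive chunks share their boundary characters, so a sentence cut at a chunk edge still appears whole in one of them.

```python
def chunk_with_overlap(text, size=500, overlap=50):
    # Advance by (size - overlap) so each chunk repeats the tail of the last
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

With `size=500, overlap=50`, each adjacent pair of chunks shares a 50-character boundary region.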
Severity: WARNING
Pure semantic search often returns irrelevant results
Message: Pure semantic search. Add metadata filters (user, type, time) for better relevance.
Severity: WARNING
Unbounded retrieval can overflow context
Message: Retrieval without limit. Set top_k to prevent context overflow.
Severity: WARNING
Track embedding model to handle migrations
Message: Store embedding model version in metadata to handle model migrations.
Severity: ERROR
Documents and queries must use same embedding model
Message: Ensure same embedding model for indexing and querying.
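A lightweight guard is to pin the model name on an index wrapper and refuse upserts or queries made with a different one. This sketch is illustrative; the class and its API are hypothetical, not a real library interface:

```python
class ModelPinnedIndex:
    """Wraps a vector index and rejects vectors from another embedding model."""

    def __init__(self, index, model_name):
        self.index = index
        self.model_name = model_name

    def upsert(self, doc_id, vector, metadata, model_name):
        if model_name != self.model_name:
            raise ValueError(
                f"Index expects {self.model_name!r}, got {model_name!r}"
            )
        # Record the model in metadata so migrations can find these vectors
        metadata = {**metadata, "embedding_model": model_name}
        return self.index.upsert(id=doc_id, vector=vector, metadata=metadata)

    def query(self, vector, model_name, top_k=5):
        if model_name != self.model_name:
            raise ValueError(
                f"Query embedded with {model_name!r}, index uses {self.model_name!r}"
            )
        return self.index.query(vector=vector, top_k=top_k)
```

Failing loudly at the call site is far cheaper than debugging silently garbage similarity scores later.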
Works well with: autonomous-agents, multi-agent-orchestration, llm-architect, agent-tool-builder