Supabase + pgvector for Agent Memory: The Practical Setup Guide
You're building an AI agent. It needs memory. Every tutorial points you toward Pinecone, Weaviate, or Qdrant. But you already have a database. If you're running Postgres, you already have everything you need.
Supabase with pgvector gives you vector storage, hybrid search, and the relational model you actually understand. No new infrastructure. No vendor lock-in. Just Postgres doing what it does best: storing data and finding it fast.
Analogy: Adding vector memory to Postgres is like adding a spice rack to your kitchen. You don't need a separate storage unit across town. The spices go right where you're already cooking.
This guide walks through the practical setup: schema design, embedding strategy, hybrid search patterns, and when you actually need to graduate to a dedicated vector database.
Table of Contents
- Why Supabase and pgvector Work for Agent Memory
- Schema Design for Conversational Memory
- Embedding Strategy: What to Store and When
- Hybrid Search Patterns That Actually Work
- Performance Considerations and Index Configuration
- When to Graduate Off Supabase
- Implementation Example
Why Supabase and pgvector Work for Agent Memory
Most agent memory systems need four things:
- Vector similarity search for semantic recall
- Metadata filtering for context boundaries
- Relational joins for user sessions and conversation threads
- Hybrid queries that combine keyword and semantic search
Pinecone gives you the first two. Postgres gives you all four.
Supabase adds developer experience on top: client libraries that handle embeddings, built-in auth for multi-user agents, and Edge Functions for serverless processing. You get a complete stack without stitching together five different services.
The performance trade-off is real but overblown. For agents serving hundreds or thousands of users, pgvector performs fine. You hit scaling limits around 10 million vectors with complex filters. Most agent projects never get there.
Schema Design for Conversational Memory
Agent memory isn't just a flat vector store. You need structure for conversations, user context, and temporal boundaries.
Here's a working schema:
create extension if not exists vector;
create table conversations (
id uuid primary key default gen_random_uuid(),
user_id uuid references auth.users,
created_at timestamptz default now(),
metadata jsonb
);
create table memories (
id uuid primary key default gen_random_uuid(),
conversation_id uuid references conversations,
content text not null,
embedding vector(1536),
role text check (role in ('user', 'assistant', 'system')),
created_at timestamptz default now(),
metadata jsonb
);
create index on memories using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
create index on memories (conversation_id, created_at desc);
create index on memories using gin (metadata);
The conversations table groups interactions. The memories table stores individual messages with embeddings. The JSONB metadata column holds custom fields without schema changes.
Three indexes matter:
- IVFFlat vector index for similarity search
- B-tree on (conversation_id, created_at) for recency filtering
- GIN on metadata for tag-based queries
Embedding Strategy: What to Store and When
Not every message needs an embedding. Embedding every "ok" or "thanks" wastes tokens and clutters search results.
Store embeddings for:
- User questions longer than 10 words
- Assistant responses with factual content
- System messages that define behavior or preferences
Skip embeddings for:
- Acknowledgments and short confirmations
- Error messages
- Pure navigation commands
Generate embeddings async. Insert the message immediately with a null embedding, then update it in a background job. Users don't wait for OpenAI's embedding API.
// Insert without blocking
const { data: memory } = await supabase
.from('memories')
.insert({
conversation_id: conversationId,
content: userMessage,
role: 'user',
embedding: null
})
.select()
.single();
// Generate embedding async
await generateAndUpdateEmbedding(memory.id, userMessage);
Use OpenAI's text-embedding-3-small for most cases. It's fast, cheap, and 1536 dimensions work fine for conversational recall.
Hybrid Search Patterns That Actually Work
Pure vector search misses exact keyword matches. Pure keyword search misses semantic similarity. Hybrid search combines both.
Here's the pattern:
| Search Type | When to Use | Performance |
|---|---|---|
| Vector only | Semantic questions, fuzzy recall | Fast with index |
| Keyword only | Exact names, IDs, codes | Very fast |
| Hybrid | Complex queries, high precision | Moderate |
| Recency filter | Recent context, session memory | Fast with B-tree |
Implement hybrid search with RRF (Reciprocal Rank Fusion):
with vector_search as (
select id, content,
1 - (embedding <=> $1::vector) as similarity,
row_number() over (order by embedding <=> $1::vector) as rank
from memories
where conversation_id = $2
limit 20
),
keyword_search as (
select id, content,
ts_rank(to_tsvector('english', content), plainto_tsquery('english', $3)) as relevance,
row_number() over (order by ts_rank(to_tsvector('english', content), plainto_tsquery('english', $3)) desc) as rank
from memories
where conversation_id = $2
and to_tsvector('english', content) @@ plainto_tsquery('english', $3)
limit 20
)
select
coalesce(v.id, k.id) as id,
coalesce(v.content, k.content) as content,
(coalesce(1.0 / (60 + v.rank), 0.0) + coalesce(1.0 / (60 + k.rank), 0.0)) as combined_score
from vector_search v
full outer join keyword_search k on v.id = k.id
order by combined_score desc
limit 10;
This query runs both searches, ranks results separately, then combines scores. The magic number 60 prevents division by zero and balances weights.
Performance Considerations and Index Configuration
IVFFlat indexes trade accuracy for speed. The lists parameter controls the trade-off.
| Lists Value | Vectors Supported | Query Speed | Recall Quality |
|---|---|---|---|
| 10 | < 10K | Fast | 85-90% |
| 100 | 10K-100K | Moderate | 90-95% |
| 1000 | 100K-1M | Slower | 95-98% |
Start with lists = 100. Increase if you pass 100K vectors and notice degraded recall.
Monitor query performance:
explain analyze
select content, 1 - (embedding <=> $1::vector) as similarity
from memories
where conversation_id = $2
order by embedding <=> $1::vector
limit 10;
If you see sequential scans instead of index scans, your index isn't being used. Check that:
- The vector column matches the index dimension (1536)
- You're using the right distance operator (
<=>for cosine) - The index has been built (run
reindex index memories_embedding_idx)
When to Graduate Off Supabase
You need a dedicated vector database when:
- You're storing 10M+ vectors with complex metadata filters
- Query latency consistently exceeds 200ms at p95
- You need vector updates at scale (pgvector doesn't handle updates well)
- You're running multi-modal embeddings or custom distance functions
You don't need to migrate if:
- You're under 1M vectors
- Queries return in under 100ms
- You filter by conversation or user context (high selectivity)
- Your embedding model is stable
Most agent projects stay on Postgres. The ones that graduate usually hit scaling issues from bad schema design, not vector limitations.
Implementation Example
Here's a complete memory retrieval function:
async function retrieveMemory(conversationId, query, limit = 5) {
// Generate query embedding
const embeddingResponse = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query
});
const queryEmbedding = embeddingResponse.data[0].embedding;
// Hybrid search
const { data: memories, error } = await supabase.rpc('hybrid_search', {
query_embedding: queryEmbedding,
query_text: query,
conv_id: conversationId,
match_count: limit
});
if (error) throw error;
return memories.map(m => ({
content: m.content,
role: m.role,
similarity: m.combined_score,
timestamp: m.created_at
}));
}
Store the RPC function in Supabase:
create or replace function hybrid_search(
query_embedding vector(1536),
query_text text,
conv_id uuid,
match_count int
)
returns table (
id uuid,
content text,
role text,
combined_score float,
created_at timestamptz
)
language sql
as $$, paste hybrid search query from above
$$;
This setup handles most agent memory needs: semantic search, recency bias, metadata filtering, and reasonable performance up to hundreds of thousands of conversations.
Conclusion
You don't need Pinecone for agent memory. Supabase and pgvector give you vector search in the database you already understand. Start with a clean schema, embed selectively, use hybrid search for precision, and monitor performance as you scale.
Graduate to a dedicated vector database when you have evidence of performance problems, not because a benchmark blog post scared you. Most projects never need to.
Your agent memory should be boring infrastructure. Supabase makes it boring in the best way.