Supabase + pgvector for Agent Memory: The Practical Setup Guide

AIHelpTools TeamMay 21, 2026

supabasepgvectoragent-memorypostgresvector-search

Supabase + pgvector for Agent Memory: The Practical Setup Guide

You're building an AI agent. It needs memory. Every tutorial points you toward Pinecone, Weaviate, or Qdrant. But you already have a database. If you're running Postgres, you already have everything you need.

Supabase with pgvector gives you vector storage, hybrid search, and the relational model you actually understand. No new infrastructure. No vendor lock-in. Just Postgres doing what it does best: storing data and finding it fast.

Analogy: Adding vector memory to Postgres is like adding a spice rack to your kitchen. You don't need a separate storage unit across town. The spices go right where you're already cooking.

This guide walks through the practical setup: schema design, embedding strategy, hybrid search patterns, and when you actually need to graduate to a dedicated vector database.

Why Supabase and pgvector Work for Agent Memory
Schema Design for Conversational Memory
Embedding Strategy: What to Store and When
Hybrid Search Patterns That Actually Work
Performance Considerations and Index Configuration
When to Graduate Off Supabase
Implementation Example

Why Supabase and pgvector Work for Agent Memory

Most agent memory systems need four things:

Vector similarity search for semantic recall
Metadata filtering for context boundaries
Relational joins for user sessions and conversation threads
Hybrid queries that combine keyword and semantic search

Pinecone gives you the first two. Postgres gives you all four.

Supabase adds developer experience on top: client libraries that handle embeddings, built-in auth for multi-user agents, and Edge Functions for serverless processing. You get a complete stack without stitching together five different services.

The performance trade-off is real but overblown. For agents serving hundreds or thousands of users, pgvector performs fine. You hit scaling limits around 10 million vectors with complex filters. Most agent projects never get there.

Schema Design for Conversational Memory

Agent memory isn't just a flat vector store. You need structure for conversations, user context, and temporal boundaries.

Here's a working schema:

create extension if not exists vector;

create table conversations (
  id uuid primary key default gen_random_uuid(),
  user_id uuid references auth.users,
  created_at timestamptz default now(),
  metadata jsonb
);

create table memories (
  id uuid primary key default gen_random_uuid(),
  conversation_id uuid references conversations,
  content text not null,
  embedding vector(1536),
  role text check (role in ('user', 'assistant', 'system')),
  created_at timestamptz default now(),
  metadata jsonb
);

create index on memories using ivfflat (embedding vector_cosine_ops)
  with (lists = 100);

create index on memories (conversation_id, created_at desc);
create index on memories using gin (metadata);

The conversations table groups interactions. The memories table stores individual messages with embeddings. The JSONB metadata column holds custom fields without schema changes.

Three indexes matter:

IVFFlat vector index for similarity search
B-tree on (conversation_id, created_at) for recency filtering
GIN on metadata for tag-based queries

Embedding Strategy: What to Store and When

Not every message needs an embedding. Embedding every "ok" or "thanks" wastes tokens and clutters search results.

Store embeddings for:

User questions longer than 10 words
Assistant responses with factual content
System messages that define behavior or preferences

Skip embeddings for:

Acknowledgments and short confirmations
Error messages
Pure navigation commands

Generate embeddings async. Insert the message immediately with a null embedding, then update it in a background job. Users don't wait for OpenAI's embedding API.

// Insert without blocking
const { data: memory } = await supabase
  .from('memories')
  .insert({
    conversation_id: conversationId,
    content: userMessage,
    role: 'user',
    embedding: null
  })
  .select()
  .single();

// Generate embedding async
await generateAndUpdateEmbedding(memory.id, userMessage);

Use OpenAI's text-embedding-3-small for most cases. It's fast, cheap, and 1536 dimensions work fine for conversational recall.

Hybrid Search Patterns That Actually Work

Pure vector search misses exact keyword matches. Pure keyword search misses semantic similarity. Hybrid search combines both.

Here's the pattern:

Search Type	When to Use	Performance
Vector only	Semantic questions, fuzzy recall	Fast with index
Keyword only	Exact names, IDs, codes	Very fast
Hybrid	Complex queries, high precision	Moderate
Recency filter	Recent context, session memory	Fast with B-tree

Implement hybrid search with RRF (Reciprocal Rank Fusion):

with vector_search as (
  select id, content, 
    1 - (embedding <=> $1::vector) as similarity,
    row_number() over (order by embedding <=> $1::vector) as rank
  from memories
  where conversation_id = $2
  limit 20
),
keyword_search as (
  select id, content,
    ts_rank(to_tsvector('english', content), plainto_tsquery('english', $3)) as relevance,
    row_number() over (order by ts_rank(to_tsvector('english', content), plainto_tsquery('english', $3)) desc) as rank
  from memories
  where conversation_id = $2
    and to_tsvector('english', content) @@ plainto_tsquery('english', $3)
  limit 20
)
select 
  coalesce(v.id, k.id) as id,
  coalesce(v.content, k.content) as content,
  (coalesce(1.0 / (60 + v.rank), 0.0) + coalesce(1.0 / (60 + k.rank), 0.0)) as combined_score
from vector_search v
full outer join keyword_search k on v.id = k.id
order by combined_score desc
limit 10;

This query runs both searches, ranks results separately, then combines scores. The magic number 60 prevents division by zero and balances weights.

Performance Considerations and Index Configuration

IVFFlat indexes trade accuracy for speed. The lists parameter controls the trade-off.

Lists Value	Vectors Supported	Query Speed	Recall Quality
10	< 10K	Fast	85-90%
100	10K-100K	Moderate	90-95%
1000	100K-1M	Slower	95-98%

Start with lists = 100. Increase if you pass 100K vectors and notice degraded recall.

Monitor query performance:

explain analyze
select content, 1 - (embedding <=> $1::vector) as similarity
from memories
where conversation_id = $2
order by embedding <=> $1::vector
limit 10;

If you see sequential scans instead of index scans, your index isn't being used. Check that:

The vector column matches the index dimension (1536)
You're using the right distance operator (<=> for cosine)
The index has been built (run reindex index memories_embedding_idx)

When to Graduate Off Supabase

You need a dedicated vector database when:

You're storing 10M+ vectors with complex metadata filters
Query latency consistently exceeds 200ms at p95
You need vector updates at scale (pgvector doesn't handle updates well)
You're running multi-modal embeddings or custom distance functions

You don't need to migrate if:

You're under 1M vectors
Queries return in under 100ms
You filter by conversation or user context (high selectivity)
Your embedding model is stable

Most agent projects stay on Postgres. The ones that graduate usually hit scaling issues from bad schema design, not vector limitations.

Implementation Example

Here's a complete memory retrieval function:

async function retrieveMemory(conversationId, query, limit = 5) {
  // Generate query embedding
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query
  });
  const queryEmbedding = embeddingResponse.data[0].embedding;

  // Hybrid search
  const { data: memories, error } = await supabase.rpc('hybrid_search', {
    query_embedding: queryEmbedding,
    query_text: query,
    conv_id: conversationId,
    match_count: limit
  });

  if (error) throw error;

  return memories.map(m => ({
    content: m.content,
    role: m.role,
    similarity: m.combined_score,
    timestamp: m.created_at
  }));
}

Store the RPC function in Supabase:

create or replace function hybrid_search(
  query_embedding vector(1536),
  query_text text,
  conv_id uuid,
  match_count int
)
returns table (
  id uuid,
  content text,
  role text,
  combined_score float,
  created_at timestamptz
)
language sql
as $$, paste hybrid search query from above
$$;

This setup handles most agent memory needs: semantic search, recency bias, metadata filtering, and reasonable performance up to hundreds of thousands of conversations.

Conclusion

You don't need Pinecone for agent memory. Supabase and pgvector give you vector search in the database you already understand. Start with a clean schema, embed selectively, use hybrid search for precision, and monitor performance as you scale.

Graduate to a dedicated vector database when you have evidence of performance problems, not because a benchmark blog post scared you. Most projects never need to.

Your agent memory should be boring infrastructure. Supabase makes it boring in the best way.

Supabase + pgvector for Agent Memory: The Practical Setup Guide

Supabase + pgvector for Agent Memory: The Practical Setup Guide

Table of Contents

Why Supabase and pgvector Work for Agent Memory

Schema Design for Conversational Memory

Embedding Strategy: What to Store and When

Hybrid Search Patterns That Actually Work

Performance Considerations and Index Configuration

When to Graduate Off Supabase

Implementation Example

Conclusion