The standard PostgreSQL Full Text Search (tsvector) is powerful, but it has a limitation: it matches on keywords and their stemmed variants, not on meaning. If a user searches for “automobile” but your database only contains “car,” standard search returns nothing.
Enter Semantic Search. By converting text into vectors (arrays of numbers representing meaning) and storing them in Postgres, we can find records that are conceptually similar, even if they don’t share a single word.
Here is how to implement this in Supabase using the pgvector extension.
Step 1: Enable the Extension
First, we need to enable pgvector in our Postgres instance. In the Supabase SQL editor, run:
create extension if not exists vector;
This unlocks the vector data type, which stores arrays of floating-point numbers efficiently.
Step 2: Define the Schema
We need a column to store the vector embeddings. If you are using OpenAI’s text-embedding-3-small model, the output dimension is 1536.
Option A: Create a new table from scratch:
create table documents (
  id bigserial primary key,
  content text,
  embedding vector(1536) -- matches OpenAI embedding size
);
Option B: Add to an existing table:
alter table blog_posts
  add column embedding vector(1536);
Step 3: Indexing for Performance
Searching through thousands of vectors can be slow. We can speed this up using an HNSW (Hierarchical Navigable Small World) index, which is highly efficient for approximate nearest neighbor search.
create index on documents using hnsw (embedding vector_cosine_ops);
This index enables fast similarity searches even with millions of vectors. The vector_cosine_ops operator class optimizes for cosine distance, which is the standard metric for semantic similarity.
Step 4: Generate Embeddings
We need to convert our blog post content into vectors. This is typically done via an Edge Function that calls OpenAI’s API:
import { OpenAI } from "npm:openai@4";

const openai = new OpenAI({
  apiKey: Deno.env.get("OPENAI_API_KEY"),
});

// blogPostContent is the text of the post you want to embed
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: blogPostContent,
});

const embedding = response.data[0].embedding;
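One practical wrinkle: embedding models cap their input length (text-embedding-3-small accepts up to 8,191 tokens), so long posts need to be split before embedding. Here is a minimal sketch of a character-based chunker; the chunkText helper and the rough characters-per-token heuristic are assumptions for illustration, not part of any library:

```typescript
// Split long content into overlapping chunks so each stays under the
// model's input limit. Rough heuristic: ~4 characters per token.
function chunkText(text: string, maxChars = 8000, overlap = 200): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlap; // overlap preserves context across chunk boundaries
  }
  return chunks;
}
```

Each chunk then gets its own row and its own embedding, so a search can surface the relevant part of a long post.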
Step 5: Store the Embeddings
Store the embedding alongside your post. For existing rows, run an update:
update blog_posts
set embedding = $1
where id = $2;
For new records, you can include the embedding during insertion:
insert into documents (content, embedding)
values ($1, $2);
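If you run these statements yourself (for example through a Postgres driver in an Edge Function), note that pgvector expects its text literal format, [x1,x2,...], for vector parameters; passing a raw JS array can serialize as a Postgres array instead. A tiny hypothetical serializer:

```typescript
// pgvector's text input format is "[x1,x2,...]". Serialize a JS number
// array into that literal for use as a query parameter in raw SQL.
function toVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

With the Supabase JS client you can typically pass the array directly and let the client handle serialization; the helper above is only needed when constructing parameters by hand.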
Step 6: Query by Similarity (Cosine Distance)
Now the magic happens. When a user searches, we convert their query into an embedding and find the closest matches.
To find similar content, we compare the user’s search query vector against our stored document vectors. We want the items with the closest distance.
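To make the metric concrete, here is an illustrative TypeScript version of what Postgres computes for us; cosineDistance is a hypothetical helper, not something you would run in production (the database does this far faster):

```typescript
// Cosine distance = 1 - cosine similarity. A distance of 0 means the
// vectors point the same way (most similar); 1 means orthogonal (unrelated).
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```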
In Postgres, the <=> operator (provided by pgvector) computes cosine distance. We wrap the query in a Postgres function so it can be called as a Remote Procedure Call (RPC) from the Supabase JS Client:
create or replace function match_documents (
  query_embedding vector(1536),
  match_threshold float,
  match_count int
)
returns setof documents
language plpgsql
as $$
begin
  return query
  select *
  from documents
  where 1 - (documents.embedding <=> query_embedding) > match_threshold
  order by documents.embedding <=> query_embedding
  limit match_count;
end;
$$;
Now you can call this from your frontend:
// getEmbedding wraps the embedding call from Step 4
const queryEmbedding = await getEmbedding(userSearchQuery);

const { data, error } = await supabase.rpc('match_documents', {
  query_embedding: queryEmbedding,
  match_threshold: 0.78,
  match_count: 10,
});
The function returns documents ordered by similarity, with the most relevant results first.
Performance Considerations
- Cost: OpenAI charges per token. Cache embeddings aggressively.
- Latency: Pre-compute embeddings during content creation, not at query time.
- Hybrid Search: Combine semantic search with traditional filters (date, category) for best results.
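For hybrid search, one common way to merge a keyword-ranked list with a semantically-ranked list is Reciprocal Rank Fusion. The sketch below is a hypothetical client-side version (fuseRankings is not a library function, and in production you would typically fuse inside SQL instead):

```typescript
// Reciprocal Rank Fusion: each list contributes 1 / (k + rank + 1) to a
// document's score, so items ranked highly in both lists rise to the top.
function fuseRankings(keywordIds: number[], semanticIds: number[], k = 60): number[] {
  const scores = new Map<number, number>();
  for (let rank = 0; rank < keywordIds.length; rank++) {
    const id = keywordIds[rank];
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  for (let rank = 0; rank < semanticIds.length; rank++) {
    const id = semanticIds[rank];
    scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Documents appearing in both result sets accumulate score from both, which is usually the behavior you want: a post that matches the exact keyword and is semantically close should outrank one that matches only one signal.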
Conclusion
Semantic search transforms user experience by understanding intent, not just keywords. With pgvector, Postgres becomes a vector database without needing a separate service.
In production, consider implementing incremental embedding updates and monitoring vector index performance as your dataset grows.