Semantic Search

Semantic search with pgvector and Supabase Edge Functions

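Semantic search interprets the meaning behind a user's query rather than matching exact keywords. It uses machine learning to capture the intent and context of the query, handling nuances of language such as synonyms, phrasing variations, and word associations.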
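To make "meaning rather than keywords" concrete, here is a minimal sketch of how similarity between embeddings is commonly measured. The three-dimensional vectors and the `cosineSimilarity` helper are made up for illustration; they are not produced by any model, and real gte-small embeddings have 384 dimensions.

```typescript
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Hypothetical toy embeddings (assumption: hand-picked values, not model output).
const toyEmbeddings = {
  car: [0.9, 0.1, 0.0],
  automobile: [0.8, 0.2, 0.1],
  banana: [0.0, 0.2, 0.9],
}

// Synonyms share no keywords, yet their vectors point in nearly the same direction.
const carVsAutomobile = cosineSimilarity(toyEmbeddings.car, toyEmbeddings.automobile)
const carVsBanana = cosineSimilarity(toyEmbeddings.car, toyEmbeddings.banana)
console.log({ carVsAutomobile, carVsBanana })
```

In this toy setup, "car" and "automobile" score much higher than "car" and "banana", which is the behavior a keyword match cannot provide.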

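Since Supabase Edge Runtime v1.36.0, you can run the gte-small model natively within Supabase Edge Functions without any external dependencies. This lets you generate text embeddings without calling any external APIs.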

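In this tutorial you'll implement three parts:

1. A generate-embedding database webhook Edge Function that generates embeddings whenever a content row is added to (or updated in) the public.embeddings table.
2. A query_embeddings Postgres function that lets us perform similarity search from an Edge Function via a Remote Procedure Call (RPC).
3. A search Edge Function that generates the embedding for the search term, performs the similarity search via the RPC function call, and returns the result.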

The full example code is available on GitHub.

Create the database table and webhook

create extension if not exists vector with schema extensions;

create table embeddings (
  id bigint primary key generated always as identity,
  content text not null,
  embedding vector (384)
);
alter table embeddings enable row level security;

create index on embeddings using hnsw (embedding vector_ip_ops);
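The `embedding` column only accepts vectors with exactly 384 dimensions, and pgvector's text input format is a bracketed, comma-separated list such as `[0.1,0.2,0.3]`, which is what `JSON.stringify` produces for a plain number array. A minimal sketch of a guard you could run before writing a row follows; `toPgVector` and `EMBEDDING_DIMENSIONS` are hypothetical names, not part of supabase-js or pgvector.

```typescript
// Assumption: matches `vector (384)` in the table definition above.
const EMBEDDING_DIMENSIONS = 384

// Hypothetical helper: validates an embedding and serializes it to
// pgvector's text representation, e.g. "[0.1,0.2,0.3]".
function toPgVector(embedding: number[], dimensions: number = EMBEDDING_DIMENSIONS): string {
  if (embedding.length !== dimensions) {
    throw new Error(`expected ${dimensions} dimensions, got ${embedding.length}`)
  }
  if (!embedding.every((x) => Number.isFinite(x))) {
    throw new Error('embedding contains a non-finite value')
  }
  return JSON.stringify(embedding)
}

// A 3-dimensional example keeps the output readable.
console.log(toPgVector([0.1, 0.2, 0.3], 3)) // prints [0.1,0.2,0.3]
```

Failing early in application code gives a clearer error than a rejected database write, though the database remains the final check on the column type.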

You can deploy the following Edge Function as a database webhook to generate embeddings for any text content inserted into the table:

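import { createClient } from 'jsr:@supabase/supabase-js@2'

// Minimal shape of the database webhook payload used below.
interface WebhookPayload {
  type: 'INSERT' | 'UPDATE' | 'DELETE'
  table: string
  schema: string
  record: { id: number; content: string }
}

// Assumption: the client is built from the default Edge Function
// environment variables; the original snippet omits this setup.
const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
)

const model = new Supabase.ai.Session('gte-small')

Deno.serve(async (req) => {
  const payload: WebhookPayload = await req.json()
  const { content, id } = payload.record

  // Generate the embedding
  const embedding = await model.run(content, {
    mean_pool: true,
    normalize: true,
  })

  // Store it in the database
  const { error } = await supabase
    .from('embeddings')
    .update({ embedding: JSON.stringify(embedding) })
    .eq('id', id)
  if (error) console.warn(error.message)

  return new Response('ok')
})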
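The `mean_pool` and `normalize` options passed to `model.run` can be pictured as two small vector operations: averaging per-token vectors into one vector for the whole text, then scaling that vector to unit length. The sketch below is a conceptual illustration only, not the runtime's actual implementation; `meanPool`, `l2Normalize`, and the two-dimensional token vectors are hypothetical, and real pooling typically also has to ignore padding tokens.

```typescript
// Average per-token vectors into a single vector (mean pooling).
function meanPool(tokenVectors: number[][]): number[] {
  if (tokenVectors.length === 0) throw new Error('need at least one token vector')
  const dims = tokenVectors[0].length
  const pooled = new Array<number>(dims).fill(0)
  for (const vec of tokenVectors) {
    for (let i = 0; i < dims; i++) pooled[i] += vec[i] / tokenVectors.length
  }
  return pooled
}

// Scale a vector so its Euclidean (L2) length is 1.
function l2Normalize(vec: number[]): number[] {
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0))
  if (norm === 0) throw new Error('cannot normalize a zero vector')
  return vec.map((x) => x / norm)
}

// Hypothetical token vectors for a three-token input.
const tokens = [
  [1, 0],
  [3, 0],
  [2, 6],
]
const sentenceVector = l2Normalize(meanPool(tokens))
const length = Math.sqrt(sentenceVector.reduce((sum, x) => sum + x * x, 0))
console.log({ sentenceVector, length })
```

Unit-length output matters later in this guide: it is what allows the inner product to stand in for cosine similarity when querying.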

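Create a database function and RPC

Now that the embeddings are stored in your Postgres database table, you can query them from Supabase Edge Functions via a Remote Procedure Call (RPC).

Given the following Postgres function: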

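-- Matches document sections using vector similarity search on embeddings
--
-- Returns a setof embeddings so that we can use PostgREST resource embedding (joins with other tables)
-- Additional filtering such as limits can be chained onto this function call
create or replace function query_embeddings(embedding vector(384), match_threshold float)
returns setof embeddings
language plpgsql
as $$
begin
  return query
  select *
  from embeddings

  -- The <#> operator returns the negative inner product, so we negate match_threshold.
  -- The parameter is qualified with the function name so that it cannot be
  -- confused with the embeddings.embedding column.
  where embeddings.embedding <#> query_embeddings.embedding < -match_threshold

  -- Our embeddings are normalized to length 1, so cosine similarity
  -- and inner product produce the same query results.
  -- We use the inner product because it can be computed faster.
  --
  -- For the different distance functions, see https://github.com/pgvector/pgvector
  order by embeddings.embedding <#> query_embeddings.embedding;
end;
$$;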
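The filter and ordering in this function can be reproduced in memory to see why the threshold is negated and why the inner product is enough for normalized vectors. This is a sketch with hand-picked two-dimensional vectors; `negativeInnerProduct` only mirrors the sign convention of pgvector's `<#>` operator and is not a pgvector API.

```typescript
const dot = (a: number[], b: number[]): number => a.reduce((sum, x, i) => sum + x * b[i], 0)
const norm = (a: number[]): number => Math.sqrt(dot(a, a))
const normalize = (a: number[]): number[] => a.map((x) => x / norm(a))
const cosine = (a: number[], b: number[]): number => dot(a, b) / (norm(a) * norm(b))

// Mirrors the sign convention of <#>: the NEGATIVE inner product.
const negativeInnerProduct = (a: number[], b: number[]): number => -dot(a, b)

// Hypothetical unit-length query and stored rows.
const query = normalize([3, 4])
const rows = [
  { id: 1, embedding: normalize([3, 4.5]) }, // nearly the same direction as the query
  { id: 2, embedding: normalize([4, -3]) }, // orthogonal to the query
  { id: 3, embedding: normalize([1, 1]) }, // close to the query
]
const matchThreshold = 0.8

// Same shape as the SQL: `<#> < -match_threshold`, then order by `<#>` ascending.
const matches = rows
  .filter((row) => negativeInnerProduct(row.embedding, query) < -matchThreshold)
  .sort((a, b) => negativeInnerProduct(a.embedding, query) - negativeInnerProduct(b.embedding, query))
  .map((row) => row.id)

console.log(matches)
```

Because every vector has length 1, `cosine(a, b)` and `dot(a, b)` agree up to floating-point error, so `-dot < -threshold` is the same condition as "cosine similarity greater than the threshold", and ascending order on the negated value puts the most similar row first.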

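Query vectors in Supabase Edge Functions

You can use supabase-js to first generate the embedding for the search term and then call the Postgres function directly from your Supabase Edge Function to find relevant results among the stored embeddings: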

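import { createClient } from 'jsr:@supabase/supabase-js@2'

// Assumption: the client is built from the default Edge Function
// environment variables; the original snippet omits this setup.
const supabase = createClient(
  Deno.env.get('SUPABASE_URL')!,
  Deno.env.get('SUPABASE_SERVICE_ROLE_KEY')!
)

const model = new Supabase.ai.Session('gte-small')

Deno.serve(async (req) => {
  const { search } = await req.json()
  if (!search) return new Response('Please provide a search param!')

  // Generate the embedding for the search term
  const embedding = await model.run(search, {
    mean_pool: true,
    normalize: true,
  })

  // Query the embeddings
  const { data: result, error } = await supabase
    .rpc('query_embeddings', {
      embedding,
      match_threshold: 0.8,
    })
    .select('content')
    .limit(3)
  if (error) {
    return Response.json(error)
  }

  return Response.json({ search, result })
})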
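Once deployed, the function expects a JSON body with a `search` field. A minimal sketch of how a client might assemble that request is shown below, assuming the function is deployed under the name `search` and is reachable at the usual `/functions/v1/<name>` path; `buildSearchRequest`, the example project URL, and the key placeholder are hypothetical.

```typescript
interface SearchRequest {
  url: string
  method: 'POST'
  headers: Record<string, string>
  body: string
}

// Hypothetical helper: assembles the HTTP request for the `search` Edge Function.
function buildSearchRequest(projectUrl: string, anonKey: string, search: string): SearchRequest {
  return {
    // Strip trailing slashes so the path is not doubled up.
    url: `${projectUrl.replace(/\/+$/, '')}/functions/v1/search`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${anonKey}`,
      'Content-Type': 'application/json',
    },
    // Matches `const { search } = await req.json()` in the function.
    body: JSON.stringify({ search }),
  }
}

// Placeholder values (assumptions); substitute your own project URL and key.
const request = buildSearchRequest('https://example.supabase.co/', 'YOUR_ANON_KEY', 'vector databases')
console.log(request.url)
```

To send it, split off the URL and pass the rest to `fetch`, for example `const { url, ...init } = request` followed by `await fetch(url, init)`. The response body has the shape `{ search, result }` returned by the function.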

You now have AI semantic search set up without any external dependencies! Just you, pgvector, and Supabase Edge Functions.
