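Running AI Models

How to run AI models in Edge Functions

Supabase Edge Runtime has a built-in API for running AI models. You can use this API to generate embeddings, build conversational workflows, and perform other AI-related tasks in your Edge Functions.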

Setup

No external dependencies or packages need to be installed to enable the API.

You can create a new inference session as follows:

const model = new Supabase.ai.Session('model-name')

To get type hints and type checking for the API, import the types from functions-js at the top of your file:

import 'jsr:@supabase/functions-js/edge-runtime.d.ts'

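Running a model inference

Once the session is instantiated, you can call it with an input to perform inference. Depending on the model you run, you may need to provide different options (discussed below).

const output = await model.run(input, options)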

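How to generate text embeddings

Now let's see how to write an Edge Function that uses the Supabase.ai API to generate text embeddings. Currently, the Supabase.ai API only supports the gte-small model.

The gte-small model works only with English text, and any long text is truncated to a maximum of 512 tokens. You can still provide inputs longer than 512 tokens, but the truncation may affect accuracy.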

const model = new Supabase.ai.Session('gte-small')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const input = params.get('input')
  const output = await model.run(input, { mean_pool: true, normalize: true })
  return new Response(JSON.stringify(output), {
    headers: {
      'Content-Type': 'application/json',
      Connection: 'keep-alive',
    },
  })
})
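Two small helpers can be useful around an embedding function like this one. The sketch below is only illustrative: `splitIntoChunks` and `dotProduct` are hypothetical names, not part of the Supabase.ai API. The first splits long input so that each piece is more likely to stay under the 512-token limit, using word count as a loose stand-in for token count (an assumption; the model's tokenizer splits text differently). The second compares two embeddings, assuming each is a plain array of numbers; for vectors generated with `normalize: true`, the dot product equals the cosine similarity.

```typescript
// Split text into chunks of at most `maxWords` whitespace-separated words.
// Assumption: word count is only a rough proxy for the model's token count,
// so pick a budget comfortably below the 512-token limit.
function splitIntoChunks(text: string, maxWords: number): string[] {
  if (maxWords < 1) throw new RangeError('maxWords must be at least 1')
  const words = text.split(/\s+/).filter((w) => w.length > 0)
  const chunks: string[] = []
  for (let i = 0; i < words.length; i += maxWords) {
    chunks.push(words.slice(i, i + maxWords).join(' '))
  }
  return chunks
}

// Dot product of two equal-length vectors.
// For unit-length (normalized) vectors this equals their cosine similarity.
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embeddings must have the same length')
  }
  let sum = 0
  for (let i = 0; i < a.length; i++) {
    sum += a[i] * b[i]
  }
  return sum
}
```

Each chunk would then be passed to `model.run` separately. How the per-chunk embeddings are combined afterwards (for example, storing one row per chunk) is an application-level choice.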

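Using Large Language Models (LLM)

Inference with larger models is supported via Ollama and Mozilla Llamafile. In the first iteration, you can use it with a self-hosted Ollama or Llamafile server. We are progressively rolling out support for a hosted solution. To sign up for early access, fill out this form.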

Running locally

Install Ollama and pull the Mistral model

ollama pull mistral

Run the Ollama server locally

ollama serve

Set a function secret called AI_INFERENCE_API_HOST to point to the Ollama server

echo "AI_INFERENCE_API_HOST=http://host.docker.internal:11434" >> supabase/functions/.env

Create a new function and add the following code

supabase functions new ollama-test

import 'jsr:@supabase/functions-js/edge-runtime.d.ts'

const session = new Supabase.ai.Session('mistral')

Deno.serve(async (req: Request) => {
  const params = new URL(req.url).searchParams
  const prompt = params.get('prompt') ?? ''

  // Get the output as a stream
  const output = await session.run(prompt, { stream: true })

  const headers = new Headers({
    'Content-Type': 'text/event-stream',
    Connection: 'keep-alive',
  })

  // Create a stream
  const stream = new ReadableStream({
    async start(controller) {
      const encoder = new TextEncoder()

      try {
        for await (const chunk of output) {
          controller.enqueue(encoder.encode(chunk.response ?? ''))
        }
      } catch (err) {
        console.error('Stream error:', err)
      } finally {
        controller.close()
      }
    },
  })

  // Return the stream to the user
  return new Response(stream, {
    headers,
  })
})

Serve the function

supabase functions serve --env-file supabase/functions/.env

Execute the function

curl --get "http://localhost:54321/functions/v1/ollama-test" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"
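The same request can also be made from TypeScript. The following is a minimal sketch, not an official client: `buildFunctionUrl`, `readTextStream`, and `callOllamaTest` are hypothetical helper names, and the `Authorization` header simply mirrors the curl command above. It builds the query string with `URL.searchParams` (the counterpart of `--data-urlencode`) and reads the streamed response body to completion with the standard Streams API.

```typescript
// Build the request URL, letting URLSearchParams handle the encoding
// that `curl --get --data-urlencode` performs on the command line.
function buildFunctionUrl(baseUrl: string, functionName: string, prompt: string): string {
  const url = new URL(`/functions/v1/${functionName}`, baseUrl)
  url.searchParams.set('prompt', prompt)
  return url.toString()
}

// Read a streamed byte body to the end and return it as one string.
async function readTextStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let text = ''
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) break
      // `stream: true` keeps multi-byte characters intact across chunk boundaries.
      text += decoder.decode(value, { stream: true })
    }
    text += decoder.decode() // flush any buffered bytes
  } finally {
    reader.releaseLock()
  }
  return text
}

// Hypothetical caller: sends the prompt and collects the full streamed reply.
async function callOllamaTest(baseUrl: string, anonKey: string, prompt: string): Promise<string> {
  const res = await fetch(buildFunctionUrl(baseUrl, 'ollama-test', prompt), {
    headers: { Authorization: anonKey },
  })
  if (!res.ok || !res.body) {
    throw new Error(`Request failed with status ${res.status}`)
  }
  return readTextStream(res.body)
}
```

To display tokens as they arrive instead of waiting for the full reply, the loop in `readTextStream` could hand each decoded chunk to a callback rather than appending it to a string.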

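Deploying to production

Once the function works locally, it's time to deploy it to production.

Deploy an Ollama or Llamafile server, and set a function secret called AI_INFERENCE_API_HOST to point to the deployed server:

supabase secrets set AI_INFERENCE_API_HOST=https://path-to-your-llm-server/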

Deploy the Supabase function:

supabase functions deploy

Execute the function:

curl --get "https://project-ref.supabase.co/functions/v1/ollama-test" \
  --data-urlencode "prompt=write a short rap song about Supabase, the Postgres Developer platform, as sung by Nicki Minaj" \
  -H "Authorization: $ANON_KEY"
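If the deployed function fails, a missing or malformed AI_INFERENCE_API_HOST secret is one thing worth ruling out. As a rough sketch, a function could check the value at startup and fail with a readable message. `parseInferenceHost` is a hypothetical helper, not part of the Supabase API, and restricting the value to http and https is an assumption made for this example.

```typescript
// Validate the configured inference host and return it as a URL.
// Throws if the value is missing, malformed, or not an http(s) URL.
function parseInferenceHost(raw: string | undefined): URL {
  if (!raw) {
    throw new Error('AI_INFERENCE_API_HOST is not set')
  }
  const url = new URL(raw) // throws a TypeError on malformed input
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    throw new Error(`Unsupported protocol: ${url.protocol}`)
  }
  return url
}
```

Inside an Edge Function, the value could be read with `Deno.env.get('AI_INFERENCE_API_HOST')` and passed to this helper before the session is created.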

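As demonstrated in the video above, running Ollama locally is typically slower than running it on a server with dedicated GPUs. We are collaborating with the Ollama team to improve local performance.

In the future, a hosted LLM API will be provided as part of the Supabase platform. Supabase will scale and manage the API and GPUs for you. To sign up for early access, fill out this form.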