Semantic Caching with Higress - paulwong

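When a question put to a large model resembles one that a different user has already asked, the answer can be fetched straight from the cache, with no further call to the LLM.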

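This is done with the Higress AI gateway. The caching happens entirely on the server side, inside Higress, so the client needs no code at all.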
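
To make the idea concrete, here is a minimal sketch of a semantic cache lookup in plain Python. It is not how Higress implements the feature. The `SemanticCache` class, the `toy_embed` function, and the 0.9 similarity threshold are all assumptions made for illustration; a real setup would use an embedding model and a vector database such as Milvus instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: letter counts a..z."""
    counts = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            counts[ord(ch) - ord("a")] += 1.0
    return counts


class SemanticCache:
    """Illustrative in-memory cache keyed by question similarity."""

    def __init__(self, embed, threshold=0.9):
        self.embed = embed          # function: str -> list of floats
        self.threshold = threshold  # assumed cut-off for a cache hit
        self.entries = []           # list of (vector, question, answer)

    def store(self, question, answer):
        self.entries.append((self.embed(question), question, answer))

    def lookup(self, question):
        """Return the cached answer of the most similar question, or None."""
        query = self.embed(question)
        best_answer, best_score = None, -1.0
        for vector, _question, answer in self.entries:
            score = cosine_similarity(query, vector)
            if score > best_score:
                best_answer, best_score = answer, score
        return best_answer if best_score >= self.threshold else None
```

On a miss (`lookup` returns `None`) the caller would forward the request to the LLM and then `store` the new question and answer, so that later, similar questions are answered from the cache.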

In the Milvus vector DB, create a new collection named ai_higress_cache with the following fields:

Field     Type               Index Name  Index Type  Index Parameters
id        Int64 (auto ID)
vector    FloatVector(4096)  vector                  metric_type: COSINE
question  VarChar(5000)
answer    VarChar(5000)
# The three fields vector, question, and answer are required, and their names must not be changed.

The embedding service and the vector DB service must be configured beforehand; both can be set up under "Service Sources".

Under "AI Route Management", click "Strategy" on the relevant route, click "Configure", and enter the following YAML configuration:

embedding:
  apiKey: "sk-xxxxxxx"
  model: "nvidia/llama-embed-nemotron-8b"
  path: "/v1/embeddings"
  serviceName: "llm-vllm-nvidia--llama-embed-nemotron-8b.internal.static"
  servicePort: 80
  type: "openai"
vector:
  apiKey: "empty-key"
  collectionID: "ai_higress_cache"
  serviceName: "my-milvus.static"
  servicePort: 80
  type: "milvus"
cacheKeyFrom: "messages.@reverse.0.content"
cacheKeyPrefix: "openai_gpt_oss_20b_"
cacheStreamValueFrom: "choices.0.delta.content"
cacheValueFrom: "choices.0.message.content"
returnResponseTemplate: |
  {"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
returnStreamResponseTemplate: |-
  data:{"id":"from-cache","choices":[{"index":0,"delta":{"role":"assistant","content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o","object":"chat.completion","usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
  data:[DONE]
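
The path and template settings above can be illustrated with a short Python sketch. It reads `cacheKeyFrom` as a GJSON-style path that picks the content of the last message, reads `cacheValueFrom` as the path to the answer in a non-streaming response, and fills the `%s` placeholder of `returnResponseTemplate`. The function names are hypothetical. The JSON escaping of the answer is an assumption of this sketch, not a statement about what the plugin does internally.

```python
import json

# Same template as returnResponseTemplate in the YAML above.
RETURN_RESPONSE_TEMPLATE = (
    '{"id":"from-cache","choices":[{"index":0,"message":{"role":"assistant",'
    '"content":"%s"},"finish_reason":"stop"}],"model":"gpt-4o",'
    '"object":"chat.completion","usage":{"prompt_tokens":0,'
    '"completion_tokens":0,"total_tokens":0}}'
)


def extract_cache_key(request_body):
    """Mimic "messages.@reverse.0.content": the content of the last message."""
    return request_body["messages"][-1]["content"]


def extract_cache_value(response_body):
    """Mimic "choices.0.message.content": the answer text of a response."""
    return response_body["choices"][0]["message"]["content"]


def render_cached_response(answer):
    """Fill the %s placeholder with the cached answer.

    The answer is JSON-escaped first (assumption) so that quotes and
    newlines in it do not break the surrounding JSON.
    """
    escaped = json.dumps(answer, ensure_ascii=False)[1:-1]
    return RETURN_RESPONSE_TEMPLATE % escaped


request = {
    "messages": [
        {"role": "system", "content": "Be brief."},
        {"role": "user", "content": "What is Higress?"},
    ]
}
print(extract_cache_key(request))
print(render_cached_response('It is an "AI gateway".'))
```

Because the key comes from the last message only, earlier turns of the conversation do not affect the lookup under this configuration.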

Reference:
https://higress.ai/docs/latest/user/plugins/ai/api-provider/ai-cache/

Posted on 2026-03-11 18:07 by paulwong. Category: AI-LLM
