What is Pinecone Inference — Hosted Embeddings & Reranking API?

Pinecone Inference is a managed embedding + reranking endpoint. Use llama-text-embed-v2 or other models without managing GPU infrastructure.

Is Pinecone Inference — Hosted Embeddings & Reranking API free to use?

Yes. Pinecone Inference — Hosted Embeddings & Reranking API is freely available on TokRepo. Check the Source & Thanks section on the asset page for the specific open-source license.

How do I install Pinecone Inference — Hosted Embeddings & Reranking API?

Visit the asset page on TokRepo and click "Copy for agent" to get the installation instructions. Most assets can be installed with a single command.

Pinecone Inference — Hosted Embeddings & Reranking API

Name: Pinecone Inference — Hosted Embeddings & Reranking API
Author: Pinecone

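Introduction

Pinecone Inference is a hosted embedding and reranking layer that complements Pinecone vector indexes. It generates embeddings with llama-text-embed-v2, multilingual-e5, or pluggable third-party models, so you do not have to run your own GPUs. The reranking endpoint uses bge-reranker to score candidate documents, which improves RAG accuracy. It suits teams that already use Pinecone and do not want to operate a separate embedding service. It works with the Pinecone Python / TypeScript SDKs and the REST API. Setup takes about 2 minutes.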

Generating embeddings

import os

from pinecone import Pinecone

# Read the API key from the environment rather than hard-coding it
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Embed two passages; input_type="passage" marks them as documents to be indexed
embeddings = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=[
        "Pinecone is a managed vector database",
        "Weaviate is also a vector database",
    ],
    parameters={"input_type": "passage", "truncate": "END"},
)

# Upsert the embeddings directly into a Pinecone index
index = pc.Index("my-index")
index.upsert(vectors=[
    {"id": "doc1", "values": embeddings[0].values, "metadata": {"text": "..."}},
    {"id": "doc2", "values": embeddings[1].values, "metadata": {"text": "..."}},
])
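
When there are more than a handful of documents, writing each upsert record by hand gets tedious. The following is a minimal sketch of two helpers that pair ids, texts, and embedding values into the record shape used above and split them into batches. The names `build_records` and `batched` and the default batch size of 100 are illustrative assumptions, not part of the Pinecone SDK.

```python
from typing import Iterator, Sequence


def build_records(
    ids: Sequence[str],
    texts: Sequence[str],
    vectors: Sequence[Sequence[float]],
) -> list[dict]:
    """Pair each id and text with its embedding, in the dict shape passed to index.upsert."""
    if not (len(ids) == len(texts) == len(vectors)):
        raise ValueError("ids, texts and vectors must have the same length")
    return [
        {"id": doc_id, "values": list(vec), "metadata": {"text": text}}
        for doc_id, text, vec in zip(ids, texts, vectors)
    ]


def batched(records: list[dict], batch_size: int = 100) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size records (100 is a hypothetical default)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


# Possible usage with the `embeddings` and `index` objects from the snippet above:
#   vectors = [e.values for e in embeddings]
#   for batch in batched(build_records(ids, texts, vectors)):
#       index.upsert(vectors=batch)
```

Keeping the helpers free of any SDK calls means they can be checked locally without an API key.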

Embedding and querying in one flow

# Embed the query; note input_type="query" instead of "passage"
query_emb = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=["What is a managed vector database?"],
    parameters={"input_type": "query"},
)

# Search the index with the query embedding
results = index.query(
    vector=query_emb[0].values,
    top_k=10,
    include_metadata=True,
)

Reranking candidate documents

# Rerank the text of the matches returned by index.query above
reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is a managed vector database?",
    documents=[r.metadata["text"] for r in results.matches],
    top_n=5,
    return_documents=True,
)

# reranked.data holds the 5 most relevant documents, with scores from 0 to 1
for r in reranked.data:
    print(r.score, r.document.text)
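
The rerank call above receives only the text of each match, so the original ids and metadata have to be recovered afterwards. A minimal sketch of one way to do that follows, assuming each rerank row exposes an integer `index` into the `documents` list that was sent, plus a float `score`. The helper name `reorder_matches` is hypothetical.

```python
from typing import Any, Sequence


def reorder_matches(matches: Sequence[Any], rerank_rows: Sequence[Any]) -> list[tuple[Any, float]]:
    """Return (match, rerank_score) pairs in reranked order.

    Assumes `matches` is in the same order as the documents list passed to
    rerank, and that each row has `.index` (position in that list) and `.score`.
    """
    return [(matches[row.index], row.score) for row in rerank_rows]


# Possible usage with the objects from the snippets above:
#   for match, score in reorder_matches(results.matches, reranked.data):
#       print(match.id, score)
```

This relies on the documents list having been built from `results.matches` without filtering or reordering, as in the rerank call above.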

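Why use Inference instead of running your own embedding service

No GPUs to manage: Pinecone hosts and scales the models
Same SDK as the index (no extra authentication or billing)
Inference is included in the Pinecone Standard / Enterprise plans
Latency is optimized for use with Pinecone indexes (same network)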

FAQ

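Q: Is Pinecone Inference free? A: There is a free tier (2K embeddings per month). Beyond that, usage-based pricing is bundled into the Pinecone Standard plan. Testing is free, and cost scales with index usage.

Q: Which models are available? A: llama-text-embed-v2 (1024 dimensions), multilingual-e5-large, pinecone-sparse-english-v0, and bge-reranker-v2-m3 (reranking). Pinecone adds new models regularly; check the official docs for the current list.

Q: Can I use Inference without a Pinecone index? A: Yes. Inference is a standalone API. Generate the embeddings and store them anywhere (Postgres pgvector or your own database). Embedding and indexing under the same Pinecone account is only a convenience.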
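
If the embeddings are stored outside Pinecone, similarity search has to happen in that store or in application code. The sketch below shows a brute-force cosine-similarity search in plain Python over vectors held in a dict, as a stand-in for whatever store is actually used. The function names and the dict layout are assumptions for illustration; a real deployment would more likely rely on the database's own vector operators.

```python
import math
from typing import Mapping, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float],
    stored: Mapping[str, Sequence[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and return the k best (id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Possible usage: `stored` maps document ids to embedding values saved earlier,
# and `query_vec` is the `.values` of a query embedding.
#   hits = top_k(query_vec, stored, k=5)
```

A linear scan like this is only practical for small collections, which is one reason to pair the embeddings with an index.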
