Workflows · May 7, 2026 · 4 min read

Pinecone Assistant — Managed RAG Service with Auto-Indexing

Pinecone Assistant is Pinecone's fully managed RAG product. Upload PDFs, query with natural language, and get cited answers, with no chunking pipeline to build.

Introduction

Pinecone Assistant is the fully managed RAG product — upload PDFs, Word docs, or text, and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering. Best for: teams who want RAG over their docs without building chunking + embedding + prompt-construction layers themselves. Works with: Pinecone Python / Node SDK, REST API, Pinecone Console. Setup time: 5 minutes.


Create an assistant + upload files

import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create an assistant with system-level instructions
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload source documents (PDF, Markdown, and Word docs are all accepted)
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata.
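Processing happens asynchronously, so a file may not be queryable the instant upload_file returns. A minimal polling sketch, assuming the plugin's list_files() method and its status field (verify both against your SDK version):

import time

# Poll until every uploaded file has finished processing.
# list_files() and the "Available" status value are assumptions drawn
# from the Assistant docs; adapt to your SDK version.
while not all(f.status == "Available" for f in assistant.list_files()):
    time.sleep(5)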

Chat with citations

from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
#  light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
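The citation objects carry enough structure to render a sources footer yourself. A small sketch using only the fields shown in the loop above (the helper name is ours, not part of the SDK):

# Hypothetical helper: build a plain-text "Sources" list from citations
def sources_footer(citations):
    lines = []
    for i, c in enumerate(citations, start=1):
        ref = c.references[0]  # first reference backing this citation
        lines.append(f"[{i}] {ref.file.name}, pages {ref.pages}")
    return "\n".join(lines)

print(sources_footer(response.citations))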

Streaming responses

for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
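If you also need the complete answer once streaming finishes, accumulate the chunks as they arrive. A sketch building on the loop above (the None guard is defensive; chunk shapes can vary by SDK version):

parts = []
for chunk in assistant.chat_stream(messages=messages):
    if chunk.message and chunk.message.content:  # skip empty/control chunks
        parts.append(chunk.message.content)
        print(chunk.message.content, end="", flush=True)
answer = "".join(parts)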

Filter retrieval by metadata

# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
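Filters use Pinecone's standard metadata query language, so operators such as $and and $in compose (the version values here are illustrative; confirm which operators Assistant supports in the current docs):

response = assistant.chat(
    messages=messages,
    filter={"$and": [
        {"audience": {"$eq": "public"}},
        {"version": {"$in": ["2.0", "2.1"]}},  # hypothetical version tags
    ]},
)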

When to use Assistant vs roll-your-own

Use Assistant                       | Roll your own
Want RAG working in 1 hour          | Need full control of chunking strategy
OK with Pinecone's chunking         | Specialized doc types (legal, medical)
Few hundred MB of docs              | TB-scale corpora
Need cited answers out of the box   | Custom prompt + citation format

FAQ

Q: Is Pinecone Assistant free? A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.

Q: Which LLMs can the Assistant use? A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick at chat time via model=. Pinecone handles the API key + routing.

Q: How does this differ from a custom RAG with Pinecone Index? A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, citations. Assistant: Pinecone builds them and exposes a single chat() endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself.
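For contrast, a stripped-down sketch of the roll-your-own retrieval step (the index name, embedding model, and "text" metadata field are illustrative assumptions):

# Embed the query and retrieve matching chunks yourself
index = pc.Index("acme-docs-index")  # hypothetical pre-built index
query = "How do I reset the device?"
emb = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={"input_type": "query"},
)
hits = index.query(vector=emb[0].values, top_k=5, include_metadata=True)

# Prompt construction and citations are now your job --
# exactly the work Assistant does for you.
context = "\n\n".join(m.metadata["text"] for m in hits.matches)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"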


Quick Use

  1. Sign up at app.pinecone.io → copy API key
  2. pip install "pinecone[assistant]"
  3. pc.assistant.create_assistant(...), upload files, call assistant.chat(messages=...)



Source & Thanks

Built by Pinecone. Commercial product with a free tier.

docs.pinecone.io/assistant — Assistant docs

