Workflows · May 7, 2026 · 4 min read

Pinecone Assistant — Managed RAG Service with Auto-Indexing

Pinecone Assistant is Pinecone's fully managed RAG product. Upload PDFs, query in natural language, and get cited answers, with no chunking pipeline to build.

Pinecone · Community
Agent-ready

This asset can be read and installed directly by agents

TokRepo exposes a universal CLI command, an install contract, JSON metadata, an adapter-specific plan, and the raw content to help agents judge fit, risk, and next actions.

Stage only · 17/100
Agent surface
Any MCP/CLI agent
Type
Skill
Installation
Stage only
Trust
New
Entry point
Asset
Universal CLI command
npx tokrepo install 63b22f3a-181d-4032-bfa8-3be176e193df
Introduction

Pinecone Assistant is Pinecone's fully managed RAG product: upload PDFs, Word docs, or plain text and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering. Best for: teams who want RAG over their docs without building the chunking, embedding, and prompt-construction layers themselves. Works with: Pinecone Python / Node SDK, REST API, Pinecone Console. Setup time: 5 minutes.


Create an assistant + upload files

import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload files
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata.
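
Uploads are processed asynchronously, so a file is only searchable once processing completes. A minimal wait loop, assuming the list_files() helper and the "Available" status value described in the Assistant docs:

import time

# Block until every uploaded file has finished processing
# (list_files() and the "Available" status per the Assistant docs;
# field names may vary by SDK version)
while not all(f.status == "Available" for f in assistant.list_files()):
    time.sleep(5)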

Chat with citations

from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
#  light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
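
The citations list is plain data, so you can render it however you like. A small helper using only the fields shown above (it assumes each citation carries at least one reference):

def render_sources(response):
    """Format citations as a numbered source list."""
    lines = []
    for i, citation in enumerate(response.citations, start=1):
        ref = citation.references[0]
        pages = ", ".join(str(p) for p in ref.pages)
        lines.append(f"[{i}] {ref.file.name}, p. {pages}")
    return "\n".join(lines)

print(render_sources(response))
# [1] manual.pdf, p. 12
# [2] manual.pdf, p. 13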

Streaming responses

for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
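
If you also need the complete answer after streaming (for logging, say), accumulate the deltas as they arrive. A sketch reusing the message.content field from above; the getattr guard is defensive, since not every chunk may carry content:

parts = []
for chunk in assistant.chat_stream(messages=messages):
    msg = getattr(chunk, "message", None)  # defensive: keep-alive chunks may omit it
    if msg and msg.content:
        parts.append(msg.content)
        print(msg.content, end="", flush=True)

full_answer = "".join(parts)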

Filter retrieval by metadata

# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
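
The filter argument uses Pinecone's standard metadata filter language, so operators such as $and, $in, and $eq compose the same way they do on a regular index. For example:

# Restrict retrieval to public docs from a set of versions
response = assistant.chat(
    messages=messages,
    filter={
        "$and": [
            {"audience": {"$eq": "public"}},
            {"version": {"$in": ["2.0", "2.1"]}},
        ]
    },
)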

When to use Assistant vs roll-your-own

Use Assistant                          Roll your own
Want RAG working in 1 hour             Need full control of chunking strategy
OK with Pinecone's chunking            Specialized doc types (legal, medical)
A few hundred MB of docs               TB-scale corpora
Need cited answers out of the box      Custom prompt + citation format
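
There is also a middle ground: the Assistant context API returns the retrieved snippets without generating an answer, so you can keep Pinecone's ingestion and retrieval while owning the prompt and the LLM call. A hedged sketch (assistant.context() and its snippets field per the Assistant docs; the exact response shape may vary by SDK version):

# Retrieval only: fetch the snippets the Assistant would have cited,
# then feed them to your own prompt / LLM of choice
ctx = assistant.context(query="How do I reset the device?")
for snippet in ctx.snippets:
    print(snippet.content[:80])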

FAQ

Q: Is Pinecone Assistant free? A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.

Q: Which LLMs can the Assistant use? A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick at chat time via model=. Pinecone handles the API key + routing.

Q: How does this differ from a custom RAG with Pinecone Index? A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, citations. Assistant: Pinecone builds them and exposes a single chat() endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself.


Quick Use

  1. Sign up at app.pinecone.io → copy API key
  2. pip install "pinecone[assistant]"
  3. pc.assistant.create_assistant(...), upload files, call assistant.chat(messages=...)
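
Putting those steps together, a minimal end-to-end script (the assistant name and file path are placeholders):

import os
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the assistant and load one document (placeholder name and path)
assistant = pc.assistant.create_assistant(
    assistant_name="quickstart-docs",
    instructions="Answer from the uploaded docs and cite sources.",
)
assistant.upload_file(file_path="./manual.pdf")
# Wait for processing to finish before querying (see the status loop above)

response = assistant.chat(
    messages=[Message(role="user", content="How do I reset the device?")]
)
print(response.message.content)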

Source & Thanks

Built by Pinecone. Commercial product with a free tier.

docs.pinecone.io/assistant — Assistant docs

