Quick Use
- Sign up at app.pinecone.io → copy API key
- `pip install "pinecone[assistant]"`
- `pc.assistant.create_assistant(...)`, upload files, call `assistant.chat(messages=...)`
Intro
Pinecone Assistant is Pinecone's fully managed RAG product — upload PDFs, Word docs, or text, and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering.
- Best for: teams who want RAG over their docs without building chunking + embedding + prompt-construction layers themselves.
- Works with: Pinecone Python / Node SDK, REST API, Pinecone Console.
- Setup time: 5 minutes.
Create an assistant + upload files
```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload files
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")
```

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata.
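Processing happens asynchronously, so a freshly uploaded file may not be queryable right away. A minimal polling sketch — the `describe_file` call shape and the `"Available"` / `"Failed"` status strings are assumptions about the SDK, so check them against the Assistant docs:

```python
import time


def wait_until_done(get_status, file_id, timeout=300, interval=5.0):
    """Poll get_status(file_id) until it returns a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(file_id)
        if status in ("Available", "Failed"):  # assumed terminal statuses
            return status
        time.sleep(interval)
    raise TimeoutError(f"file {file_id} not processed within {timeout}s")


# Wiring it to the SDK would look roughly like (call shape assumed):
# uploaded = assistant.upload_file(file_path="./manual.pdf")
# wait_until_done(lambda fid: assistant.describe_file(file_id=fid).status,
#                 uploaded.id)
```

Passing the status lookup as a callable keeps the helper testable without hitting the API.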
Chat with citations
```python
from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
# light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
```

Streaming responses
```python
for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
```

Filter retrieval by metadata
```python
# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
```

When to use Assistant vs roll-your-own
| Use Assistant | Roll your own |
|---|---|
| Want RAG working in 1 hour | Need full control of chunking strategy |
| OK with Pinecone's chunking | Specialized doc types (legal, medical) |
| Few hundred MB of docs | TB-scale corpora |
| Need cited answers out of the box | Custom prompt + citation format |
FAQ
Q: Is Pinecone Assistant free?
A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.
Q: Which LLMs can the Assistant use?
A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick at chat time via `model=`; Pinecone handles the API key and routing.
Q: How does this differ from a custom RAG with Pinecone Index?
A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, citations. Assistant: Pinecone builds them and exposes a single chat() endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself.
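To make the trade-off concrete, here is a sketch of the pieces a roll-your-own pipeline owns, built around a pure chunking helper; the Pinecone index and embedding calls in the comments are illustrative, not exact SDK signatures:

```python
def chunk_text(text, size=800, overlap=100):
    """Split text into overlapping character windows for embedding."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


# The rest of a hand-rolled pipeline, roughly (calls are illustrative):
# 1. embeddings = embed_model(chunk_text(doc))        # embed each chunk
# 2. index.upsert(...)                                # store vectors + metadata
# 3. hits = index.query(vector=embed_model(q), top_k=5)  # retrieve
# 4. prompt = build_prompt(q, [h.metadata["text"] for h in hits])
# 5. answer = llm(prompt)                             # plus your own citations
```

Every commented step is a design decision (chunk size, embedding model, top_k, prompt template, citation format) that Assistant makes for you.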
Source & Thanks
Built by Pinecone. Commercial product with free tier.
docs.pinecone.io/assistant — Assistant docs