Quick Use
- Sign up at app.pinecone.io → copy API key
- `pip install "pinecone[assistant]"`
- `pc.assistant.create_assistant(...)`, upload files, call `assistant.chat(messages=...)`
Intro
Pinecone Assistant is Pinecone's fully managed RAG product — upload PDFs, Word docs, or text, and get a chat endpoint that answers with citations. Pinecone handles chunking, embedding, retrieval, prompt construction, and citation rendering.
- Best for: teams who want RAG over their docs without building chunking + embedding + prompt-construction layers themselves.
- Works with: Pinecone Python / Node SDK, REST API, Pinecone Console.
- Setup time: 5 minutes.
Create an assistant + upload files
```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create
assistant = pc.assistant.create_assistant(
    assistant_name="acme-docs",
    instructions="You are an Acme product support assistant. Cite sources.",
)

# Upload files
assistant.upload_file(file_path="./manual.pdf")
assistant.upload_file(file_path="./faq.md")
assistant.upload_file(file_path="./troubleshooting.docx")
```

Pinecone chunks each document, embeds the chunks, stores them in a hidden vector index, and indexes the metadata.
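Processing happens asynchronously, so a freshly uploaded file may not be queryable right away. A minimal polling sketch — the `describe_file` call shape and the `"Available"` / `"Failed"` status strings are assumptions about the SDK, so check them against the Assistant docs:

```python
import time


def wait_until_done(get_status, file_id, timeout=300, interval=5.0):
    """Poll get_status(file_id) until it returns a terminal status."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(file_id)
        if status in ("Available", "Failed"):  # assumed terminal statuses
            return status
        time.sleep(interval)
    raise TimeoutError(f"file {file_id} not processed within {timeout}s")


# Wiring it to the SDK would look roughly like (call shape assumed):
# uploaded = assistant.upload_file(file_path="./manual.pdf")
# wait_until_done(lambda fid: assistant.describe_file(file_id=fid).status,
#                 uploaded.id)
```

Passing the status lookup as a callable keeps the helper testable without hitting the API.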
Chat with citations
```python
from pinecone_plugins.assistant.models.chat import Message

messages = [Message(role="user", content="How do I reset the device?")]
response = assistant.chat(messages=messages, model="claude-3-5-sonnet")

print(response.message.content)
# "To reset the device, hold the power button for 10 seconds [1]. After the
# light blinks blue, release. The device will return to factory settings [2]."

for citation in response.citations:
    print(citation.references[0].file.name, citation.references[0].pages)
# manual.pdf [page 12]
# manual.pdf [page 13]
```

Streaming responses
```python
for chunk in assistant.chat_stream(messages=messages):
    print(chunk.message.content, end="", flush=True)
```

Filter retrieval by metadata
```python
# Tag files at upload
assistant.upload_file(
    file_path="./internal-only.pdf",
    metadata={"audience": "internal", "version": "2.0"},
)

# Filter at query time
response = assistant.chat(
    messages=messages,
    filter={"audience": {"$eq": "public"}},
)
```

When to use Assistant vs roll-your-own
| Use Assistant | Roll your own |
|---|---|
| Want RAG working in 1 hour | Need full control of chunking strategy |
| OK with Pinecone's chunking | Specialized doc types (legal, medical) |
| Few hundred MB of docs | TB-scale corpora |
| Need cited answers out of the box | Custom prompt + citation format |
FAQ
Q: Is Pinecone Assistant free?
A: There's a free tier (2 assistants, limited queries). Paid plans bundle more queries and storage. Underlying LLM (Claude / GPT) costs are billed by Pinecone with a small markup over direct usage.
Q: Which LLMs can the Assistant use?
A: GPT-4o, Claude 3.5 Sonnet, and other models Pinecone keeps adding. You pick at chat time via `model=`; Pinecone handles the API key and routing.
Q: How does this differ from a custom RAG with Pinecone Index?
A: Custom RAG: you build chunking, embedding, retrieval, prompt construction, citations. Assistant: Pinecone builds them and exposes a single chat() endpoint. For 80% of use cases, Assistant is faster to ship; for the long tail of custom needs, build it yourself.
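To make the trade-off concrete, here is a sketch of the pieces a roll-your-own pipeline owns, built around a pure chunking helper; the Pinecone index and embedding calls in the comments are illustrative, not exact SDK signatures:

```python
def chunk_text(text, size=800, overlap=100):
    """Split text into overlapping character windows for embedding."""
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks


# The rest of a hand-rolled pipeline, roughly (calls are illustrative):
# 1. embeddings = embed_model(chunk_text(doc))        # embed each chunk
# 2. index.upsert(...)                                # store vectors + metadata
# 3. hits = index.query(vector=embed_model(q), top_k=5)  # retrieve
# 4. prompt = build_prompt(q, [h.metadata["text"] for h in hits])
# 5. answer = llm(prompt)                             # plus your own citations
```

Every commented step is a design decision (chunk size, embedding model, top_k, prompt template, citation format) that Assistant makes for you.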
Source & Thanks
Built by Pinecone. Commercial product with free tier.
docs.pinecone.io/assistant — Assistant docs