Pinecone — Managed Vector Database for Production AI
Fully managed vector database for production AI search. Pinecone offers serverless scaling, hybrid search, metadata filtering, and enterprise security with zero infrastructure.
Ready-to-run agent install
This asset can be installed after the agent chooses its runtime, checks the plan, and runs the matching command.
npx -y tokrepo@latest install 0fc5f7e8-439d-414f-bdaf-b09e05e1af49 --target codex
Run after dry-run confirms the install plan.
What it is
Pinecone is a fully managed vector database built for production AI applications. It stores vector embeddings and provides fast similarity search for use cases like semantic search, recommendation systems, and retrieval-augmented generation (RAG). Pinecone handles scaling, indexing, and infrastructure so you can focus on your application logic.
Pinecone is designed for AI engineers and product teams building search, recommendation, or RAG features who need a production-ready vector store without managing infrastructure.
How it saves time or tokens
Self-hosting a vector database (Milvus, Weaviate, Qdrant) requires provisioning servers, managing indexes, tuning performance, and handling scaling. Pinecone takes on that operational work: you create an index, upsert vectors, and query, all through an SDK. The serverless architecture scales automatically with usage, and you pay only for what you store and query. For RAG applications, low-latency retrieval lets you fetch just the relevant passages at query time, so prompts stay short instead of carrying the whole knowledge base in the context window.
How to use
- Install the Pinecone SDK:
pip install pinecone
- Create a serverless index and upsert vectors:
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key='your-api-key')

# The index dimension must match the embedding model's output size.
pc.create_index(
    name='docs',
    dimension=1536,
    metric='cosine',
    spec=ServerlessSpec(cloud='aws', region='us-east-1'),
)

index = pc.Index('docs')

# Each record is (id, vector values, metadata); vectors are truncated here.
index.upsert(vectors=[
    ('doc-1', [0.1, 0.2, ...], {'source': 'readme', 'topic': 'setup'}),
    ('doc-2', [0.3, 0.4, ...], {'source': 'api-docs', 'topic': 'auth'}),
])
- Query with metadata filtering:
results = index.query(
    vector=[0.1, 0.2, ...],
    top_k=5,
    filter={'topic': {'$eq': 'auth'}},
    include_metadata=True,
)
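To make the query semantics concrete, here is a minimal in-memory sketch of what a cosine-metric, top_k query with an `$eq` metadata filter computes. It is a brute-force illustration in plain Python, not Pinecone's implementation or API. The names `cosine_similarity` and `brute_force_query`, the simplified `eq_filter` argument, and the two-dimensional sample records are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError('vectors must have the same dimension')
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def brute_force_query(records, vector, top_k, eq_filter=None):
    """Score records that pass the filter and return the top_k by similarity.

    records: list of (id, values, metadata) tuples, the same shape as the
    upsert tuples above. eq_filter is a simplified stand-in for a
    {'field': {'$eq': value}} filter (hypothetical helper argument).
    """
    eq_filter = eq_filter or {}
    scored = []
    for rec_id, values, metadata in records:
        # Keep only records whose metadata equals every filter value.
        if all(metadata.get(key) == value for key, value in eq_filter.items()):
            scored.append({
                'id': rec_id,
                'score': cosine_similarity(vector, values),
                'metadata': metadata,
            })
    scored.sort(key=lambda match: match['score'], reverse=True)
    return scored[:top_k]


records = [
    ('doc-1', [1.0, 0.0], {'topic': 'setup'}),
    ('doc-2', [0.0, 1.0], {'topic': 'auth'}),
    ('doc-3', [1.0, 1.0], {'topic': 'auth'}),
]
matches = brute_force_query(records, [1.0, 0.0], top_k=2, eq_filter={'topic': 'auth'})
print([match['id'] for match in matches])  # ['doc-3', 'doc-2']
```

Note that `doc-1` is the closest vector to the query but is excluded because its topic is not `auth`; the filter narrows the candidate set before ranking.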
Example
A RAG pipeline using Pinecone for context retrieval:
from openai import OpenAI
from pinecone import Pinecone

openai = OpenAI()
pc = Pinecone(api_key='...')
index = pc.Index('knowledge-base')

def ask(question: str) -> str:
    # Embed the question
    embedding = openai.embeddings.create(
        input=question, model='text-embedding-3-small'
    ).data[0].embedding
    # Retrieve relevant context
    results = index.query(vector=embedding, top_k=3, include_metadata=True)
    context = '\n'.join([m['metadata']['text'] for m in results['matches']])
    # Generate answer with context
    response = openai.chat.completions.create(
        model='gpt-4',
        messages=[
            {'role': 'system', 'content': f'Answer using this context:\n{context}'},
            {'role': 'user', 'content': question},
        ],
    )
    return response.choices[0].message.content
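The context-joining step above assumes every match carries a `text` metadata field and puts no bound on prompt size. A slightly more defensive version might look like the sketch below. It assumes matches are plain dicts with `score` and `metadata` keys, ordered by descending score; the helper name `build_context` and the `min_score` and `max_chars` parameters are hypothetical, and SDK result objects may need converting to dicts first.

```python
def build_context(matches, min_score=0.0, max_chars=2000):
    """Join the 'text' metadata of matches into one context string.

    Skips matches without a 'text' field or below min_score, and stops
    adding chunks once max_chars would be exceeded.
    """
    chunks = []
    used = 0
    for match in matches:
        text = (match.get('metadata') or {}).get('text')
        if not text or match.get('score', 0.0) < min_score:
            continue
        # Account for the newline separator between chunks.
        cost = len(text) + (1 if chunks else 0)
        if used + cost > max_chars:
            break
        chunks.append(text)
        used += cost
    return '\n'.join(chunks)


sample_matches = [
    {'id': 'a', 'score': 0.91, 'metadata': {'text': 'Rotate API keys monthly.'}},
    {'id': 'b', 'score': 0.40, 'metadata': {'text': 'Unrelated release note.'}},
    {'id': 'c', 'score': 0.88, 'metadata': {}},  # no 'text' field
]
print(build_context(sample_matches, min_score=0.5))
```

A score threshold and a character budget are two simple ways to keep weak or oversized context out of the prompt; suitable values depend on the embedding model and the data.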
Related on TokRepo
- RAG tools — Browse retrieval-augmented generation tools
- Database tools — Explore database solutions for AI
Common pitfalls
- Using the wrong embedding dimension. Your index dimension must match the output dimension of your embedding model. OpenAI text-embedding-3-small produces 1536 dimensions by default; other models differ.
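- Not using metadata filtering. Pinecone supports filtering by metadata fields alongside vector similarity. Without filters, you get pure similarity results, which may include matches from the wrong source or topic. Note that Pinecone uses the term hybrid search for combining dense and sparse vectors, which is a separate feature from metadata filtering.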
- Creating too many indexes instead of using namespaces. Pinecone namespaces let you partition data within a single index, which is more cost-effective than creating separate indexes for each data source.
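The dimension pitfall above can be caught before any data is sent. The sketch below checks a batch of upsert tuples against the expected index dimension. The helper name `check_dimensions` and the constant `INDEX_DIMENSION` are hypothetical, and a small dimension of 4 is used purely for illustration.

```python
def check_dimensions(vectors, expected_dim):
    """Return the IDs of records whose vector length differs from expected_dim.

    vectors: list of (id, values, metadata) tuples, as in the upsert example.
    """
    return [
        rec_id
        for rec_id, values, _metadata in vectors
        if len(values) != expected_dim
    ]


INDEX_DIMENSION = 4  # illustration only; the example index above uses 1536

batch = [
    ('doc-1', [0.1, 0.2, 0.3, 0.4], {'source': 'readme'}),
    ('doc-2', [0.1, 0.2, 0.3], {'source': 'api-docs'}),  # one value short
]
bad_ids = check_dimensions(batch, INDEX_DIMENSION)
if bad_ids:
    print(f'dimension mismatch for: {bad_ids}')
```

Running a check like this before upserting surfaces a model or configuration mismatch as a clear local error rather than a rejected request.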
Frequently Asked Questions
How is Pinecone priced?
Pinecone serverless charges based on storage (per GB), reads (per million read units), and writes (per million write units). There is a free tier for small projects. Pricing scales with usage, so you pay proportionally to your application's demand.
Does Pinecone support real-time updates?
Yes. Pinecone supports real-time upserts and deletes. New vectors are searchable within seconds of being upserted. This makes it suitable for applications where the knowledge base changes frequently.
What is hybrid search in Pinecone?
In Pinecone, hybrid search combines dense vectors (semantic similarity) with sparse vectors (keyword matching) in one query. Metadata filtering is a separate feature: you can restrict any query to records whose metadata fields (like category, date, or source) match a filter. Used together, they typically produce more relevant results than pure dense vector search.
Can Pinecone be used for multi-tenant applications?
Yes. Pinecone namespaces provide logical isolation within a single index. Each tenant's data lives in a separate namespace, and queries are scoped to a namespace. This is the recommended approach for multi-tenant applications.
How does Pinecone compare to self-hosted vector databases?
Pinecone removes operational overhead (scaling, indexing, backups) at the cost of vendor dependency and usage-based pricing. Self-hosted options like Qdrant or Milvus give you more control and can be cheaper at scale, but require infrastructure management.
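The namespace-per-tenant approach described in this FAQ can be sketched as a small helper that scopes every query to one tenant. The function names `tenant_namespace` and `tenant_query_kwargs` and the `tenant-<id>` naming scheme are hypothetical; the sketch only builds the keyword arguments and assumes they would be passed to the SDK's `index.query`, which accepts a `namespace` argument.

```python
def tenant_namespace(tenant_id):
    """Map a tenant ID to a namespace name (hypothetical naming scheme)."""
    if not tenant_id or not tenant_id.strip():
        raise ValueError('tenant_id must be non-empty')
    return f'tenant-{tenant_id.strip().lower()}'


def tenant_query_kwargs(tenant_id, vector, top_k=5):
    """Build keyword arguments for a query scoped to one tenant's namespace."""
    return {
        'namespace': tenant_namespace(tenant_id),
        'vector': vector,
        'top_k': top_k,
        'include_metadata': True,
    }


kwargs = tenant_query_kwargs('Acme', [0.1, 0.2, 0.3])
print(kwargs['namespace'])  # tenant-acme
# With a real index handle this would be passed as: index.query(**kwargs)
```

Deriving the namespace in one place makes it harder to issue an unscoped query by accident, which matters when namespaces are the only isolation between tenants.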
Source & Thanks
Created by Pinecone.
pinecone.io — Managed vector database
Related Assets
Weaviate — Open-Source Vector Database at Scale
Weaviate is an open-source vector database for semantic search at scale. 15.9K+ GitHub stars. Hybrid search (vector + BM25), built-in RAG, reranking, multi-tenancy, and horizontal scaling. BSD 3-Claus
Turbopuffer — Serverless Vector DB for AI Search
Serverless vector database built for AI search at scale. Turbopuffer offers sub-millisecond queries, automatic scaling, and pay-per-query pricing with zero infrastructure.
Qdrant — High-Performance Vector Database
Vector database and search engine for AI applications. Handles billion-scale similarity search with filtering, sparse vectors, and multi-tenancy. Rust-powered. 30K+ stars.
Milvus — Cloud-Native Vector Database at Scale
Milvus is a high-performance cloud-native vector database for scalable AI search. 43.5K+ GitHub stars. Hybrid search (dense + sparse + full-text), GPU-accelerated indexing, multi-tenancy, distributed