Together AI Embeddings & Reranking Skill for Agents
Skill that teaches Claude Code Together AI's embeddings and reranking API. Covers dense vector generation, semantic search, RAG pipelines, and result reranking patterns.
What it is
This skill teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation for semantic search, building RAG (Retrieval-Augmented Generation) pipelines, and reranking search results for better relevance.
The skill targets developers building search and retrieval systems who want to use Together AI's hosted embedding models and reranking endpoints instead of running models locally.
How it saves time or tokens
Together AI provides hosted embedding models that generate dense vectors without managing GPU infrastructure. The reranking API improves search quality by re-scoring initial retrieval results, reducing the number of irrelevant documents passed to the LLM.
Better retrieval means fewer tokens wasted on irrelevant context in RAG pipelines. The reranker filters out noise before the LLM processes the results.
How to use
- Install the Together AI SDK:
pip install together
- Generate embeddings:
from together import Together
client = Together(api_key='your-api-key')
response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=['How to build a RAG pipeline', 'What is semantic search?'],
)
for item in response.data:
    print(f'Vector dimension: {len(item.embedding)}')
- Use embeddings for semantic search by computing cosine similarity between the query vector and each document vector.
- Rerank results for better relevance:
response = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='best practices for RAG',
    documents=['Document about RAG...', 'Document about CSS...', 'Document about retrieval...'],
)
Example
# Complete RAG pipeline with Together AI
import numpy as np
from together import Together

client = Together(api_key='your-key')

# Step 1: Embed documents
docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
doc_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=docs
)
doc_vectors = np.array([d.embedding for d in doc_response.data])

# Step 2: Embed the query and rank documents by cosine similarity
query = 'How does RAG work?'
query_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=[query]
)
query_vector = np.array(query_response.data[0].embedding)
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
top_docs = [docs[i] for i in np.argsort(scores)[::-1][:2]]

# Step 3: Rerank the top candidates before passing them to the LLM
reranked = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query=query,
    documents=top_docs,
)
Related on TokRepo
- AI Tools for RAG — RAG pipeline tools and components
- AI Tools for Research — AI-powered search and research tools
Common pitfalls
- Not normalizing embeddings before cosine similarity. Some models output unnormalized vectors. Normalize to unit length for correct similarity scores.
- Skipping the reranking step. Initial embedding-based retrieval is fast but approximate. Reranking significantly improves relevance for the top-k results passed to the LLM.
- Using the wrong embedding model for your use case. Retrieval models (m2-bert-retrieval) are optimized for search. Code models are better for code search. Match the model to your domain.
- Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
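The normalization pitfall in the list above is easy to guard against. A minimal sketch (the `normalize` helper is ours, not part of any SDK):

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / norms

raw = np.array([[3.0, 4.0], [0.0, 2.0]])  # norms 5.0 and 2.0
unit = normalize(raw)
print(np.linalg.norm(unit, axis=1))  # each row now has norm 1.0
```

After normalizing both query and document vectors once, similarity search reduces to a single matrix-vector product.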
Frequently Asked Questions
What embedding models does Together AI offer?
Together AI hosts multiple embedding models including M2-BERT for general retrieval, BGE models for multilingual embeddings, and specialized models for code and scientific text. Check the Together AI documentation for the current model catalog.
What is reranking and why does it matter?
Reranking takes an initial set of search results (from embedding similarity) and re-scores them using a more powerful model that considers query-document relevance more deeply. It improves precision by pushing the most relevant results to the top.
How does semantic search work?
Text is converted into dense vectors (embeddings) that capture semantic meaning. Similar texts have similar vectors. To search, embed the query, compute cosine similarity against document vectors, and return the closest matches. This works across paraphrases and synonyms.
Can I use Together AI embeddings with my own vector database?
Yes. Together AI generates standard dense vectors that work with any vector store: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector, or FAISS. Generate embeddings with Together AI and store them in your preferred vector database.
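All of those stores follow the same pattern: persist the vectors, then search by similarity. As a dependency-free illustration of that add/search shape, here is a tiny in-memory stand-in built on NumPy brute-force search (toy 2-dimensional vectors, not real embeddings; a real deployment would use one of the databases listed above):

```python
import numpy as np

class TinyIndex:
    """Minimal in-memory stand-in for a vector store (brute-force search)."""
    def __init__(self):
        self.vectors, self.payloads = [], []

    def add(self, vector, payload):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.payloads.append(payload)

    def search(self, query, k=1):
        mat = np.stack(self.vectors)
        mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        best = np.argsort(mat @ q)[::-1][:k]
        return [self.payloads[i] for i in best]

index = TinyIndex()
index.add([1.0, 0.0], 'doc about RAG')
index.add([0.0, 1.0], 'doc about CSS')
print(index.search([0.9, 0.1]))  # nearest payload first
```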
The skill teaches Claude Code the correct API patterns for Together AI's embeddings and reranking endpoints. When you ask Claude Code to build a semantic search feature or RAG pipeline using Together AI, it generates correct code based on the skill's patterns.
Citations (3)
- Together AI Documentation — Together AI embeddings and reranking API
- RAG Paper (arXiv) — Dense retrieval and reranking for RAG
- Together AI Models — Embedding models for semantic search
Source & Thanks
Part of togethercomputer/skills — MIT licensed.
Related Assets
Claude-Flow — Multi-Agent Orchestration for Claude Code
Layers swarm and hive-mind multi-agent orchestration on top of Claude Code with 64 specialized agents, SQLite memory, and parallel execution.
SuperClaude — Workflow Framework for Claude Code
Adds 16+ slash commands, 9 cognitive personas, and a smart flag system to Claude Code in one pipx install.
Claudia — Tauri Desktop GUI for Claude Code
Open-source Tauri/Rust desktop app for managing Claude Code sessions, custom agents, sandboxed execution, and checkpoints.