Together AI Embeddings & Reranking Skill for Agents
Skill that teaches Claude Code Together AI's embeddings and reranking API. Covers dense vector generation, semantic search, RAG pipelines, and result reranking patterns.
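Agent-ready install
This asset is installable. The agent first selects the current runtime and reviews the install plan, then runs the matching command.
npx -y tokrepo@latest install da3bf81c-8928-41ba-b5c4-457355af582d --target codex
Do a dry run first to confirm the install plan, then run this command.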
What it is
This skill teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation for semantic search, building RAG (Retrieval-Augmented Generation) pipelines, and reranking search results for better relevance.
The skill targets developers building search and retrieval systems who want to use Together AI's hosted embedding models and reranking endpoints instead of running models locally.
How it saves time or tokens
Together AI provides hosted embedding models that generate dense vectors without managing GPU infrastructure. The reranking API improves search quality by re-scoring initial retrieval results, reducing the number of irrelevant documents passed to the LLM.
Better retrieval means fewer tokens wasted on irrelevant context in RAG pipelines. The reranker filters out noise before the LLM processes the results.
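As a rough illustration of the filtering described above, the sketch below keeps only documents whose relevance score clears a threshold before they reach the prompt. The function name `select_context`, the `min_score` and `max_docs` parameters, and the scores themselves are hypothetical; in practice the (index, score) pairs would come from a reranker response.

```python
def select_context(docs, scored, min_score=0.5, max_docs=3):
    """Keep only the highest-scoring documents for the LLM prompt.

    docs:   list of document strings
    scored: list of (doc_index, relevance_score) pairs (hypothetical shape)
    """
    # Sort by score, highest first, then drop anything under the threshold.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    kept = [docs[i] for i, score in ranked if score >= min_score]
    return kept[:max_docs]


docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
scored = [(0, 0.92), (1, 0.03), (2, 0.61)]  # made-up scores for illustration
context = select_context(docs, scored)
print(context)
```

The threshold and cap are tuning knobs, not recommended values; a lower `min_score` trades more context tokens for higher recall.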
How to use
- Install the Together AI SDK:
pip install together
- Generate embeddings:
from together import Together

client = Together(api_key='your-api-key')

response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=['How to build a RAG pipeline', 'What is semantic search?'],
)

# Print the length of each returned vector
for embedding in response.data:
    print(f'Vector dimension: {len(embedding.embedding)}')
- Use embeddings for semantic search by comparing cosine similarity between query and document vectors.
- Rerank results for better relevance:
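response = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='best practices for RAG',
    documents=['Document about RAG...', 'Document about CSS...', 'Document about retrieval...'],
)

# Each result carries the index of the original document and its relevance score
for result in response.results:
    print(result.index, result.relevance_score)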
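The cosine-similarity comparison mentioned in the steps above can be sketched with NumPy alone. This is a minimal illustration, not part of the Together AI SDK: `cosine_top_k` is a hypothetical helper, and the toy 3-dimensional vectors stand in for real embeddings.

```python
import numpy as np


def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return (index, similarity) pairs for the k documents closest to the query."""
    q = np.asarray(query_vec, dtype=float)
    d = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the vector lengths.
    sims = (d @ q) / (np.linalg.norm(d, axis=1) * np.linalg.norm(q))
    # Sort descending by similarity and keep the first k indices.
    order = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy vectors for illustration only; real embeddings have hundreds of dimensions.
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
query_vec = [1.0, 0.0, 0.0]
top = cosine_top_k(query_vec, doc_vecs)
print(top)
```

With real data, `doc_vecs` would be built from the embeddings response, for example `[item.embedding for item in response.data]`.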
Example
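# Complete RAG pipeline with Together AI
import numpy as np
from together import Together

client = Together(api_key='your-key')
embed_model = 'togethercomputer/m2-bert-80M-8k-retrieval'
query = 'How does RAG work?'

# Step 1: Embed documents
docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
doc_embeddings = client.embeddings.create(model=embed_model, input=docs)
doc_vectors = np.array([item.embedding for item in doc_embeddings.data])

# Step 2: Embed query
query_embedding = client.embeddings.create(model=embed_model, input=[query])
query_vector = np.array(query_embedding.data[0].embedding)

# Step 3: Retrieve the top candidates by cosine similarity
similarities = (doc_vectors @ query_vector) / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
top_indices = np.argsort(similarities)[::-1][:2]
candidates = [docs[i] for i in top_indices]

# Step 4: Rerank the candidates
reranked = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query=query,
    documents=candidates,
)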
Common pitfalls
- Not normalizing embeddings before cosine similarity. Some models output unnormalized vectors. Normalize to unit length for correct similarity scores.
- Skipping the reranking step. Initial embedding-based retrieval is fast but approximate. Reranking significantly improves relevance for the top-k results passed to the LLM.
- Using the wrong embedding model for your use case. Retrieval models (m2-bert-retrieval) are optimized for search. Code models are better for code search. Match the model to your domain.
- Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
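To make the normalization pitfall concrete, here is a small sketch showing how a raw dot product can favor a long vector over one that points in the same direction as the query. The `normalize` helper and the 2-dimensional vectors are illustrative assumptions, not output from any Together AI model.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return v / np.clip(norms, 1e-12, None)  # guard against all-zero rows


query = np.array([[1.0, 1.0]])
docs = np.array([
    [2.0, 2.0],   # same direction as the query
    [10.0, 0.0],  # different direction, but much longer
])

# Raw dot products reward vector length, so the second document wins.
raw_scores = (docs @ query.T).ravel()

# After normalization the scores are true cosine similarities,
# and the document aligned with the query ranks first.
unit_scores = (normalize(docs) @ normalize(query).T).ravel()

print(raw_scores.argmax(), unit_scores.argmax())
```

Normalizing once at indexing time also lets a vector store use the cheaper inner-product metric in place of cosine distance.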
FAQ
Which embedding models does Together AI offer?
Together AI hosts multiple embedding models including M2-BERT for general retrieval, BGE models for multilingual embeddings, and specialized models for code and scientific text. Check the Together AI documentation for the current model catalog.
What does reranking do?
Reranking takes an initial set of search results (from embedding similarity) and re-scores them using a more powerful model that considers query-document relevance more deeply. It improves precision by pushing the most relevant results to the top.
How does semantic search with embeddings work?
Text is converted into dense vectors (embeddings) that capture semantic meaning. Similar texts have similar vectors. To search, embed the query, compute cosine similarity against document vectors, and return the closest matches. This works across paraphrases and synonyms.
Can I use Together AI embeddings with my vector database?
Yes. Together AI generates standard dense vectors that work with any vector store: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector, or FAISS. Generate embeddings with Together AI and store them in your preferred vector database.
How does this skill help Claude Code?
The skill teaches Claude Code the correct API patterns for Together AI's embeddings and reranking endpoints. When you ask Claude Code to build a semantic search feature or RAG pipeline using Together AI, it generates correct code based on the skill's patterns.
Sources (3)
- Together AI Documentation: Together AI embeddings and reranking API
- RAG Paper (arXiv): Dense retrieval and reranking for RAG
- Together AI Models: Embedding models for semantic search
Source and credits
togethercomputer/skills (MIT)