Skills · Apr 8, 2026 · 1 min read

Together AI Embeddings & Reranking Skill for Agents

A skill that teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation, semantic search, RAG pipelines, and result reranking patterns.

TL;DR
Claude Code skill covering Together AI's embeddings and reranking API for semantic search, RAG pipelines, and result reranking patterns.
§01

What it is

This skill teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation for semantic search, building RAG (Retrieval-Augmented Generation) pipelines, and reranking search results for better relevance.

The skill targets developers building search and retrieval systems who want to use Together AI's hosted embedding models and reranking endpoints instead of running models locally.

§02

How it saves time or tokens

Together AI provides hosted embedding models that generate dense vectors without managing GPU infrastructure. The reranking API improves search quality by re-scoring initial retrieval results, reducing the number of irrelevant documents passed to the LLM.

Better retrieval means fewer tokens wasted on irrelevant context in RAG pipelines. The reranker filters out noise before the LLM processes the results.

§03

How to use

  1. Install the Together AI SDK:
pip install together
  2. Generate embeddings:
from together import Together

client = Together(api_key='your-api-key')

response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=['How to build a RAG pipeline', 'What is semantic search?']
)

for embedding in response.data:
    print(f'Vector dimension: {len(embedding.embedding)}')
  3. Use embeddings for semantic search by comparing cosine similarity between query and document vectors.
  4. Rerank results for better relevance:
response = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='best practices for RAG',
    documents=['Document about RAG...', 'Document about CSS...', 'Document about retrieval...']
)
§04

Example

# Complete RAG pipeline with Together AI
import numpy as np
from together import Together

client = Together(api_key='your-key')

# Step 1: Embed documents
docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
doc_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=docs
)
doc_vectors = np.array([d.embedding for d in doc_response.data])

# Step 2: Embed the query
query_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=['How does RAG work?']
)
query_vector = np.array(query_response.data[0].embedding)

# Step 3: Retrieve candidates by cosine similarity
doc_norms = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
query_norm = query_vector / np.linalg.norm(query_vector)
scores = doc_norms @ query_norm
top_indices = np.argsort(scores)[::-1]

# Step 4: Rerank the retrieved candidates
reranked = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='How does RAG work?',
    documents=[docs[i] for i in top_indices]
)
§05

Common pitfalls

  • Not normalizing embeddings before cosine similarity. Some models output unnormalized vectors. Normalize to unit length for correct similarity scores.
  • Skipping the reranking step. Initial embedding-based retrieval is fast but approximate. Reranking significantly improves relevance for the top-k results passed to the LLM.
  • Using the wrong embedding model for your use case. Retrieval models (m2-bert-retrieval) are optimized for search. Code models are better for code search. Match the model to your domain.
  • Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
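The first pitfall can be made concrete: with unnormalized vectors, a raw dot product rewards vector length rather than direction, so a long but less similar vector can outrank a nearly parallel one. A minimal illustration in NumPy (toy 2-dimensional vectors):

```python
import numpy as np

query = np.array([1.0, 0.0])
close_doc = np.array([0.9, 0.1])  # nearly parallel to the query
long_doc = np.array([3.0, 3.0])   # less similar in direction, but much longer

# Raw dot product: the long vector wins, which is misleading
print(query @ close_doc, query @ long_doc)  # 0.9 vs 3.0

# Cosine similarity: normalize to unit length first, then the close vector wins
def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(query, close_doc), cos(query, long_doc))  # ~0.994 vs ~0.707
```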

Frequently Asked Questions

What embedding models does Together AI offer?

Together AI hosts multiple embedding models including M2-BERT for general retrieval, BGE models for multilingual embeddings, and specialized models for code and scientific text. Check the Together AI documentation for the current model catalog.

What is reranking?

Reranking takes an initial set of search results (from embedding similarity) and re-scores them using a more powerful model that considers query-document relevance more deeply. It improves precision by pushing the most relevant results to the top.

How do embeddings work for semantic search?

Text is converted into dense vectors (embeddings) that capture semantic meaning. Similar texts have similar vectors. To search, embed the query, compute cosine similarity against document vectors, and return the closest matches. This works across paraphrases and synonyms.

Can I use Together AI embeddings with any vector store?

Yes. Together AI generates standard dense vectors that work with any vector store: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector, or FAISS. Generate embeddings with Together AI and store them in your preferred vector database.

How does this skill help Claude Code?

The skill teaches Claude Code the correct API patterns for Together AI's embeddings and reranking endpoints. When you ask Claude Code to build a semantic search feature or RAG pipeline using Together AI, it generates correct code based on the skill's patterns.


Source & Thanks

Part of togethercomputer/skills — MIT licensed.
