Together AI Embeddings & Reranking Skill for Agents
Skill that teaches Claude Code Together AI's embeddings and reranking API. Covers dense vector generation, semantic search, RAG pipelines, and result reranking patterns.
What it is
This skill teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation for semantic search, building RAG (Retrieval-Augmented Generation) pipelines, and reranking search results for better relevance.
The skill targets developers building search and retrieval systems who want to use Together AI's hosted embedding models and reranking endpoints instead of running models locally.
How it saves time or tokens
Together AI provides hosted embedding models that generate dense vectors without managing GPU infrastructure. The reranking API improves search quality by re-scoring initial retrieval results, reducing the number of irrelevant documents passed to the LLM.
Better retrieval means fewer tokens wasted on irrelevant context in RAG pipelines. The reranker filters out noise before the LLM processes the results.
How to use
- Install the Together AI SDK:
pip install together
- Generate embeddings:
from together import Together
client = Together(api_key='your-api-key')
response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=['How to build a RAG pipeline', 'What is semantic search?'],
)
for item in response.data:
    print(f'Vector dimension: {len(item.embedding)}')
- Use embeddings for semantic search by computing cosine similarity between the query vector and each document vector.
- Rerank results for better relevance:
response = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='best practices for RAG',
    documents=['Document about RAG...', 'Document about CSS...', 'Document about retrieval...'],
)
Example
# Complete RAG pipeline with Together AI
import numpy as np
from together import Together

client = Together(api_key='your-key')

# Step 1: Embed documents
docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
doc_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=docs
)
doc_vectors = np.array([d.embedding for d in doc_response.data])

# Step 2: Embed the query and rank documents by cosine similarity
query = 'How does RAG work?'
query_response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval', input=[query]
)
query_vector = np.array(query_response.data[0].embedding)
scores = doc_vectors @ query_vector / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
)
top_docs = [docs[i] for i in np.argsort(scores)[::-1][:2]]

# Step 3: Rerank the top candidates before passing them to the LLM
reranked = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query=query,
    documents=top_docs,
)
Related on TokRepo
- AI Tools for RAG — RAG pipeline tools and components
- AI Tools for Research — AI-powered search and research tools
Common pitfalls
- Not normalizing embeddings before cosine similarity. Some models output unnormalized vectors. Normalize to unit length for correct similarity scores.
- Skipping the reranking step. Initial embedding-based retrieval is fast but approximate. Reranking significantly improves relevance for the top-k results passed to the LLM.
- Using the wrong embedding model for your use case. Retrieval models (m2-bert-retrieval) are optimized for search. Code models are better for code search. Match the model to your domain.
- Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
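The normalization pitfall in the list above is easy to guard against. A minimal sketch (the `normalize` helper is ours, not part of any SDK):

```python
import numpy as np

def normalize(vectors):
    """Scale each row to unit L2 norm so dot products equal cosine similarity."""
    v = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / norms

raw = np.array([[3.0, 4.0], [0.0, 2.0]])  # norms 5.0 and 2.0
unit = normalize(raw)
print(np.linalg.norm(unit, axis=1))  # each row now has norm 1.0
```

After normalizing both query and document vectors once, similarity search reduces to a single matrix-vector product.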
Frequently Asked Questions
What embedding models does Together AI offer?
Together AI hosts multiple embedding models including M2-BERT for general retrieval, BGE models for multilingual embeddings, and specialized models for code and scientific text. Check the Together AI documentation for the current model catalog.
What is reranking and why does it matter?
Reranking takes an initial set of search results (from embedding similarity) and re-scores them using a more powerful model that considers query-document relevance more deeply. It improves precision by pushing the most relevant results to the top.
How does semantic search work?
Text is converted into dense vectors (embeddings) that capture semantic meaning. Similar texts have similar vectors. To search, embed the query, compute cosine similarity against document vectors, and return the closest matches. This works across paraphrases and synonyms.
Can I use Together AI embeddings with my own vector database?
Yes. Together AI generates standard dense vectors that work with any vector store: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector, or FAISS. Generate embeddings with Together AI and store them in your preferred vector database.
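All of those stores follow the same pattern: persist the vectors, then search by similarity. As a dependency-free illustration of that add/search shape, here is a tiny in-memory stand-in built on NumPy brute-force search (toy 2-dimensional vectors, not real embeddings; a real deployment would use one of the databases listed above):

```python
import numpy as np

class TinyIndex:
    """Minimal in-memory stand-in for a vector store (brute-force search)."""
    def __init__(self):
        self.vectors, self.payloads = [], []

    def add(self, vector, payload):
        self.vectors.append(np.asarray(vector, dtype=float))
        self.payloads.append(payload)

    def search(self, query, k=1):
        mat = np.stack(self.vectors)
        mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
        q = np.asarray(query, dtype=float)
        q = q / np.linalg.norm(q)
        best = np.argsort(mat @ q)[::-1][:k]
        return [self.payloads[i] for i in best]

index = TinyIndex()
index.add([1.0, 0.0], 'doc about RAG')
index.add([0.0, 1.0], 'doc about CSS')
print(index.search([0.9, 0.1]))  # nearest payload first
```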
The skill teaches Claude Code the correct API patterns for Together AI's embeddings and reranking endpoints. When you ask Claude Code to build a semantic search feature or RAG pipeline using Together AI, it generates correct code based on the skill's patterns.
Citations (3)
- Together AI Documentation — Together AI embeddings and reranking API
- RAG Paper (arXiv) — Dense retrieval and reranking for RAG
- Together AI Models — Embedding models for semantic search
Source & Thanks
Part of togethercomputer/skills — MIT licensed.
Related Assets
Claude-Flow — Multi-Agent Orchestration for Claude Code
Layers swarm and hive-mind multi-agent orchestration on top of Claude Code with 64 specialized agents, SQLite memory, and parallel execution.
SuperClaude — Workflow Framework for Claude Code
Adds 16+ slash commands, 9 cognitive personas, and a smart flag system to Claude Code in one pipx install.
Claudia — Tauri Desktop GUI for Claude Code
Open-source Tauri/Rust desktop app for managing Claude Code sessions, custom agents, sandboxed execution, and checkpoints.