Skills · April 8, 2026 · 1 min read

Together AI Embeddings & Reranking Skill for Agents

A skill that teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation, semantic search, RAG pipelines, and result reranking patterns.

Together AI · Community

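Agent ready

This asset can be installed directly by an agent. The agent first selects the current runtime, checks the install plan, and then runs the matching command.

Native · 98/100 · Policy: allow

Agent entry point: any MCP/CLI agent
Type: Skill
Install: Single
Trust level: Community
Entry: Together AI Embeddings & Reranking Skill for Agents

Direct install command

npx -y tokrepo@latest install da3bf81c-8928-41ba-b5c4-457355af582d --target codex

Dry-run first to confirm the install plan, then run this command.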

TL;DR

Claude Code skill covering Together AI's embeddings and reranking API for semantic search, RAG pipelines, and result reranking patterns.

§01

What it is

This skill teaches Claude Code how to use Together AI's embeddings and reranking API. It covers dense vector generation for semantic search, building RAG (Retrieval-Augmented Generation) pipelines, and reranking search results for better relevance.

The skill targets developers building search and retrieval systems who want to use Together AI's hosted embedding models and reranking endpoints instead of running models locally.

§02

How it saves time or tokens

Together AI provides hosted embedding models that generate dense vectors without managing GPU infrastructure. The reranking API improves search quality by re-scoring initial retrieval results, reducing the number of irrelevant documents passed to the LLM.

Better retrieval means fewer tokens wasted on irrelevant context in RAG pipelines. The reranker filters out noise before the LLM processes the results.
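
As a rough illustration of the token saving, the sketch below keeps reranked documents in score order until a context budget is reached. The `select_context` helper, the `min_score` threshold, and the whitespace word count used as a stand-in for a token count are assumptions made for illustration, not part of the Together AI API. The `(index, score)` pairs mimic the document index and relevance score that a reranker returns.

```python
from typing import List, Tuple


def select_context(
    docs: List[str],
    scored: List[Tuple[int, float]],
    min_score: float = 0.5,
    max_words: int = 50,
) -> List[str]:
    """Keep the highest-scoring documents that fit in a rough word budget."""
    selected = []
    used = 0
    # Visit documents from most to least relevant
    for index, score in sorted(scored, key=lambda pair: pair[1], reverse=True):
        if score < min_score:
            break  # every remaining document is less relevant still
        words = len(docs[index].split())
        if used + words > max_words:
            continue  # skip documents that would overflow the budget
        selected.append(docs[index])
        used += words
    return selected


docs = [
    'RAG improves LLM accuracy',
    'CSS Grid layout tutorial',
    'Vector search with FAISS',
]
# Hypothetical reranker output: (document index, relevance score)
scored = [(0, 0.92), (1, 0.03), (2, 0.61)]
context = select_context(docs, scored)
print(context)
```

In a real pipeline the word count would be replaced by the tokenizer of the target LLM, and the threshold tuned on your own data.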

§03

How to use

Install the Together AI SDK:

pip install together

Generate embeddings:

from together import Together

client = Together(api_key='your-api-key')

response = client.embeddings.create(
    model='togethercomputer/m2-bert-80M-8k-retrieval',
    input=['How to build a RAG pipeline', 'What is semantic search?']
)

for embedding in response.data:
    print(f'Vector dimension: {len(embedding.embedding)}')

Use embeddings for semantic search by comparing cosine similarity between query and document vectors.
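
The comparison step can be sketched with NumPy alone. The `cosine_top_k` helper and the toy 3-dimensional vectors below are hypothetical stand-ins; in practice the vectors would come from the `embedding` field of each item in `response.data`.

```python
import numpy as np


def cosine_top_k(query_vec, doc_vecs, k=2):
    """Return (index, similarity) pairs for the k documents closest to the query."""
    query = np.asarray(query_vec, dtype=float)
    docs = np.asarray(doc_vecs, dtype=float)
    # Cosine similarity = dot product divided by the product of the norms
    sims = docs @ query / (np.linalg.norm(docs, axis=1) * np.linalg.norm(query))
    order = np.argsort(sims)[::-1][:k]
    return [(int(i), float(sims[i])) for i in order]


# Toy vectors standing in for real embeddings
doc_vecs = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.8, 0.6, 0.0],
]
query_vec = [1.0, 0.1, 0.0]

results = cosine_top_k(query_vec, doc_vecs)
for index, similarity in results:
    print(index, round(similarity, 3))
```

For more than a few thousand documents, a vector index is usually a better fit than this brute-force scan.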

Rerank results for better relevance:

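# Reuses the `client` created in the embeddings snippet above
documents = ['Document about RAG...', 'Document about CSS...', 'Document about retrieval...']
response = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query='best practices for RAG',
    documents=documents
)

# Each result points back to a document by its position in the input list
for result in response.results:
    print(result.index, result.relevance_score)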

§04

Example

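# RAG retrieval pipeline with Together AI: embed, retrieve, rerank
import os

import numpy as np
from together import Together

# Read the API key from the environment instead of hard-coding it
client = Together(api_key=os.environ['TOGETHER_API_KEY'])
embed_model = 'togethercomputer/m2-bert-80M-8k-retrieval'
query = 'How does RAG work?'

# Step 1: Embed documents
docs = ['RAG improves LLM accuracy', 'CSS Grid layout tutorial', 'Vector search with FAISS']
doc_embeddings = client.embeddings.create(model=embed_model, input=docs)
doc_vecs = np.array([item.embedding for item in doc_embeddings.data])

# Step 2: Embed query
query_embedding = client.embeddings.create(model=embed_model, input=[query])
query_vec = np.array(query_embedding.data[0].embedding)

# Step 3: Retrieve the top candidates by cosine similarity
doc_vecs = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
query_vec = query_vec / np.linalg.norm(query_vec)
top_k = np.argsort(doc_vecs @ query_vec)[::-1][:2]
candidates = [docs[i] for i in top_k]

# Step 4: Rerank the candidates
reranked = client.rerank.create(
    model='Salesforce/Llama-Rank-V1',
    query=query,
    documents=candidates
)
for result in reranked.results:
    print(f'{result.relevance_score:.3f}  {candidates[result.index]}')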

§05

Related on TokRepo

AI Tools for RAG — RAG pipeline tools and components
AI Tools for Research — AI-powered search and research tools

§06

Common pitfalls

Not normalizing embeddings before cosine similarity. Some models output unnormalized vectors. Normalize to unit length for correct similarity scores.
Skipping the reranking step. Initial embedding-based retrieval is fast but approximate. Reranking re-scores each candidate against the query and typically improves the relevance of the top-k results passed to the LLM.
Using the wrong embedding model for your use case. Retrieval models (m2-bert-retrieval) are optimized for search. Code models are better for code search. Match the model to your domain.
Failing to review community discussions and changelogs before upgrading. Breaking changes in major versions can disrupt existing workflows. Pin versions in production and test upgrades in staging first.
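
The first pitfall is easy to reproduce with toy data. In the sketch below, the `normalize` helper and the 2-dimensional vectors are hypothetical; it shows a raw dot product preferring a long vector that points in the wrong direction, while the normalized comparison picks the vector that is nearly parallel to the query.

```python
import numpy as np


def normalize(vectors):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    arr = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(arr, axis=1, keepdims=True)
    # Leave all-zero rows untouched to avoid dividing by zero
    return arr / np.where(norms == 0, 1.0, norms)


query = np.array([1.0, 0.0])  # already unit length
docs = np.array([
    [10.0, 10.0],  # long vector, 45 degrees away from the query
    [0.9, 0.1],    # short vector, almost parallel to the query
])

raw_best = int(np.argmax(docs @ query))              # raw dot product
unit_best = int(np.argmax(normalize(docs) @ query))  # cosine similarity
print(raw_best, unit_best)
```

Normalizing once at indexing time also lets a vector store use the cheaper inner-product metric in place of cosine.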

FAQ

What embedding models does Together AI offer?

Together AI hosts multiple embedding models including M2-BERT for general retrieval, BGE models for multilingual embeddings, and specialized models for code and scientific text. Check the Together AI documentation for the current model catalog.

What is reranking?

Reranking takes an initial set of search results (from embedding similarity) and re-scores them using a more powerful model that considers query-document relevance more deeply. It improves precision by pushing the most relevant results to the top.

How do embeddings work for semantic search?

Text is converted into dense vectors (embeddings) that capture semantic meaning. Similar texts have similar vectors. To search, embed the query, compute cosine similarity against document vectors, and return the closest matches. This works across paraphrases and synonyms.

Can I use Together AI embeddings with any vector store?

Yes. Together AI generates standard dense vectors that work with any vector store: Pinecone, Weaviate, Chroma, Qdrant, Milvus, pgvector, or FAISS. Generate embeddings with Together AI and store them in your preferred vector database.

How does this skill help Claude Code?

The skill teaches Claude Code the correct API patterns for Together AI's embeddings and reranking endpoints. When you ask Claude Code to build a semantic search feature or RAG pipeline using Together AI, it generates correct code based on the skill's patterns.

Sources (3)

Together AI Documentation: Together AI embeddings and reranking API
RAG Paper (arXiv): Dense retrieval and reranking for RAG
Together AI Models: Embedding models for semantic search

Source and credits

togethercomputer/skills — MIT


Related assets

Together AI Dedicated Containers Skill for Agents

Together AI Dedicated Endpoints Skill for Agents

Together AI Image Generation Skill for Claude Code

Together AI Video Generation Skill for Claude Code