# LLMLingua — Compress Prompts 20x with Minimal Loss

> Microsoft Research tool for prompt compression. Reduce token usage up to 20x while maintaining LLM performance. Solves lost-in-the-middle for RAG. MIT, 6,000+ stars.

## Quick Use

1. Install:

   ```bash
   pip install llmlingua
   ```

2. Compress a prompt:

   ```python
   from llmlingua import PromptCompressor

   compressor = PromptCompressor()
   result = compressor.compress_prompt(
       context=["Your long context here..."],
       instruction="Summarize the key points.",
       target_token=500,
   )
   print(result["compressed_prompt"])
   ```

---

## Intro

LLMLingua is Microsoft Research's prompt compression toolkit with 6,000+ GitHub stars, published at EMNLP 2023 and ACL 2024. It reduces prompt length by up to 20x while preserving LLM performance, saving significant API costs. LLMLingua-2 offers a 3-6x speed improvement over the original through GPT-4 data distillation. The toolkit is especially effective for RAG pipelines, where long retrieved contexts cause the "lost-in-the-middle" problem. Best for developers building production LLM apps who need to optimize token usage and costs.

See also: [TokenCost for tracking LLM spending](https://tokrepo.com/en/workflows/) on TokRepo.

---

## LLMLingua — Prompt Compression by Microsoft Research

### The Problem

LLM API costs are directly tied to token count. Long contexts in RAG pipelines, multi-document QA, and chain-of-thought prompting can consume thousands of tokens per request. Additionally, LLMs suffer from the "lost-in-the-middle" problem: they focus on the beginning and end of long contexts and miss information in the middle.

### The Solution

LLMLingua uses a small language model to identify and remove non-essential tokens from prompts, achieving up to 20x compression with minimal performance loss.
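The pruning idea can be illustrated with a toy heuristic. Everything below (`toy_compress`, the word-frequency scoring) is invented for illustration and is not LLMLingua's actual method — the real library scores tokens with a small causal language model's perplexity, not word frequency:

```python
# Toy sketch of budget-driven token pruning (illustration only).
# LLMLingua itself uses a small LM's perplexity scores, not this
# frequency heuristic.
from collections import Counter

def toy_compress(text: str, target_tokens: int) -> str:
    words = text.split()
    if len(words) <= target_tokens:
        return text
    # Crude informativeness proxy: frequent words carry less information.
    freq = Counter(w.lower() for w in words)
    # Rank positions rarest-first, keep the budget, preserve original order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i].lower()])
    keep = set(ranked[:target_tokens])
    return " ".join(w for i, w in enumerate(words) if i in keep)

prompt = ("the meeting covered the quarterly revenue results and the "
          "team agreed the launch date moves to March")
print(toy_compress(prompt, 8))
# Repeated filler like "the" is dropped first; content words survive.
```

The real model-based scoring matters precisely because frequency heuristics like this one miss context: LLMLingua keeps a token when the small LM finds it surprising given what came before.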
### Three Methods

| Method | Paper | Compression | Speed |
|--------|-------|-------------|-------|
| **LLMLingua** | EMNLP 2023 | Up to 20x | Baseline |
| **LongLLMLingua** | ACL 2024 | 4x (+21.4% RAG improvement) | Same |
| **LLMLingua-2** | ACL 2024 Findings | Up to 20x | 3-6x faster |

### Installation

```bash
pip install llmlingua
```

### Usage Examples

**Basic compression (LLMLingua-2 model):**

```python
from llmlingua import PromptCompressor

# LLMLingua-2 checkpoints require use_llmlingua2=True; without it,
# PromptCompressor expects a causal LM for perplexity scoring.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,
)

compressed = compressor.compress_prompt(
    context=["Long document text here..."],
    instruction="Answer the question based on the context.",
    question="What are the key findings?",
    target_token=200,
)

print(f"Original: {compressed['origin_tokens']} tokens")
print(f"Compressed: {compressed['compressed_tokens']} tokens")
print(f"Ratio: {compressed['ratio']}")
print(compressed["compressed_prompt"])
```

**For RAG pipelines (LongLLMLingua):**

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor()

# Multiple retrieved documents
contexts = [
    "Document 1: ...",
    "Document 2: ...",
    "Document 3: ...",
]

compressed = compressor.compress_prompt(
    context=contexts,
    instruction="Answer based on the provided documents.",
    question="What is the main conclusion?",
    target_token=500,
    use_context_level_filter=True,  # LongLLMLingua: rank and filter whole documents
)
```

### Performance Benchmarks

- **20x compression** on general prompts with <2% performance drop
- **21.4% improvement** on RAG tasks using only 1/4 of the tokens (LongLLMLingua)
- **3-6x speed improvement** with LLMLingua-2 (data distillation from GPT-4)

### FAQ

**Q: What is LLMLingua?**
A: A Microsoft Research toolkit for compressing LLM prompts by up to 20x while maintaining performance, reducing API costs and mitigating the lost-in-the-middle problem in long contexts.

**Q: Is LLMLingua free?**
A: Yes, fully open-source under the MIT license.
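To make the cost side of these numbers concrete, here is a back-of-the-envelope estimate. The request volume and the $0.01 per 1k input tokens price are placeholder assumptions (check your provider's current rates), and `monthly_savings` is a hypothetical helper, not part of the library:

```python
def monthly_savings(requests_per_day: int, tokens_per_prompt: int,
                    price_per_1k_tokens: float, compression_ratio: float):
    """Estimated monthly input-token cost before and after compression."""
    tokens_per_month = requests_per_day * 30 * tokens_per_prompt
    cost_before = tokens_per_month / 1000 * price_per_1k_tokens
    cost_after = cost_before / compression_ratio
    return cost_before, cost_after

# 10k requests/day, 4k-token prompts, $0.01 per 1k tokens (placeholder),
# 20x compression as reported for LLMLingua.
before, after = monthly_savings(10_000, 4_000, 0.01, 20)
print(f"${before:,.0f}/mo -> ${after:,.0f}/mo")  # $12,000/mo -> $600/mo
```

In practice a more conservative target rate (e.g. 4x for LongLLMLingua on RAG) still cuts the input-token bill proportionally.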
**Q: Does LLMLingua work with any LLM?**
A: Yes. LLMLingua compresses prompts before they are sent, so it works with OpenAI, Claude, Gemini, and any other model.

---

## Source & Thanks

> Created by [Microsoft Research](https://github.com/microsoft). Licensed under MIT.
>
> [LLMLingua](https://github.com/microsoft/LLMLingua) — ⭐ 6,000+

Thanks to Huiqiang Jiang, Qianhui Wu, and the Microsoft Research team for advancing prompt compression.

---

Source: https://tokrepo.com/en/workflows/1510da0c-33d7-11f1-9bc6-00163e2b0d79
Author: Script Depot