# LLMLingua — Compress Prompts 20x with Minimal Loss

> Microsoft Research tool for prompt compression. Reduce token usage up to 20x while maintaining LLM performance. Solves lost-in-the-middle for RAG. MIT, 6,000+ stars.

## Quick Use

1. Install:

   ```bash
   pip install llmlingua
   ```

2. Compress a prompt:

   ```python
   from llmlingua import PromptCompressor

   compressor = PromptCompressor()
   result = compressor.compress_prompt(
       context=["Your long context here..."],
       instruction="Summarize the key points.",
       target_token=500,
   )
   print(result["compressed_prompt"])
   ```

---

## Intro

LLMLingua is Microsoft Research's prompt compression toolkit with 6,000+ GitHub stars, published at EMNLP 2023 and ACL 2024. It reduces prompt length by up to 20x while preserving LLM performance, saving significant API costs. LLMLingua-2 offers a 3-6x speed improvement over the original through GPT-4 data distillation. The toolkit is especially effective for RAG pipelines, where long retrieved contexts cause the "lost-in-the-middle" problem. Best for developers building production LLM apps who need to optimize token usage and costs.

See also: [TokenCost for tracking LLM spending](https://tokrepo.com/en/workflows/) on TokRepo.

---

## LLMLingua — Prompt Compression by Microsoft Research

### The Problem

LLM API costs are directly tied to token count. Long contexts in RAG pipelines, multi-document QA, and chain-of-thought prompting can consume thousands of tokens per request. Additionally, LLMs suffer from the "lost-in-the-middle" problem: they focus on the beginning and end of long contexts and miss information in the middle.

### The Solution

LLMLingua uses a small language model to identify and remove non-essential tokens from prompts, achieving up to 20x compression with minimal performance loss.
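The pruning idea can be illustrated with a toy heuristic. Everything below (`toy_compress`, the word-frequency scoring) is invented for illustration and is not LLMLingua's actual method — the real library scores tokens with a small causal language model's perplexity, not word frequency:

```python
# Toy sketch of budget-driven token pruning (illustration only).
# LLMLingua itself uses a small LM's perplexity scores, not this
# frequency heuristic.
from collections import Counter

def toy_compress(text: str, target_tokens: int) -> str:
    words = text.split()
    if len(words) <= target_tokens:
        return text
    # Crude informativeness proxy: frequent words carry less information.
    freq = Counter(w.lower() for w in words)
    # Rank positions rarest-first, keep the budget, preserve original order.
    ranked = sorted(range(len(words)), key=lambda i: freq[words[i].lower()])
    keep = set(ranked[:target_tokens])
    return " ".join(w for i, w in enumerate(words) if i in keep)

prompt = ("the meeting covered the quarterly revenue results and the "
          "team agreed the launch date moves to March")
print(toy_compress(prompt, 8))
# Repeated filler like "the" is dropped first; content words survive.
```

The real model-based scoring matters precisely because frequency heuristics like this one miss context: LLMLingua keeps a token when the small LM finds it surprising given what came before.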
### Three Methods

| Method | Paper | Compression | Speed |
|--------|-------|-------------|-------|
| **LLMLingua** | EMNLP 2023 | Up to 20x | Baseline |
| **LongLLMLingua** | ACL 2024 | 4x (+21.4% RAG improvement) | Same |
| **LLMLingua-2** | ACL 2024 Findings | Up to 20x | 3-6x faster |

### Installation

```bash
pip install llmlingua
```

### Usage Examples

**Basic compression (LLMLingua-2 model):**

```python
from llmlingua import PromptCompressor

# LLMLingua-2 checkpoints require use_llmlingua2=True; without it,
# PromptCompressor expects a causal LM for perplexity scoring.
compressor = PromptCompressor(
    model_name="microsoft/llmlingua-2-bert-base-multilingual-cased-meetingbank",
    use_llmlingua2=True,
)

compressed = compressor.compress_prompt(
    context=["Long document text here..."],
    instruction="Answer the question based on the context.",
    question="What are the key findings?",
    target_token=200,
)

print(f"Original: {compressed['origin_tokens']} tokens")
print(f"Compressed: {compressed['compressed_tokens']} tokens")
print(f"Ratio: {compressed['ratio']}")
print(compressed["compressed_prompt"])
```

**For RAG pipelines (LongLLMLingua):**

```python
from llmlingua import PromptCompressor

compressor = PromptCompressor()

# Multiple retrieved documents
contexts = [
    "Document 1: ...",
    "Document 2: ...",
    "Document 3: ...",
]

compressed = compressor.compress_prompt(
    context=contexts,
    instruction="Answer based on the provided documents.",
    question="What is the main conclusion?",
    target_token=500,
    use_context_level_filter=True,  # LongLLMLingua: rank and filter whole documents
)
```

### Performance Benchmarks

- **20x compression** on general prompts with <2% performance drop
- **21.4% improvement** on RAG tasks using only 1/4 of the tokens (LongLLMLingua)
- **3-6x speed improvement** with LLMLingua-2 (data distillation from GPT-4)

### FAQ

**Q: What is LLMLingua?**
A: A Microsoft Research toolkit for compressing LLM prompts by up to 20x while maintaining performance, reducing API costs and mitigating the lost-in-the-middle problem in long contexts.

**Q: Is LLMLingua free?**
A: Yes, fully open-source under the MIT license.
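To make the cost side of these numbers concrete, here is a back-of-the-envelope estimate. The request volume and the $0.01 per 1k input tokens price are placeholder assumptions (check your provider's current rates), and `monthly_savings` is a hypothetical helper, not part of the library:

```python
def monthly_savings(requests_per_day: int, tokens_per_prompt: int,
                    price_per_1k_tokens: float, compression_ratio: float):
    """Estimated monthly input-token cost before and after compression."""
    tokens_per_month = requests_per_day * 30 * tokens_per_prompt
    cost_before = tokens_per_month / 1000 * price_per_1k_tokens
    cost_after = cost_before / compression_ratio
    return cost_before, cost_after

# 10k requests/day, 4k-token prompts, $0.01 per 1k tokens (placeholder),
# 20x compression as reported for LLMLingua.
before, after = monthly_savings(10_000, 4_000, 0.01, 20)
print(f"${before:,.0f}/mo -> ${after:,.0f}/mo")  # $12,000/mo -> $600/mo
```

In practice a more conservative target rate (e.g. 4x for LongLLMLingua on RAG) still cuts the input-token bill proportionally.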
**Q: Does LLMLingua work with any LLM?**
A: Yes. LLMLingua compresses prompts before they are sent, so it works with OpenAI, Claude, Gemini, and any other model.

---

## Source & Thanks

> Created by [Microsoft Research](https://github.com/microsoft). Licensed under MIT.
>
> [LLMLingua](https://github.com/microsoft/LLMLingua) — ⭐ 6,000+

Thanks to Huiqiang Jiang, Qianhui Wu, and the Microsoft Research team for advancing prompt compression.

---

Source: https://tokrepo.com/en/workflows/1510da0c-33d7-11f1-9bc6-00163e2b0d79
Author: Script Depot