Scripts · April 9, 2026 · 1 min read

LLMLingua — Compress Prompts 20x with Minimal Loss

A Microsoft Research tool for prompt compression: cut token usage by up to 20x while maintaining LLM performance, and mitigate the lost-in-the-middle problem in RAG. MIT-licensed, 6,000+ GitHub stars.

Introduction

LLMLingua is Microsoft Research's prompt compression toolkit, with 6,000+ GitHub stars and papers at EMNLP 2023 and ACL 2024. It compresses prompts by up to 20x with minimal performance loss, dramatically reducing API costs. It is especially effective against the "lost in the middle" problem in RAG pipelines, and is ideal for developers building production-grade LLM apps who want to optimize token usage and cost.


LLMLingua — Microsoft Research Prompt Compression

Three Methods

Method          Paper       Compression        Speed
LLMLingua       EMNLP 2023  Up to 20x          Baseline
LongLLMLingua   ACL 2024    4x (RAG +21.4%)    Same as baseline
LLMLingua-2     ACL 2024    Up to 20x          3–6x faster
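All three methods share the same core idea: score each token's informativeness and drop the lowest-scoring tokens until the prompt fits a target budget. The sketch below is a toy illustration of that idea only, not the library's algorithm; it uses word length as a stand-in score, where the real methods use a small language model's perplexity (LLMLingua, LongLLMLingua) or a trained token classifier (LLMLingua-2).

```python
# Toy budget-constrained prompt compression: keep the highest-scoring
# tokens, preserving their original order. The scoring function here
# (word length) is a placeholder for a real informativeness estimate.

def compress(tokens, scores, target_len):
    """Keep the target_len highest-scoring tokens, preserving order."""
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:target_len])
    return [tok for i, tok in enumerate(tokens) if i in keep]

prompt = ("please carefully summarize the quarterly revenue "
          "figures in the attached report").split()
scores = [len(t) for t in prompt]  # stand-in informativeness score

print(compress(prompt, scores, target_len=4))
# → ['carefully', 'summarize', 'quarterly', 'attached']
```

Order preservation matters: the compressed prompt must remain grammatically plausible to the downstream LLM, so tokens are filtered in place rather than reordered by score.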

Performance Benchmarks

  • 20x compression on general prompts with <2% performance drop
  • 21.4% improvement on RAG tasks using 1/4 the tokens
  • LLMLingua-2 compresses 3–6x faster than the original LLMLingua
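To make the savings concrete, here is the back-of-envelope arithmetic at 20x compression. The per-token price and request volume below are placeholder assumptions for illustration, not quoted vendor rates:

```python
# Back-of-envelope input-token cost before and after 20x compression.
# price_per_1k and the traffic numbers are assumed, not actual rates.

def token_cost(tokens_per_request, requests, price_per_1k):
    return tokens_per_request * requests / 1000 * price_per_1k

price_per_1k = 0.01  # $ per 1K input tokens (assumed)
original = token_cost(10_000, 100_000, price_per_1k)        # 10K-token prompts
compressed = token_cost(10_000 // 20, 100_000, price_per_1k)  # 20x smaller

print(f"original:   ${original:,.2f}")    # → original:   $10,000.00
print(f"compressed: ${compressed:,.2f}")  # → compressed: $500.00
```

Because compression applies only to input tokens, actual savings depend on your input/output token ratio; prompt-heavy RAG workloads benefit the most.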

FAQ

Q: What is LLMLingua? A: Microsoft Research's prompt compression toolkit — up to 20x compression with negligible impact on LLM performance.

Q: Is it free? A: Completely free and open source under the MIT license.



Credits

Created by Microsoft Research. Licensed under MIT.

LLMLingua — ⭐ 6,000+

