About CompressBench
An open benchmark for evaluating prompt compression methods.
What is CBv2?
CBv2 (CompressBench v2) is a composite quality score that measures how well a compression method preserves the usefulness of prompts. It evaluates 100 diverse cases across 5 categories at 3 compression rates (0.25, 0.50, 0.70), producing 300 evaluations per method.
Scoring Formula
Final score = mean(CBv2@0.25, CBv2@0.50, CBv2@0.70)
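As a quick check of the arithmetic above (100 cases × 3 rates = 300 evaluations; final score = mean of the three per-rate CBv2 scores), the following Python sketch enumerates the evaluation grid and averages per-rate scores. The per-rate numbers are illustrative placeholders, not real results, and `final_score` is a hypothetical helper, not part of the compressbench package.

```python
from statistics import mean

# The three target compression rates used by CBv2.
RATES = (0.25, 0.50, 0.70)
NUM_CASES = 100  # 5 categories x 20 cases


def final_score(cbv2_by_rate: dict[float, float]) -> float:
    """Average the per-rate CBv2 scores into the single reported number."""
    missing = [r for r in RATES if r not in cbv2_by_rate]
    if missing:
        raise ValueError(f"missing CBv2 score for rate(s): {missing}")
    return mean(cbv2_by_rate[r] for r in RATES)


# Each method is evaluated on every (case, rate) pair.
evaluations = [(case_id, rate) for case_id in range(NUM_CASES) for rate in RATES]
print(len(evaluations))  # 300

# Illustrative per-rate scores (hypothetical, not benchmark results).
print(round(final_score({0.25: 0.70, 0.50: 0.80, 0.70: 0.90}), 4))  # 0.8
```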
Metrics
TRS — Task Retention Score (50%)
Can a downstream LLM still solve the original task after compression? Each case is scored pass/fail, and TRS is the average over all cases.
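TRS reduces to a mean over binary outcomes. A minimal sketch, assuming the pass/fail judgments have already been produced by the downstream LLM check (how that check decides pass or fail is not described here); `task_retention_score` is a hypothetical name:

```python
def task_retention_score(passed: list[bool]) -> float:
    """TRS: fraction of cases whose task the downstream LLM still solved."""
    if not passed:
        raise ValueError("no cases to score")
    # Each case contributes 1 for a pass and 0 for a fail; TRS is the average.
    return sum(passed) / len(passed)


# Hypothetical judgments for four cases.
print(task_retention_score([True, True, False, True]))  # 0.75
```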
CIR — Critical Information Recall (20%)
What fraction of the critical information units (functions, numbers, flags, URLs) in the original prompt are preserved in the compressed text?
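One way to picture CIR is as recall over a set of extracted units. The sketch below assumes a simple regex extractor and a verbatim substring check; CompressBench's actual unit extraction and matching rules are not specified on this page, so `UNIT_PATTERN`, `critical_units`, and `critical_information_recall` are all hypothetical.

```python
import re

# Hypothetical extractor: a stand-in for the benchmark's own definition of a
# "critical information unit". Alternatives are tried left to right.
UNIT_PATTERN = re.compile(
    r"https?://\S+"                 # URLs
    r"|(?<![\w-])--?[A-Za-z][\w-]*"  # flags such as --strict or -n (not mid-word)
    r"|\b\d+(?:\.\d+)?\b"           # integers and decimals
    r"|\b[A-Za-z_]\w*(?=\()"        # function names directly followed by "("
)


def critical_units(text: str) -> set[str]:
    return set(UNIT_PATTERN.findall(text))


def critical_information_recall(original: str, compressed: str) -> float:
    """Fraction of the original's units found verbatim in the compressed text.

    A plain substring check is lenient (e.g. "3" also matches inside "30").
    """
    units = critical_units(original)
    if not units:
        return 1.0  # nothing critical to lose
    kept = {u for u in units if u in compressed}
    return len(kept) / len(units)


original = "Call parse_config() with --strict and retries=3; docs at https://example.com/docs"
compressed = "parse_config() --strict, retries=3"
print(critical_information_recall(original, compressed))  # 0.75 (the URL was dropped)
```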
SPS — Semantic Preservation Score (10%)
Semantic similarity between the original and compressed text, measured as the cosine similarity of their embeddings.
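SPS rests on cosine similarity, cos(a, b) = a·b / (‖a‖ ‖b‖). The sketch computes it for two toy vectors standing in for embeddings of the original and compressed prompts. Which embedding model CompressBench uses, and whether the similarity is rescaled or clipped before weighting, is not stated here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embedding dimensions differ")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-length embedding")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (real embeddings have hundreds of dimensions).
original_vec = [1.0, 2.0, 2.0]
compressed_vec = [2.0, 1.0, 2.0]
print(round(cosine_similarity(original_vec, compressed_vec), 4))  # 0.8889 (= 8/9)
```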
SFS — Structural Fidelity Score (10%)
Are structural elements (code blocks, JSON, YAML, tables, headers, lists) preserved after compression?
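A structural check can be sketched as: detect which element types the original contains, then test whether each one is still present and valid after compression. The detectors below (fenced code blocks, Markdown headers, list items, and a single JSON object) are hypothetical simplifications for illustration; YAML and tables are omitted, and the benchmark's real validation rules may differ.

```python
import json
import re

FENCE = "`" * 3  # a Markdown code-fence marker


def _has_valid_json(text: str) -> bool:
    # Treat the span from the first "{" to the last "}" as one JSON object.
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end < start:
        return False
    try:
        json.loads(text[start:end + 1])
        return True
    except json.JSONDecodeError:
        return False


def structural_features(text: str) -> dict[str, bool]:
    """Hypothetical detectors for a few structural element types."""
    return {
        "code_block": text.count(FENCE) >= 2,
        "header": re.search(r"^#{1,6} ", text, re.MULTILINE) is not None,
        "list": re.search(r"^\s*(?:[-*]|\d+\.) ", text, re.MULTILINE) is not None,
        "json": _has_valid_json(text),
    }


def structural_fidelity(original: str, compressed: str) -> float:
    """Share of the original's element types that survive intact."""
    before = structural_features(original)
    after = structural_features(compressed)
    required = [kind for kind, present in before.items() if present]
    if not required:
        return 1.0  # no structure to preserve
    return sum(after[kind] for kind in required) / len(required)


original_doc = '# Config\n- set timeout\n{"timeout": 30, "retries": 3}'
compressed_doc = '- set timeout\n{"timeout": 30, "retries"'  # header lost, JSON truncated
print(round(structural_fidelity(original_doc, compressed_doc), 4))  # 0.3333
```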
CES — Compression Efficiency Score (10%)
Did the method actually compress to the requested rate? Penalizes over- and under-compression.
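The page does not give the CES formula, so the following is only a hypothetical sketch of a symmetric penalty. It assumes "rate" means the fraction of tokens kept (compressed / original) and that the score falls linearly to zero once the achieved rate drifts `tolerance` away from the target in either direction. Both the rate convention and the 0.2 default tolerance are assumptions.

```python
def compression_efficiency(original_tokens: int, compressed_tokens: int,
                           target_rate: float, tolerance: float = 0.2) -> float:
    """Hypothetical CES: 1.0 on target, linearly down to 0.0 at +/- tolerance."""
    if original_tokens <= 0:
        raise ValueError("original_tokens must be positive")
    # Assumption: rate = fraction of tokens kept.
    achieved = compressed_tokens / original_tokens
    error = abs(achieved - target_rate)
    return max(0.0, 1.0 - error / tolerance)


print(compression_efficiency(1000, 500, 0.50))            # on target -> 1.0
print(round(compression_efficiency(1000, 600, 0.50), 4))  # under-compressed -> 0.5
print(compression_efficiency(1000, 100, 0.50))            # far over-compressed -> 0.0
```

Because the penalty uses the absolute error, keeping 10% too many tokens costs exactly as much as keeping 10% too few, matching the "over- and under-compression" wording.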
Categories
100 benchmark cases span 5 categories: Code Context (20), Chat History (20), Structured Data (20), Documentation (20), and Mixed (20). Each category tests different aspects of compression quality.
Hard Constraints
- TRS < 0.85 → 50% score penalty
- CIR < 0.80 → flagged as information-lossy (no penalty)
- Structure required but invalid → case score = 0
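Putting the metric weights and the hard constraints together, a per-rate CBv2 could be computed roughly as follows. This is a sketch under stated assumptions: the metric inputs are averages over the 100 cases at one rate, the 50% TRS penalty multiplies that rate's weighted score, and the structure rule zeroes individual case scores. `cbv2_at_rate`, `case_score`, and `RateResult` are hypothetical names.

```python
from dataclasses import dataclass

# Weights from the Metrics list (50% + 20% + 3 x 10% = 100%).
WEIGHTS = {"trs": 0.50, "cir": 0.20, "sps": 0.10, "sfs": 0.10, "ces": 0.10}


@dataclass
class RateResult:
    cbv2: float
    information_lossy: bool


def cbv2_at_rate(metrics: dict[str, float]) -> RateResult:
    """Weighted composite for one rate, with the TRS and CIR constraints.

    Assumption: the 50% penalty scales this rate's composite; the benchmark
    may apply it at a different level.
    """
    score = sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)
    if metrics["trs"] < 0.85:
        score *= 0.5  # TRS below threshold: 50% penalty
    lossy = metrics["cir"] < 0.80  # flagged only; the score is unchanged
    return RateResult(score, lossy)


def case_score(composite: float, structure_required: bool, structure_valid: bool) -> float:
    # A case that requires structure but yields invalid structure scores 0.
    if structure_required and not structure_valid:
        return 0.0
    return composite


result = cbv2_at_rate({"trs": 0.80, "cir": 0.75, "sps": 0.90, "sfs": 1.0, "ces": 0.95})
print(round(result.cbv2, 4), result.information_lossy)  # 0.4175 True
```

Here the weighted sum is 0.835, and because TRS (0.80) is below 0.85 it is halved to 0.4175; CIR below 0.80 only sets the flag.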
Submit Your Method
pip install compressbench
compressbench register
compressbench run --method http --endpoint YOUR_URL
compressbench submit results/result_http_*.json
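`compressbench run --method http --endpoint YOUR_URL` appears to point the benchmark at a compression service you host. The request and response format the runner uses is not documented on this page, so this Python sketch uses placeholder JSON fields (`text`, `rate`, `compressed`) and a trivial truncation compressor. Treat every field name here as hypothetical and adapt it to the actual protocol before running the benchmark.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# HYPOTHETICAL schema: request {"text": str, "rate": float},
# response {"compressed": str}. Check the real format expected by compressbench.


def compress(text: str, rate: float) -> str:
    """Placeholder compressor: keep the first `rate` fraction of whitespace tokens."""
    tokens = text.split()
    if not tokens:
        return ""
    keep = max(1, round(len(tokens) * rate))
    return " ".join(tokens[:keep])


class CompressHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        result = compress(payload.get("text", ""), float(payload.get("rate", 0.5)))
        body = json.dumps({"compressed": result}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "localhost", port: int = 8000) -> None:
    """Run the placeholder endpoint until interrupted."""
    HTTPServer((host, port), CompressHandler).serve_forever()
```

To try it locally, call `serve()` and pass `http://localhost:8000/` as `YOUR_URL`. The truncation strategy only exists so the endpoint returns something; it is not a meaningful compression method.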