About CompressBench

An open benchmark for evaluating prompt compression methods.

What is CBv2?

CBv2 (CompressBench v2) is a composite quality score that measures how well a compression method preserves the usefulness of prompts. Each method is run on 100 diverse cases spanning 5 categories at 3 compression rates (0.25, 0.50, 0.70), for 100 × 3 = 300 evaluations per method.

Scoring Formula

CBv2 = 100 × TRS^0.50 × CIR^0.20 × SPS^0.10 × SFS^0.10 × CES^0.10

Final score = mean(CBv2@0.25, CBv2@0.50, CBv2@0.70)
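
The formula above is a weighted geometric mean, so a low value in any single metric pulls the whole score down (a TRS of 0 gives a CBv2 of 0). A minimal sketch of the arithmetic, assuming each metric is a value in [0, 1]; the function names and the dictionary layout are illustrative, not part of the compressbench package:

```python
# Exponent weights copied from the formula above; they sum to 1.0.
WEIGHTS = {"TRS": 0.50, "CIR": 0.20, "SPS": 0.10, "SFS": 0.10, "CES": 0.10}


def cbv2(metrics):
    """Weighted geometric mean of the five metrics, scaled to 0-100.

    Assumption: every metric is already normalised to [0, 1].
    """
    score = 100.0
    for name, weight in WEIGHTS.items():
        value = metrics[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        score *= value ** weight
    return score


def final_score(metrics_by_rate):
    """Plain mean of the per-rate CBv2 scores (keys 0.25, 0.50, 0.70)."""
    scores = [cbv2(m) for m in metrics_by_rate.values()]
    return sum(scores) / len(scores)
```

Because TRS carries an exponent of 0.50, a method with TRS = 0.81 and perfect scores elsewhere lands at 100 × 0.9 = 90.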

Metrics

TRS — Task Retention Score (50%)

Can a downstream LLM still solve the original task after compression? Binary pass/fail per case, averaged.
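
Since each case is a binary pass/fail, TRS reduces to a pass rate. A minimal sketch, assuming the per-case outcomes are already available as booleans (how the downstream LLM is judged is not covered here, and the function name is illustrative):

```python
def task_retention_score(passed):
    """Fraction of cases the downstream LLM still solved after compression."""
    if not passed:
        raise ValueError("need at least one case")
    return sum(1 for p in passed if p) / len(passed)
```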

CIR — Critical Information Recall (20%)

Are all critical information units (functions, numbers, flags, URLs) preserved in the compressed text?
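
One way to picture CIR is as recall over a list of annotated units. The sketch below assumes the units are given as strings and that "preserved" means an exact substring match; the benchmark's actual matching rule may be more lenient, and the function name is illustrative:

```python
def critical_information_recall(units, compressed_text):
    """Share of critical units that appear verbatim in the compressed text.

    Assumption: exact substring matching; no normalisation or fuzzy match.
    """
    if not units:
        return 1.0  # nothing critical to lose
    kept = sum(1 for unit in units if unit in compressed_text)
    return kept / len(units)
```

For example, if the units are "parse_config()", "--verbose" and "8080" and only two of them survive compression, the recall is 2/3.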

SPS — Semantic Preservation Score (10%)

Semantic similarity between original and compressed text measured by embedding cosine similarity.
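
Cosine similarity itself is simple to compute once both texts have been embedded. A minimal sketch in plain Python, assuming the two embedding vectors are already available as equal-length lists of floats (the embedding model used by the benchmark is not specified here):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch: zero vector has no direction
    return dot / (norm_a * norm_b)
```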

SFS — Structural Fidelity Score (10%)

Are structural elements (code blocks, JSON, YAML, tables, headers, lists) preserved after compression?
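
As an illustration of what a structural check can look like, the sketch below tests only one of the listed element types: whether a JSON payload still parses after compression. It is an assumption about the kind of check involved, not the benchmark's implementation, and other structures (YAML, tables, headers) would need their own validators:

```python
import json


def json_structure_preserved(compressed_payload: str) -> bool:
    """Return True if the compressed JSON payload is still parseable."""
    try:
        json.loads(compressed_payload)
    except json.JSONDecodeError:
        return False
    return True
```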

CES — Compression Efficiency Score (10%)

Did the method actually compress to the requested rate? Penalizes over- and under-compression.
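
The exact CES formula is not given here. One possible shape, shown purely as an assumption, is a symmetric penalty on the relative deviation from the requested rate, clipped at 0. Both rates must be measured the same way; the function name and the penalty form are hypothetical:

```python
def compression_efficiency(target_rate, achieved_rate):
    """Hypothetical CES: 1.0 when on target, falling linearly with relative error.

    Assumption: over- and under-compression are penalised equally.
    """
    if target_rate <= 0:
        raise ValueError("target_rate must be positive")
    relative_error = abs(achieved_rate - target_rate) / target_rate
    return max(0.0, 1.0 - relative_error)
```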

Hard Constraints

TRS < 0.85 → 50% score penalty
CIR < 0.80 → flagged as information-lossy (no penalty)
Structure required but invalid → case score = 0
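
A sketch of how these three rules could be applied. It assumes the structure rule acts on a single case, while the TRS and CIR thresholds act on averaged values for a run, and that the "50% penalty" means multiplying the score by 0.5. The function names and flag strings are illustrative:

```python
def case_score(raw_score, structure_required, structure_valid):
    """Per-case rule: required structure that is invalid zeroes the case."""
    if structure_required and not structure_valid:
        return 0.0
    return raw_score


def apply_aggregate_constraints(score, trs, cir):
    """Run-level rules: halve the score on low TRS, flag (only) on low CIR."""
    flags = []
    if trs < 0.85:
        score *= 0.5  # assumption: the 50% penalty is multiplicative
        flags.append("trs_penalty")
    if cir < 0.80:
        flags.append("information_lossy")  # flag only, score unchanged
    return score, flags
```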

Submit Your Method

pip install compressbench

compressbench register

compressbench run --method http --endpoint YOUR_URL

compressbench submit results/result_http_*.json