About CompressBench

An open benchmark for evaluating prompt compression methods.

What is CBv2?

CBv2 (CompressBench v2) is a composite quality score that measures how well a compression method preserves the usefulness of prompts. It evaluates 100 diverse cases across 5 categories at 3 compression rates (0.25, 0.50, 0.70), producing 300 evaluations per method.

Scoring Formula

CBv2 = 100 × TRS^0.50 × CIR^0.20 × SPS^0.10 × SFS^0.10 × CES^0.10

Final score = mean(CBv2@0.25, CBv2@0.50, CBv2@0.70)
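The formula can be read as a weighted geometric mean, with the number after each metric name acting as an exponent (the exponents sum to 1.0 and match the percentages in the Metrics section). The following is a minimal sketch under that reading, assuming each metric is a value in [0, 1]. The function names and the dictionary layout are illustrative and are not taken from the compressbench package.

```python
from statistics import mean

# Exponents from the scoring formula; they sum to 1.0.
WEIGHTS = {"TRS": 0.50, "CIR": 0.20, "SPS": 0.10, "SFS": 0.10, "CES": 0.10}


def cbv2_at_rate(metrics: dict[str, float]) -> float:
    """Weighted geometric mean of the five metrics, scaled to 0-100.

    Assumes every metric lies in [0, 1]. Hypothetical helper, not the
    official implementation.
    """
    score = 100.0
    for name, weight in WEIGHTS.items():
        score *= metrics[name] ** weight
    return score


def final_score(per_rate_metrics: dict[float, dict[str, float]]) -> float:
    """Mean of CBv2 across the compression rates (0.25, 0.50, 0.70)."""
    return mean(cbv2_at_rate(m) for m in per_rate_metrics.values())
```

Because the metrics are multiplied rather than added, a method that scores zero on any single metric gets a CBv2 of zero at that rate, and a weak TRS pulls the score down more than an equally weak SPS, SFS, or CES.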

Metrics

TRS — Task Retention Score (50%)

Can a downstream LLM still solve the original task after compression? Each case is scored as a binary pass or fail, and TRS is the average over all cases.

CIR — Critical Information Recall (20%)

Are all critical information units (functions, numbers, flags, URLs) preserved in the compressed text?
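As a rough illustration, recall of this kind can be computed as the fraction of critical units that still appear in the compressed text. The sketch below assumes the units are given as a list of strings and that a unit counts as preserved when it appears verbatim. Both the matching rule and the function name are assumptions, since the benchmark's actual matching logic is not described here.

```python
def critical_information_recall(critical_units: list[str], compressed: str) -> float:
    """Fraction of critical units found verbatim in the compressed text.

    Hypothetical helper: exact substring matching is an assumption.
    """
    if not critical_units:
        # Assumption: with nothing to lose, recall is treated as perfect.
        return 1.0
    kept = sum(1 for unit in critical_units if unit in compressed)
    return kept / len(critical_units)


units = ["parse_config", "--verbose", "8080", "https://example.com"]
text = "Call parse_config with --verbose on port 8080."
print(critical_information_recall(units, text))  # 0.75 (the URL was dropped)
```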

SPS — Semantic Preservation Score (10%)

Semantic similarity between original and compressed text measured by embedding cosine similarity.
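For reference, cosine similarity between two embedding vectors is their dot product divided by the product of their lengths. The sketch below assumes the embeddings are already available as equal-length lists of floats. Which embedding model the benchmark uses is not stated here, so the vectors in any real use would come from a model of your choosing.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector carries no direction, so report 0.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors pointing in the same direction score 1.0 and orthogonal vectors score 0.0, so a compressed prompt whose embedding stays close to the original's scores near 1.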

SFS — Structural Fidelity Score (10%)

Are structural elements (code blocks, JSON, YAML, tables, headers, lists) preserved after compression?
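One way to picture a structural check, for JSON only, is to test whether the compressed payload still parses and keeps the same top-level type as the original. This is a sketch under that narrow assumption. The function name is hypothetical, and the benchmark's real checks for code blocks, YAML, tables, headers, and lists are not described here.

```python
import json


def json_structure_preserved(original: str, compressed: str) -> bool:
    """True if the compressed text is still valid JSON of the same top-level type.

    Hypothetical helper covering JSON only.
    """
    try:
        before = json.loads(original)
    except json.JSONDecodeError:
        # The original was not valid JSON, so there is no structure to preserve.
        return True
    try:
        after = json.loads(compressed)
    except json.JSONDecodeError:
        return False
    return type(before) is type(after)
```

A compressor that drops keys but keeps the braces balanced passes this check, while one that truncates the payload mid-object fails it.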

CES — Compression Efficiency Score (10%)

Did the method actually compress to the requested rate? Penalizes over- and under-compression.

Categories

The 100 benchmark cases span 5 categories with 20 cases each: Code Context, Chat History, Structured Data, Documentation, and Mixed. Each category tests different aspects of compression quality.

Hard Constraints

  • TRS < 0.85 → 50% score penalty
  • CIR < 0.80 → flagged as information-lossy (no penalty)
  • Structure required but invalid → case score = 0
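The three constraints above could be applied to a score roughly as follows. This is a sketch only: the text does not say whether the TRS and CIR thresholds are checked per case or on aggregated values, so the function takes whichever values the caller supplies. The function, the dataclass, and the parameter names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ConstrainedScore:
    score: float
    information_lossy: bool  # flag only, carries no penalty


def apply_hard_constraints(
    score: float,
    trs: float,
    cir: float,
    structure_required: bool = False,
    structure_valid: bool = True,
) -> ConstrainedScore:
    """Apply the hard constraints to a score (illustrative ordering)."""
    lossy = cir < 0.80  # CIR below 0.80 only sets a flag
    if structure_required and not structure_valid:
        return ConstrainedScore(0.0, lossy)  # invalid required structure zeroes the score
    if trs < 0.85:
        score *= 0.5  # TRS below 0.85 halves the score
    return ConstrainedScore(score, lossy)
```

For example, a score of 80 with TRS 0.80 and CIR 0.90 would come out as 40 with no flag, while the same score with TRS 0.90 and CIR 0.70 would stay at 80 but be flagged as information-lossy.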

Submit Your Method

pip install compressbench

compressbench register

compressbench run --method http --endpoint YOUR_URL

compressbench submit results/result_http_*.json