CRS Summary Benchmark: Frontier LLMs vs. real Congressional Research Service summaries

Every bill in the benchmark

Filter by summarizer, by a single criterion's pass/fail, by outcome, or search by title or bill number. Click a bill to read all summaries with the judge's reasoning.

Each dot is one summarizer's verdict (hover for its name): either "met all applicable criteria" or "missed one or more".