March 1, 2026
ParEVO: Agentic Evolutionary Synthesis of Parallel Algorithms
We release ParEVO, a new framework capable of synthesizing performant parallel algorithms for
irregular data structures. By decoupling standard code generation from strict unit tests, our
Evolutionary Coding Agent achieves an average 106x speedup on ParEval through performance tests.
February 23, 2026
QEDBench: Auditing LLMs as Mathematical Judges
We introduce QEDBench, a 272-problem benchmark that decouples mathematical proof
generation from verification to reveal systemic limitations in frontier LLM
reasoning. Our evaluation of 5 solvers and 7 LLM judges against 1,000+ hours of
expert grading uncovers a dangerous Sycophancy Trap and the Discrete-Continuous
Reasoning Gap.