jcode bench
Improve production-grade primitives from a working baseline. Exhaustively verified, scored in doublings, time recorded but never capped.
How a run works
The agent receives a working, tested implementation of a real primitive, its exhaustive verifier, and a published deterministic cost model. It edits, grades, and climbs; each grade takes seconds. The harness records the best score continuously, producing a score-over-time curve. Correctness on every possible input is a gate; speed under the cost model is the score, measured in doublings over the given implementation.
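The scoring rule reads naturally as a base-2 logarithm of a cost ratio. Below is a minimal Python sketch, assuming the score is log2(baseline cost / candidate cost) and that a failed correctness gate yields no score at all; doublings and best_so_far are hypothetical names, not the harness's API.

```python
import math
from typing import Iterable, List, Optional, Tuple


def doublings(baseline_cost: float, candidate_cost: float, correct: bool) -> Optional[float]:
    """Score a candidate in doublings over the given implementation.

    Hypothetical formula: log2(baseline_cost / candidate_cost). Halving the cost
    scores 1.0, matching the baseline's cost scores 0.0, and a slower candidate
    scores below zero. Correctness is a gate: an incorrect candidate gets None.
    """
    if not correct:
        return None
    if baseline_cost <= 0 or candidate_cost <= 0:
        raise ValueError("costs must be positive")
    return math.log2(baseline_cost / candidate_cost)


def best_so_far(grades: Iterable[Tuple[float, Optional[float]]]) -> List[Tuple[float, float]]:
    """Turn (elapsed_seconds, score) grades into a best-score-over-time curve.

    Only points where the best score improves are kept; gated (None) grades
    never move the curve.
    """
    curve: List[Tuple[float, float]] = []
    best: Optional[float] = None
    for elapsed, score in grades:
        if score is not None and (best is None or score > best):
            best = score
            curve.append((elapsed, best))
    return curve
```

For example, cutting the cost from 1000 to 250 units gives doublings(1000.0, 250.0, True) == 2.0, and the curve only steps up when a grade beats every earlier one.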
Tasks
Chosen for headroom: primitives where the best known implementations are still far from any plausible limit, so the climb stays open past the frontier.
float-print — shortest round-trip float to decimal
Print a float32 as the shortest decimal string that parses back to the same bits. An active research problem (Grisu, Ryū, Dragonbox) where new algorithms are still being found. Verified by round-trip over all 2³² floats.
IN DEVELOPMENT
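The round-trip property fits in a few lines once parsing is exact. Below is a minimal Python sketch of a checker, assuming the grader parses output with correct rounding (ties to even) and accepts any NaN output for a NaN input; parse_f32_bits, round_trips, and first_failure are hypothetical names, not the published grader. It uses exact rational arithmetic rather than float64, trading speed for exactness, so a pure-Python pass over range(1 << 32) would be very slow.

```python
import struct
from fractions import Fraction
from typing import Callable, Iterable, Optional

F32_INF = 0x7F800000


def parse_f32_bits(s: str) -> int:
    """Parse a decimal string to float32 bits, correctly rounded (ties to even).

    Exact rationals avoid the double rounding that decimal -> float64 -> float32
    can introduce.
    """
    t = s.strip().lower()
    sign = 0
    if t[:1] in ("+", "-"):
        sign = 0x80000000 if t[0] == "-" else 0
        t = t[1:]
    if t in ("inf", "infinity"):
        return sign | F32_INF
    if t == "nan":
        return sign | 0x7FC00000          # canonical quiet NaN
    if "/" in t or "_" in t:             # Fraction would accept these; a decimal literal must not
        raise ValueError("not a plain decimal literal")
    x = Fraction(t)                      # exact value; raises ValueError on junk
    if x < 0:
        raise ValueError("repeated sign")
    if x == 0:
        return sign
    # e = floor(log2(x)), i.e. 2**e <= x < 2**(e + 1)
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if x < Fraction(2) ** e:
        e -= 1
    q = max(e - 23, -149)                # exponent of one ulp; -149 is the subnormal ulp
    m = round(x / Fraction(2) ** q)      # Fraction rounding is half-to-even
    if m == 1 << 24:                     # rounding carried into the next binade
        m, q = m >> 1, q + 1
    if q > 104:                          # above the largest finite float32
        return sign | F32_INF
    if m < 1 << 23:                      # subnormal, or zero after rounding
        return sign | m
    return sign | ((q + 150) << 23) | (m - (1 << 23))


def is_nan_bits(bits: int) -> bool:
    return (bits & F32_INF) == F32_INF and (bits & 0x007FFFFF) != 0


def bits_to_f32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as float32; widening to a Python float is exact."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_trips(bits: int, s: str) -> bool:
    try:
        got = parse_f32_bits(s)
    except ValueError:
        return False
    if is_nan_bits(bits):
        return is_nan_bits(got)          # assumption: any NaN output is accepted
    return got == bits


def first_failure(printer: Callable[[float], str], patterns: Iterable[int]) -> Optional[int]:
    """Return the first bit pattern whose printed form does not round-trip, else None."""
    for bits in patterns:
        if not round_trips(bits, printer(bits_to_f32(bits))):
            return bits
    return None
```

Nine significant digits always suffice for float32, so a printer like lambda v: format(v, ".9g") passes while ".6g" fails on 0x3F800001. This sketch checks only the round-trip half of the contract, not shortness.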
json-unescape — decode JSON string escapes
The hot path of every JSON parser. Escape density varies wildly across real inputs, leaving a large open design space beyond current SIMD implementations. Exhaustively verified over bounded-length sequences.
IN DEVELOPMENT
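A scalar reference makes the contract concrete. The sketch below assumes the input is the body of a JSON string (the text between the quotes) and that malformed escapes are errors; unescape, oracle, and first_mismatch are hypothetical names. It enumerates every string up to a bounded length over a small, hand-picked alphabet (an assumption made to keep pure Python fast) and compares the result against CPython's json module, which keeps lone surrogates such as \ud800 rather than rejecting them.

```python
import itertools
import json
from typing import Optional

# Single-character escapes allowed by RFC 8259 and their decoded values.
_SIMPLE = {'"': '"', "\\": "\\", "/": "/", "b": "\b", "f": "\f",
           "n": "\n", "r": "\r", "t": "\t"}
_HEX = frozenset("0123456789abcdefABCDEF")


def _hex4(s: str, i: int) -> int:
    """Parse exactly four hex digits at s[i:i + 4], or raise ValueError."""
    h = s[i:i + 4]
    if len(h) != 4 or not all(c in _HEX for c in h):
        raise ValueError(f"bad \\u escape at {i - 2}")
    return int(h, 16)


def unescape(body: str) -> str:
    """Decode a JSON string body (the text between the quotes).

    Raw '"' and raw control characters are rejected, a \\uD8xx\\uDCxx pair becomes
    one code point, and a lone surrogate is kept as-is, matching CPython's json.
    """
    out, i, n = [], 0, len(body)
    while i < n:
        c = body[i]
        if c == '"' or ord(c) < 0x20:
            raise ValueError(f"unescaped {c!r} at {i}")
        if c != "\\":
            out.append(c)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash")
        e = body[i + 1]
        if e in _SIMPLE:
            out.append(_SIMPLE[e])
            i += 2
            continue
        if e != "u":
            raise ValueError(f"bad escape \\{e} at {i}")
        cp = _hex4(body, i + 2)
        i += 6
        # A high surrogate followed by an escaped low surrogate forms one code point.
        if 0xD800 <= cp <= 0xDBFF and body[i:i + 2] == "\\u":
            low = _hex4(body, i + 2)
            if 0xDC00 <= low <= 0xDFFF:
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00)
                i += 6
        out.append(chr(cp))
    return "".join(out)


def oracle(body: str) -> Optional[str]:
    """CPython's json module as the reference; None means 'rejected'."""
    try:
        return json.loads('"' + body + '"')
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None


def first_mismatch(alphabet: str, max_len: int) -> Optional[str]:
    """Return the first string over alphabet (up to max_len) where unescape and oracle disagree."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            body = "".join(chars)
            try:
                mine: Optional[str] = unescape(body)
            except ValueError:
                mine = None
            if mine != oracle(body):
                return body
    return None
```

For instance, first_mismatch('\\u0a"n/', 6) walks every body of up to six characters over backslash, u, two hex digits, a raw quote, n, and slash; a real verifier would need a richer alphabet to reach surrogate pairs.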
utf16-transcode — UTF-16 to UTF-8 and back
The boundary between JavaScript, Windows, and the rest of the world. Far less polished than UTF-8 validation, and its mixed-width branching leaves genuine room. Exhaustively verified over all code point sequences up to a bounded length.
IN DEVELOPMENT
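The UTF-16 to UTF-8 direction can be sketched the same way. This assumes unpaired surrogates are errors, which the task description does not specify, and compares a hand-written transcoder against Python's strict utf-16-le and utf-8 codecs. To stay fast in pure Python it enumerates sequences over a small set of boundary code units rather than all of them; utf16_to_utf8, BOUNDARY_UNITS, oracle, and first_transcode_mismatch are hypothetical names, and the reverse direction would be checked the same way.

```python
import itertools
from typing import Optional, Sequence, Tuple

# Hypothetical boundary set: edges of each UTF-8 length class and of the surrogate ranges.
BOUNDARY_UNITS = (0x0000, 0x007F, 0x0080, 0x07FF, 0x0800, 0xD7FF,
                  0xD800, 0xDBFF, 0xDC00, 0xDFFF, 0xE000, 0xFFFF)


def utf16_to_utf8(units: Sequence[int]) -> bytes:
    """Transcode UTF-16 code units to UTF-8; unpaired surrogates raise ValueError."""
    out = bytearray()
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        i += 1
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: a low surrogate must follow
            if i >= n or not 0xDC00 <= units[i] <= 0xDFFF:
                raise ValueError(f"unpaired high surrogate at {i - 1}")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError(f"unpaired low surrogate at {i - 1}")
        else:
            cp = u
        # Emit 1 to 4 bytes depending on the code point's magnitude.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)))
    return bytes(out)


def oracle(units: Sequence[int]) -> Optional[bytes]:
    """Python's strict codecs as the reference; None means 'rejected'."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    try:
        return raw.decode("utf-16-le").encode("utf-8")
    except UnicodeDecodeError:
        return None


def first_transcode_mismatch(alphabet: Sequence[int], max_len: int) -> Optional[Tuple[int, ...]]:
    """Compare utf16_to_utf8 with the oracle on every sequence over alphabet up to max_len."""
    for length in range(max_len + 1):
        for units in itertools.product(alphabet, repeat=length):
            try:
                mine: Optional[bytes] = utf16_to_utf8(units)
            except ValueError:
                mine = None
            if mine != oracle(units):
                return units
    return None
```

With twelve boundary units, first_transcode_mismatch(BOUNDARY_UNITS, 4) covers every mix of ASCII, two- and three-byte units, valid pairs, and each kind of unpaired surrogate up to four units long.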
Leaderboard
Coming with the first task release. Every entry will link its full score-over-time curve, submission history, and the exact grader version that scored it, reproducible by anyone.
Task specs, graders, and the harness will be published in full when the first task ships. There is no hidden test set. Read about the benchmark class at /bench.