jcode bench

Improve the production-grade primitives you are given. Exhaustively verified, scored in doublings; time is recorded but never capped.

How a run works

The agent receives a working, tested implementation of a real primitive, its exhaustive verifier, and a published deterministic cost model. It edits, grades, and climbs. Each grade takes seconds. The harness continuously records the best score so far, producing a score-over-time curve. Correctness on every possible input is a gate; speed under the cost model is the score, measured in doublings over the given implementation, so one doubling is a 2x speedup.
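The scoring described above can be sketched in a few lines. This is only an illustration under stated assumptions: the names `doublings` and `score_curve` and the tuple layout of a grade are hypothetical, and the score is assumed to be the base-2 logarithm of the baseline cost divided by the candidate cost. The published cost model and harness may differ.

```python
import math


def doublings(baseline_cost: float, candidate_cost: float) -> float:
    """Score a candidate as log2 of its speedup over the given implementation.

    Assumption: "doublings" means log2(baseline_cost / candidate_cost).
    """
    if baseline_cost <= 0 or candidate_cost <= 0:
        raise ValueError("costs must be positive")
    return math.log2(baseline_cost / candidate_cost)


def score_curve(baseline_cost, grades):
    """Build a best-so-far curve from a sequence of grades.

    Each grade is a hypothetical (elapsed_seconds, cost, correct) tuple.
    A grade that fails the verifier never improves the score: correctness
    is a gate, not a component of the score.
    """
    best = 0.0  # the given implementation is zero doublings over itself
    curve = []
    for elapsed, cost, correct in grades:
        if correct:
            best = max(best, doublings(baseline_cost, cost))
        curve.append((elapsed, best))
    return curve
```

For example, with a baseline cost of 1000, a correct candidate at cost 250 scores two doublings, and an incorrect candidate at cost 100 leaves the curve unchanged.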

Tasks

Chosen for headroom: primitives where the best known implementations are still far from any plausible limit, so the climb stays open past the frontier.

float-print — shortest round-trip float to decimal

Print a float32 as the shortest decimal string that parses back to the same bits. An active research problem (Grisu, Ryū, Dragonbox) where new algorithms are still being found. Verified by round-trip over all 2³² floats.

IN DEVELOPMENT
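The round-trip check for float-print can be sketched as follows. This is not the benchmark's verifier, which is unpublished; the helper names are hypothetical. It checks only that a printed string parses back to the same float32 bits, not that the string is shortest. Parsing goes through a Python double before rounding to float32, which is assumed safe for the nine-digit stand-in printer used here but could in principle double-round for other strings. NaN is treated as round-tripping if the parsed value is any NaN, since a decimal string cannot carry a payload.

```python
import math
import struct


def bits_to_float(bits: int) -> float:
    """Reinterpret a 32-bit pattern as a float32 (widened to a Python float)."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def float_to_bits(value: float) -> int:
    """Round a Python float to float32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def round_trips(bits: int, text: str) -> bool:
    """True if `text` parses back to exactly the float32 with pattern `bits`."""
    try:
        parsed = float(text)
        if math.isnan(parsed):
            return math.isnan(bits_to_float(bits))
        return float_to_bits(parsed) == bits
    except (ValueError, OverflowError):
        # Unparseable text, or a finite value too large for float32.
        return False


def first_failure(printer, step: int = 1):
    """Sweep bit patterns 0, step, 2*step, ... and return the first that fails.

    step=1 visits all 2**32 patterns, which is slow in pure Python.
    """
    for bits in range(0, 1 << 32, step):
        if not round_trips(bits, printer(bits_to_float(bits))):
            return bits
    return None


def nine_digits(value: float) -> str:
    """Stand-in printer: nine significant digits, correct but not shortest."""
    return "%.9g" % value
```

Calling `first_failure(nine_digits, step=1_000_003)` samples a few thousand patterns; a real shortest-output printer would be passed in place of `nine_digits`.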

json-unescape — decode JSON string escapes

The hot path of every JSON parser. Escape density varies wildly across real inputs, leaving a large open design space beyond current SIMD implementations. Exhaustively verified over bounded-length sequences.

IN DEVELOPMENT
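For orientation, a plain scalar decoder for json-unescape might look like the sketch below, together with a tiny bounded-length exhaustive check. Everything here is an assumption: the task spec is not yet published, so the function names, the choice to reject unpaired surrogates, and the use of Python's `json.loads` as the oracle are all illustrative. The oracle accepts unpaired surrogates, so the comparison is only meaningful for inputs too short to contain a full `\uXXXX` escape.

```python
import itertools
import json

SIMPLE = {'"': '"', '\\': '\\', '/': '/',
          'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}
HEX = set("0123456789abcdefABCDEF")


def _hex4(body: str, start: int) -> int:
    """Parse exactly four hex digits starting at `start`."""
    digits = body[start:start + 4]
    if len(digits) != 4 or any(c not in HEX for c in digits):
        raise ValueError("bad \\u escape")
    return int(digits, 16)


def unescape(body: str) -> str:
    """Decode the contents of a JSON string (without the surrounding quotes)."""
    out = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch != '\\':
            if ch == '"' or ch < ' ':
                raise ValueError("unescaped quote or control character")
            out.append(ch)
            i += 1
            continue
        if i + 1 >= n:
            raise ValueError("dangling backslash")
        esc = body[i + 1]
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
            i += 2
            continue
        if esc != 'u':
            raise ValueError("unknown escape")
        unit = _hex4(body, i + 2)
        i += 6
        if 0xD800 <= unit < 0xDC00:
            # A high surrogate must be followed by an escaped low surrogate.
            if body[i:i + 2] != '\\u':
                raise ValueError("unpaired high surrogate")
            low = _hex4(body, i + 2)
            if not 0xDC00 <= low < 0xE000:
                raise ValueError("unpaired high surrogate")
            unit = 0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00)
            i += 6
        elif 0xDC00 <= unit < 0xE000:
            raise ValueError("unpaired low surrogate")
        out.append(chr(unit))
    return "".join(out)


def first_disagreement(alphabet: str, max_len: int):
    """Compare against json.loads on every string over `alphabet` up to max_len."""
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            body = "".join(chars)
            try:
                expected = json.loads('"' + body + '"')
            except ValueError:
                expected = None  # the oracle rejected the input
            try:
                actual = unescape(body)
            except ValueError:
                actual = None  # the sketch rejected the input
            if expected != actual:
                return body
    return None
```

A small alphabet such as `a`, backslash, quote, `n`, `u`, `0` at lengths up to four already covers every way a short escape can start, end, or be cut off, which is the point of verifying over bounded-length sequences.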

utf16-transcode — UTF-16 to UTF-8 and back

The boundary between JavaScript, Windows, and the rest of the world. Existing transcoders are far less polished than UTF-8 validators, and mixed-width branching leaves genuine room for improvement. Exhaustively verified over all code point sequences up to a bounded length.

IN DEVELOPMENT
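The mixed-width branching mentioned for utf16-transcode is visible even in a naive scalar version. The sketch below covers only the UTF-16 to UTF-8 direction and makes its own assumptions: input is a sequence of 16-bit code units, unpaired surrogates are rejected, and Python's built-in codecs serve as the oracle. The function names are hypothetical and the real task spec may choose differently.

```python
import struct


def utf16_to_utf8(units) -> bytes:
    """Transcode a sequence of UTF-16 code units (ints 0..0xFFFF) to UTF-8."""
    out = bytearray()
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        i += 1
        if 0xD800 <= u < 0xDC00:
            # High surrogate: combine with the following low surrogate.
            if i == n or not 0xDC00 <= units[i] < 0xE000:
                raise ValueError("unpaired high surrogate")
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i] - 0xDC00)
            i += 1
        elif 0xDC00 <= u < 0xE000:
            raise ValueError("unpaired low surrogate")
        else:
            cp = u
        # One branch per UTF-8 output width.
        if cp < 0x80:
            out.append(cp)
        elif cp < 0x800:
            out += bytes((0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)))
        elif cp < 0x10000:
            out += bytes((0xE0 | (cp >> 12),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
        else:
            out += bytes((0xF0 | (cp >> 18),
                          0x80 | ((cp >> 12) & 0x3F),
                          0x80 | ((cp >> 6) & 0x3F),
                          0x80 | (cp & 0x3F)))
    return bytes(out)


def units_of(text: str):
    """UTF-16 code units of a Python string, via the built-in codec."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def first_mismatch(step: int = 1):
    """Check single code points against Python's UTF-8 encoder.

    step=1 visits every non-surrogate code point; larger steps sample.
    """
    for cp in range(0, 0x110000, step):
        if 0xD800 <= cp < 0xE000:
            continue  # surrogates are not encodable code points
        text = chr(cp)
        if utf16_to_utf8(units_of(text)) != text.encode("utf-8"):
            return cp
    return None
```

Extending `first_mismatch` from single code points to all sequences of two or three is what a bounded-length exhaustive verifier would amount to, at a cost that grows quickly with length.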

Leaderboard

Coming with the first task release. Every entry will link its full score-over-time curve, submission history, and the exact grader version that scored it, reproducible by anyone.

Task specs, graders, and the harness will be published in full when the first task ships. There is no hidden test set. Read about the benchmark class at /bench.