Step 1 · Pick expert answer key
▸ The ground truth your agent is graded against
💡 How scoring works: answer correctness is 45% of the score, signal coverage 35%, and reasoning 20%. The pass mark is 65. A right answer alone contributes at most 45 points, so it won't pass: the agent must also cover the reasoning signals an expert would. That's what catches "right answer, wrong reasoning."
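The weighting in the tip above can be sketched as a small calculation. This is only an illustration: the weights and pass mark come from the tip, while the function names, the 0-to-1 scale for each component, the 0-to-100 total, and the choice to treat exactly 65 as a pass are assumptions, not the grader's actual implementation.

```python
# Weights and pass mark are taken from the scoring tip.
# Everything else (names, 0-1 component scale, >= comparison) is assumed.
WEIGHTS = {"answer": 0.45, "signals": 0.35, "reasoning": 0.20}
PASS_MARK = 65.0


def total_score(answer: float, signals: float, reasoning: float) -> float:
    """Combine component scores (each assumed to lie in [0, 1]) into a 0-100 total."""
    parts = {"answer": answer, "signals": signals, "reasoning": reasoning}
    for name, value in parts.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return 100.0 * sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)


def passes(answer: float, signals: float, reasoning: float) -> bool:
    """Assumed rule: a total at or above the pass mark counts as a pass."""
    return total_score(answer, signals, reasoning) >= PASS_MARK


if __name__ == "__main__":
    # Correct answer, but no signal coverage and no reasoning credit.
    print(total_score(1.0, 0.0, 0.0), passes(1.0, 0.0, 0.0))
    # Correct answer with full signal coverage, still no reasoning credit.
    print(total_score(1.0, 1.0, 0.0), passes(1.0, 1.0, 0.0))
```

Under these assumptions, a fully correct answer with nothing else totals 45, which is below the pass mark, while a correct answer plus full signal coverage totals 80 and passes.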