xVerify: Efficient Answer Verifier for Large Language Model Evaluations
benchmark regex reliability evaluation llm reliability-tools chatgpt cc-by-nc-nd-4 open-compass llm-as-a-judge deepseek-math judge-model reasoning-models open-r1 xverify math-verify
Updated Mar 28, 2025 - Python