JobMesh

AI Evaluation & Reliability Engineer (Agents & LLM Systems)

abra · Center, Texas, US

Job description

abra R&D is looking for an AI Evaluation & Reliability Engineer (Agents & LLM Systems) to take part in building the next-generation agentic analytics platform, the first real-time database optimized for AI agents at scale.

We’re looking for a Senior AI Evaluation & Reliability Engineer to define and build how AI agents are measured, validated, monitored, and improved in production. This role sits at the intersection of LLM systems, evaluation research, and production-grade engineering. You will design evaluation methodologies, build LLM-as-a-judge systems, and develop agent-based testing frameworks to ensure the correctness, robustness, and reliability of complex multi-agent workflows operating on real-time data.

What You’ll Do:

- Design and implement evaluation frameworks for AI agents and multi-agent systems
- Build LLM-as-a-judge pipelines to assess correctness, reasoning quality, and output quality
- Develop agent-based evaluation systems (agents evaluating agents) for scalable testing
- Define metrics, benchmarks, scorecards, and methodologies for agent reliability and performance
- Build data-driven...