Freelance AI Evaluation Engineer (Python/Full-Stack)
Mindrift · FR
Job description
Please submit your CV in English and indicate your level of English proficiency.

Mindrift connects specialists with project-based AI opportunities for leading tech companies, focused on testing, evaluating, and improving AI systems. Participation is project-based, not permanent employment.

What this opportunity involves:
You’ll create challenging coding test cases that push AI coding systems to their limits:
- Review and refine realistic coding tasks based on provided production codebases, with realistic scope, requirements, and information sources
- Write comprehensive functional tests that validate actual end-to-end behavior and edge cases, not just superficial checks
- Craft “fair but hard” challenges where the AI has all the context it needs but has to work for it (information scattered across files and external sources, complex reasoning required)
- Analyze AI failures to understand what the model struggles with vs. what it masters
- Iterate based on feedback from expert QA reviewers who score your work on 7 quality criteria

What we look for:
This opportunity is a good fit for experienced developers, software engineers, and/or test automation specialists open to part-time, non-p...