JobMesh

Agentic Code-Generation Loop Research Intern

Windmill · Paris, Île-de-France, FR


Job description

Skills: Git, TypeScript, Deep Learning, Natural Language Processing, Prompt Engineering

The intern will design, evaluate and bring up to the state of the art Windmill's internal agentic loop for generating scripts, flows and full-stack apps, and will build the benchmarking system that measures its progress.

The work tackles several open questions: how to objectively evaluate a generated workflow or app beyond "it compiles" (functional tests, end-to-end execution, UX quality, semantic correctness); how an agent should decompose a natural-language specification into coherent atomic steps; how to efficiently inject Windmill-specific context (hub, types, resource schemas) without saturating the context window; how to exploit execution feedback for self-correction; how to keep a dependency graph of scripts, flows and apps coherent across iterative multi-file edits; and how to detect hallucinations, silent regressions and "fake successes" where tests pass for the wrong reasons.

Expected deliverables: the Windmill benchmark (corpus, harness, tracking dashboard); an improved agentic loop shipped to production with documented progression metrics; a weekly lab notebook; the final thesis report; a...