Lead Infrastructure and Reliability Engineer (Systems & Scale)

Luma AI · Palo Alto, California, US

Job description

About Luma AI

A new class of intelligence is emerging, systems that understand and generate the world across video, images, audio, and language. Building multimodal AGI is not just a modeling challenge. It is an infrastructure challenge at the edge of what hardware, software, and organizations can support.

At Luma, we operate rapidly scaling 10k+ GPU fleets, pushing utilization, throughput, and reliability hard enough that yesterday’s solutions break regularly. Researchers depend on this infrastructure to move the frontier forward. Customers depend on it to power real creative work.

Many companies run accelerators. Very few sit directly next to the teams inventing the models that redefine what those accelerators must do. At Luma, improvements to scheduling, efficiency, and reliability immediately translate into faster research iteration and entirely new product capabilities.

We are still early. The playbook is still being written. A single exceptional engineer can reshape how the company operates.

Where You Come In

Our Infrastructure Engineering team is a systems engineering group with company-level responsibility. At Luma, reliability engineers work directly with the researchers...