Machine Learning Engineer - Multi-Modality Foundation Model

Zoox · Foster City, California, US

Job description

The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Multi-modality Foundation Model Engineer, you will focus on building highly efficient, production-ready multi-modality models. We are looking for experts who have hands-on experience building multi-modality foundation models, whether that involves AV-centric modalities (Vision, LiDAR, Radar) or broader domains (Vision, Language, Text, Audio). You will design, train, and deploy these models using Knowledge Distillation (KD) to transfer capabilities from large-scale proprietary teacher models to efficient student models capable of real-time, on-vehicle inference.

In this role, you will:

- Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, successfully aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).
- Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.
- Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models....