Machine Learning Engineer - Multi-Modality Foundation Model
Zoox · Foster City, California, US
Job description
The Perception team is pioneering the development of a multi-modality foundation model to drive the next generation of autonomous system intelligence. As a Multi-modality Foundation Model Engineer, you will focus on building highly efficient, production-ready multi-modality models. We are looking for experts who have hands-on experience building multi-modality foundation models, whether that involves AV-centric modalities (Vision, LiDAR, Radar) or broader domains (Vision, Language, Text, Audio). You will design, train, and deploy these models using Knowledge Distillation (KD) to transfer capabilities from large-scale proprietary teacher models to efficient student models capable of real-time, on-vehicle inference.

In this role, you will:
Build, pre-train, and evaluate large-scale multi-modality foundation models from the ground up, successfully aligning diverse data streams (e.g., Vision, LiDAR, Radar, Language, Audio).
Define and execute the ML roadmap for deploying these multi-modality representations to the vehicle.
Architect and implement Knowledge Distillation pipelines to compress large-capacity multi-modal teacher models into highly efficient, production-ready student models....