Research Engineer Graduate (AI Training Systems Reliability & Performance - Seed Infra) - 2026 Start (PhD)

ByteDance · Seattle, Washington, US

Job description

About the Team

The Seed Infrastructures team oversees the distributed training, reinforcement learning framework, high-performance inference, and heterogeneous hardware compilation technologies for AI foundation models. We are looking for talented individuals to join our team in 2026. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our Company.

Successful candidates must be able to commit to an onboarding date by the end of 2026. Please state your availability and graduation date clearly in your resume.

The base salary range for this position in the selected city is $232,560 - $427,500 annually.

Responsibilities:
- Improve the reliability and performance of large-scale training systems across pre-training, fine-tuning, evaluation, and inference
- Build observability, profiling, and debugging tools for distributed ML workloads
- Identify and optimize performance bottlenecks across GPU, networking, and storage layers
- Contribute to distributed training frameworks in multi-GPU and multi-node environments
- Collaborate with model and infrastructure teams to improv...