Research Scientist - LLM Training System as a Service - Global Frontier Tech Recruitment Program - 2027 Start (PhD)

ByteDance · San Jose, California, US

Job description

We are looking for talented individuals to join our team in 2027. As a graduate, you will get opportunities to pursue bold ideas, tackle complex challenges, and unlock limitless growth. Launch your career where inspiration is infinite at our company. Successful candidates must be able to commit to an onboarding date by the end of 2027. Please state your availability and graduation date clearly in your resume.

Team Introduction: AML-MLsys combines system engineering and the art of machine learning to develop and maintain massively distributed ML training and inference systems and services around the world, providing high-performance, highly reliable, scalable systems for LLM/AIGC/AGI.

Topic Content: With the evolution from large language models (LLMs) to AI agents, the training paradigm is undergoing a fundamental shift. Traditional distributed training frameworks like Megatron-LM are designed around relatively static parallelism strategies, whereas agent training introduces more dynamic patterns, including external tool interactions, multi-step reasoning, and iterative self-improvement. In this context, a tightly coupled system design can limit flexibility and efficiency. To better suppo...