JobMesh

Research Scientist in Multimodal Interaction and World Model - Seed - Graduates - 2027 Start (PhD)

ByteDance · San Jose, California, US

Job description

About the team

The Seed Multimodal Interaction and World Model team is dedicated to developing models with human-level multimodal understanding and interaction capabilities. The team is working to advance the exploration and development of multimodal assistant products.

Responsibilities:

- Develop multimodal foundation models integrating vision, language, audio, and environment signals.
- Design and optimize world models for reasoning, planning, and interaction.
- Build training pipelines, including data curation, alignment, and reinforcement learning.
- Improve agent capabilities such as perception, memory, decision-making, and tool use.
- Explore next-generation interaction paradigms between humans and intelligent systems.

The base salary range for this position in the selected city is $244,800 to $450,000 annually.