JobMesh

Machine Learning Engineer - Orchestration

ByteDance · San Jose, California, US

Job description

About the Team: Data AML is ByteDance's Machine Learning mid-platform. It provides training and inference systems for recommendation, advertising, CV, speech, and NLP to businesses such as Douyin, Jinri Toutiao, and Xigua Video. It supplies Machine Learning computing power to internal business units and researches general and innovative algorithms for problems in these businesses. It also offers core Machine Learning and recommender-system capabilities to external enterprise customers through Volcano Engine. In addition, AML conducts cutting-edge research in fields such as AI for Science and scientific computing.

Responsibilities:
1) Optimize resource efficiency in distributed orchestration and scheduling through engineering, increasing the scale of business/models supported per unit of computing power:
a) Use and extend distributed scheduling frameworks in the Kubernetes/Godel ecosystem, select the appropriate framework for each business scenario, and optimize scheduling strategies for cluster utilization and uniformity based on each scenario's characteristics;
b) Connect/exten...