ML Model Serving Engineer

Sesame · San Francisco, California, US

Job description

About Sesame

Sesame believes in a future where computers are lifelike, with the ability to see, hear, and collaborate with us in ways that feel natural and human. With this vision, we're designing a new kind of computer, focused on making voice agents part of our daily lives. Our team brings together founders from Oculus and Ubiquity6, alongside proven leaders from Meta, Google, and Apple, with deep expertise spanning hardware and software. Join us in shaping a future where computers truly come alive.

Responsibilities:

- Turbocharge our serving layer, consisting of a variety of LLM, speech, and vision models.
- Partner with ML infrastructure and training engineers to build a fast, cost-effective, accurate, and reliable serving layer to power a new consumer product category.
- Modify and extend LLM serving frameworks like vLLM and SGLang to take advantage of the latest techniques in high-performance model serving.
- Work with the training team to identify opportunities to produce faster models without sacrificing quality.
- Use techniques like in-flight batching, caching, and custom kernels to speed up inference.
- Find ways to reduce model initialization times without sacrificing quality.

Req...