JobMesh

Sr. AI Inference Systems Engineer

Tencent · Palo Alto, California, US

Job description

What the Role Entails:

End-to-End Inference Optimization: Lead the optimization of the full inference pipeline for Large Models (LLM, Multimodal); focus on KV Cache storage strategies, Router architecture design, and collaborative operator optimization to maximize throughput and minimize latency.

Heterogeneous Computing Research: Conduct in-depth research into the underlying inference logic of various hardware accelerators; evaluate architectural suitability for real-time, batch, and streaming inference scenarios to develop standardized optimization schemes.

Inference Framework & Toolchain: Design and implement high-performance inference frameworks; optimize scheduling and memory management to resolve long-tail issues such as communication latency and load imbalance in distributed inference.

Technological Innovation: Track global advancements in inference technology (e.g., compiler optimization, model compression, and hardware fusion); drive the productization of emerging technologies within production environments.

Technical Leadership: Lead efforts to overcome key technical bottlenecks in inference optimization; design technical roadmaps and mentor team members to...