GPU Performance Engineer | Experienced Hire
Susquehanna International Group · New York City, New York, US
Job description
Overview

We are looking for a GPU Performance Engineer to build highly optimized CUDA kernels for low-latency inference. This role is focused on workloads where off-the-shelf runtimes and vendor libraries do not fully exploit the structure of the model, and where custom kernels, memory layouts, and execution strategies can deliver meaningful gains.

You will work closely with quantitative researchers and engineers to understand model structure, identify computational bottlenecks, and turn mathematical ideas into production-grade GPU implementations. You will use your understanding of GPU hardware to help shape models that are both mathematically effective and efficient to run. The problems span compact neural networks, tree-based models, and other structured inference workloads where latency, throughput, and efficiency all matter.

This role is a strong fit for someone who enjoys low-level optimization, performance analysis, and translating abstract models into hardware-efficient code.

What you'll do:
- Design, implement, and optimize custom CUDA kernels for latency-critical inference workloads
- Develop fine-grained GPU implementations tailored to specific model structures
- Analyze...