CUDA Kernel Engineer

Pragmatike · Cambridge, Massachusetts, US

Job description

Location: Remote US
Start date: ASAP
Languages: English (required)

About the Role:

Pragmatike is hiring on behalf of a fast-growing AI startup founded by MIT CSAIL researchers and recognized as a Top 10 GenAI company by GTM Capital.

We are searching for a CUDA Kernel Engineer who has hands-on experience developing and optimizing NVIDIA CUDA kernels from scratch. You will work on the GPU performance layer powering large-scale, high-throughput AI systems used by Fortune 500 customers.

This role is ideal for someone who deeply understands NVIDIA GPU architecture, memory hierarchy, warp-level execution, and profiling workflows, not someone coming from generic hardware, FPGA, or non-NVIDIA compute backgrounds. You will directly influence the GPU efficiency, throughput, and scalability of mission-critical AI systems.

What You'll Do:

- Design, implement, and optimize custom CUDA kernels for NVIDIA GPUs, with a focus on maximizing occupancy, memory throughput, and warp efficiency.
- Profile GPU workloads using tools such as Nsight Compute, Nsight Systems, nvprof, and CUDA-MEMCHECK.
- Analyze and eliminate performance bottlenecks including warp divergence, uncoalesced memory access, registe...