Principal Engineer – Distributed AI Systems Architecture (Heterogeneous Compute)
Intel · Santa Clara, California, US
Job description
Job Details:
We are seeking a Principal Engineer to define and architect the next generation of distributed AI systems across heterogeneous compute platforms, including CPUs, GPUs, IPUs/FNICs, and emerging dataflow accelerators.

This role focuses on one of the hardest problems in modern computing: how to dynamically execute and optimize large-scale AI computation graphs across diverse hardware while managing state, locality, and performance at system scale.

You will operate at the intersection of systems architecture, high-performance computing, and AI infrastructure, defining the execution model, runtime abstractions, and placement strategies that turn a rack of heterogeneous devices into a coherent, programmable system.

Key Responsibilities:
1. Dynamic Execution of Distributed Computation Graphs
2. Stateful Scheduling and Memory-Centric Architecture
3. Graph Introspection and Automated Partitioning
   o compute intensity
   o memory bandwidth requirements
   o communication cost
   o latency sensitivity
4. Integration of Specialized Accelerators
5. MoE-Aware Execution and Adaptive Placement
   o expert placement
   o routing locality
   o load balancing vs data movement trade-of...