Staff Software Engineer, Kubernetes Platform
Anthropic · New York City, New York, US
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and fo...
Job description
About Anthropic Anthropic’s mission is to create reliable, interpretable, and steerable AI systems. We want AI to be safe and beneficial for our users and for society as a whole. Our team is a quickly growing group of committed researchers, engineers, policy experts, and business leaders working together to build beneficial AI systems. About the role: Anthropic runs some of the largest Kubernetes clusters in the industry. We have fleets of hundreds of thousands of nodes across multiple cloud providers and datacenters to train, research, and serve frontier AI models. The Kubernetes Platform team owns the Kubernetes control plane that makes those clusters work. We are operating at a scale where the defaults stop working. We own the scheduler and extend it to place topology-sensitive ML workloads across thousands of accelerators at once. We scale the control plane itself — apiserver, etcd, controllers — so it stays responsive as object counts and node counts grow by orders of magnitude. And we build the core cluster services every workload depends on, like service discovery, so they hold up under the same pressure. We make sure the control plane is fast, correct, and always available....