JobMesh

Senior Manager Site Reliability Engineering

Akamai Technologies · PL

Do you thrive on building reliability into AI infrastructure from the ground up? Prepare to lead SRE initiatives for AI solutions, optimizing GPU clusters an...

Job description

Do you thrive on building reliability into AI infrastructure from the ground up? Prepare to lead SRE initiatives for AI solutions, optimizing GPU clusters and serverless inference, while ensuring global-scale performance and reliability. Join the Akamai AI Team!: Akamai's Cloud Technology Group delivers AI infrastructure at global scale. Our GPU compute platform provides customers with dedicated GPU resources, from single GPUs to full clusters, for training, simulation, inference, and any workload they choose to run. Site Reliability Engineering is embedded from the start to ensure production-grade reliability and performance. Partner with the best: As Senior Manager, you will lead the team responsible for reliability across Akamai's AI compute and platform services. You will also build the team, owning hiring strategy, candidate evaluation, and interview coordination for AI SRE roles. This is a hands-on leadership role that requires partnering with product engineering teams to embed reliability into products that are moving fast. As a Senior Manager of SRE, you will be accountable for: - Fostering and growing the AI SRE team by recruiting, guiding, and supporting career developmen...