Site Reliability Engineer II
Akamai Technologies · US
Job description
Are you passionate about cutting-edge AI infrastructure? Do you want to build your SRE career on one of the most exciting platforms in cloud computing?

Join the Akamai Inference Cloud Team
The Akamai Inference Cloud team is part of Akamai's Cloud Technology Group. We design, implement, deploy and operate AI platforms that enable customers to run inference models and developers to create AI applications.

Partner with the best:
In this role, responsibilities will include automation, monitoring, incident response, and working collaboratively with skilled team members. Candidates should possess expertise in Linux systems, automation, and SRE practices. Daily activities involve coding, improving dashboards, enhancing alerts, and minimizing repetitive tasks. Opportunities exist to focus on GPU infrastructure, Kubernetes, and ensuring reliability for AI workloads within Akamai's serverless inference platform.

As a Site Reliability Engineer II, you will be responsible for:
- Building and maintaining dashboards, alerts, and monitoring for inference workloads using Akamai's existing observability platform
- Writing automation and tooling in Python or Go to reduce operational toil and improv...