Senior Site Reliability Engineer
O’Reilly Auto Parts · US
Job description
The Sr Site Reliability Engineer operates with a high degree of independence and leverages multiple functional and technology skillsets to design, develop, test, and implement resilient software solutions. The Sr Site Reliability Engineer helps lead the team through consistent software development best practices and mentors and guides junior engineers, fostering their technical growth.

What you'll do:
- Experience working with large-scale distributed systems, including scalability, disaster recovery, and fault tolerance.
- Expertise in Python scripting.
- Define, implement, and own SLIs, SLOs, and error budgets for critical microservices in collaboration with product and engineering teams.
- Use error budgets to influence release decisions, prioritize reliability work, and manage operational risk.
- Design and maintain observability platforms, including metrics, logs, traces, and real-time telemetry.
- Track, manage, and reduce operational toil by converting repetitive operational work into Jira stories and epics with clear ownership and measurable outcomes.
- Design, implement, and validate resiliency mechanisms such as graceful degradation, redundancy, automated failov...