JobMesh

Senior Site Reliability Engineer

Castleton Commodities International · Stamford, Connecticut, US

Job description

The Senior Site Reliability Engineer is responsible for improving the reliability, availability, scalability, and operational excellence of our critical infrastructure platforms and services. This role partners closely with Engineering, Security, and Infrastructure teams to design resilient cloud-native architectures, implement Infrastructure as Code (IaC) and CI/CD standards, and drive measurable reliability outcomes. The Senior Site Reliability Engineer will also lead efforts to define and validate recovery objectives (RTO/RPO), design and implement Business Continuity / Disaster Recovery (BCP/DR) plans, and coordinate structured testing to ensure readiness.

Responsibilities:

Reliability Engineering & Operations

- Own and improve service reliability through SLO/SLI definition, error budgets, and operational best practices.
- Design, implement, and maintain observability (monitoring, logging, tracing, alerting) to reduce MTTR and improve proactive detection.
- Lead incident response practices including on-call improvements, runbooks, post-incident reviews (RCA), and preventative actions.
- Partner with application teams to improve performance, capacity planning, and resiliency under failu...