JobMesh

Site Reliability Engineer (SRE) / Operations Engineer

ECS · Arlington, Virginia, US

ECS is seeking a Site Reliability Engineer (SRE) / Operations Engineer to work in our Arlington, VA office / remote.

Job description

ECS is seeking a Site Reliability Engineer (SRE) / Operations Engineer who is responsible for ensuring the reliability, availability, performance, and operational efficiency of enterprise applications and supporting infrastructure. This role bridges software engineering and IT operations by applying engineering practices, automation, and monitoring to maintain stable systems and rapidly resolve operational issues. The SRE/Ops Engineer works closely with development, security, and platform teams to support system deployments, manage incidents, improve observability, and implement resilient architectures that support continuous delivery and mission-critical operations.

Responsibilities:
- Maintain the reliability, availability, and performance of production systems and cloud-based services.
- Monitor system health using observability tools (metrics, logs, and tracing) and respond to alerts and incidents.
- Participate in incident response, troubleshooting, and root cause analysis to restore service and prevent recurrence.
- Implement automation and infrastructure-as-c...