Site Reliability Engineer
Leidos · US
Job description
Leidos is seeking a Site Reliability Engineer to join our DevOps team in support of a large-scale, complex software program within the Department of Justice. This role ensures that the program's applications are reliable, scalable, and efficient. It acts as the bridge between development and IT operations, applying software engineering principles to automate infrastructure tasks, improve system reliability, and optimize performance.

Responsibilities:
- Automate operations, CI/CD, and release management to ensure system reliability and scalability.
- Monitor system health, performance, and capacity in real time, proactively addressing issues.
- Implement monitoring and alerting systems for rapid incident response, in accordance with ATF SLAs or KPIs.
- Conduct post-incident reviews to identify root causes and drive remediation efforts.
- Manage OpenShift/Kubernetes clusters and define application-level infrastructure using Terraform.
- Analyze historical data to predict and provision future infrastructure needs.
- Support application-level infrastructure (DBs, S3, IAM) while interfacing with the hardware/networking team to project capacity and utilization.
- Improve...