Site Reliability Engineer NEX
Patterson-UTI · Houston, Texas, US
Job description
The NexTier Technology team is looking for a Site Reliability Engineer (SRE) to help build, scale, and maintain highly reliable systems on Google Cloud Platform (GCP). This role blends software engineering with infrastructure expertise to ensure our services are performant, resilient, and cost-efficient. Candidates will work closely with engineering teams to improve system reliability, automate operations, and embed best practices across the platform.

Detailed Description:
- Design, implement, and manage scalable, reliable infrastructure on GCP
- Define and track SLIs, SLOs, and error budgets
- Maintain and improve system availability, performance, and latency
- Build and manage infrastructure as code using tools like Terraform or similar
- Develop automation to reduce manual operational work
- Monitor systems using observability tools (metrics, logs, tracing) and respond to incidents
- Participate in on-call rotations and lead incident response and postmortems
- Collaborate with development teams to improve service reliability and deployment processes
- Optimize cloud resource usage and cost efficiency
- Implement and maintain CI/CD pipelines for reliable software delivery

Required Kn...