Senior DevOps Engineer
BRAHMA AI · GB
Job description
Own build, deploy, and runtime reliability across BRAHMA AI’s hybrid estate. Deliver secure, scalable infrastructure for Gen AI-based workflows and products across hybrid environments. Partner with infrastructure and multidisciplinary product and research teams to help them innovate and ship fast.

We are hiring remotely across the EMEA region.

Key Responsibilities:
- Design, implement, and operate Slurm and Kubernetes-based platforms across cloud and on-prem GPU nodes, including autoscaling, rollout strategies, and multi-cluster operations.
- Build CI/CD pipelines for services, model training, and model serving; standardise artifact/version management and environment promotion.
- Implement Infrastructure as Code with Terraform/Terragrunt and configuration management; enforce drift detection and repeatable environments.
- Design and implement observability stacks (metrics, logs, tracing); drive incident response and postmortems.
- Secure the stack with least privilege, secrets management, network policy, and hardened baselines; support ISO/MPA controls with the security team.
- Operate model-serving infrastructure for real-time and batch workloads; optimise GPU utili...