Principal Engineer – Gen AI Platform Inferencing Engineering
Wells Fargo · Charlotte, North Carolina, US
Job description
About this role:

Wells Fargo is seeking a Principal Engineer – Gen AI Platform Inferencing Engineering to lead the development and optimization of our AI model serving and inferencing platforms within Digital Technology's AI Capability Engineering group. This is a software engineering role: you'll write code, build systems, and solve hard problems in the AI inference stack. You'll work deep inside frameworks like vLLM, SGLang, and NVIDIA Dynamo, extending and optimizing them to serve models at enterprise scale. You'll also build the automation, tooling, and deployment infrastructure that connects these runtimes to Kubernetes-native serving layers like KServe, KNative, and OpenShift AI. If you've contributed to inference frameworks, written custom serving logic, or built production ML serving pipelines in Python, we want to hear from you.

In this role, you will:
- Develop, extend, and optimize inference runtime configurations and integrations across vLLM, SGLang, NVIDIA Dynamo, TensorRT-LLM, and Triton
- Write Python-based tooling and automation for model onboarding, serving configuration, performance benchmarking, and deployment pipelines
- Build and maintain Kubernetes-native mod...