Lead Data Engineer (Databricks/Pyspark)
Capgemini · Dallas, Texas, US
Job description
Choosing Capgemini means choosing a company where you will be empowered to shape your career in the way you’d like, where you’ll be supported and inspired by a collaborative community of colleagues around the world, and where you’ll be able to reimagine what’s possible. Join us and help the world’s leading organizations unlock the value of technology and build a more sustainable, more inclusive world.

Responsibilities:
- Design and implement scalable ETL pipelines on Databricks (PySpark, SQL, Delta Lake) to process credit card transactions, balances, and payments
- Develop the core calculation engines and integrate with upstream/downstream systems
- Optimize Spark jobs for large-scale financial datasets (billions of records): partitioning, caching, AQE
- Ensure data quality and reconciliation across raw, curated, and output layers
- Implement parameterized rules (APR, compounding frequency, grace period logic)
- Collaborate with business analysts to translate product rules into technical implementations
- Apply unit tests, CI/CD pipelines, and monitoring for production-grade pipelines
- Ensure compliance with financial data governance, lineage, and audit requirements

Skills:
- Strong in PySpark, SQL, Databr...