[Remote] Lead AWS Data Engineer (PySpark, Glue & Dimensional Modeling)
Note: This is a remote position open to candidates in the USA.

FUSTIS LLC is seeking a Lead AWS Data Engineer to develop and maintain PySpark-based ETL pipelines and manage AWS Glue jobs. The role involves designing dimensional data models and optimizing data workflows for reliability and performance.

Responsibilities
• Develop and maintain PySpark-based ETL pipelines for batch and incremental data processing
• Build and operate AWS Glue Spark jobs (batch and event-driven), including:
  • Job configuration, scaling, retries, and cost optimization
  • Glue Data Catalog and schema management
• Design and maintain event-driven data workflows triggered by S3, EventBridge, or streaming sources
• Load and transform data into Amazon Redshift, optimizing for:
  • Distribution and sort keys
  • Incremental loads and upserts
  • Query performance and concurrency
• Design and implement dimensional data models (star/snowflake schemas), including:
  • Fact and dimension tables
  • Slowly Changing Dimensions (SCDs)
  • Grain definition and data quality controls
• Collaborate with analytics and reporting teams to ensure the warehouse is BI-ready
• Monitor, troubleshoot, and optimize data pipelines for reliability and performance

Skills
• PySpark ramp-up
• Demonstrable hands-on experience with Glue jobs
• Dimensional modeling
• Strong PySpark experience (Spark SQL, DataFrames, performance tuning)
• Hands-on experience with AWS Glue (Spark jobs, not just crawlers)
• Experience loading and optimizing data in Amazon Redshift
• Proven experience designing dimensional data warehouse schemas
• Familiarity with AWS-native data services (S3, IAM, CloudWatch)
• Production ownership mindset (debugging, failures, reprocessing)

Company Overview
• Welcome to Fustis, your trusted partner in connecting IT consultants with clients. Founded in 2022, the company is headquartered in Sacramento, California, US, and has a workforce of 11-50 employees. Its website is