Senior Data Engineer, Data Warehouse
About the position
GeneDx (Nasdaq: WGS) delivers personalized and actionable health insights to inform diagnosis, direct treatment, and improve drug discovery. The company is uniquely positioned to accelerate the use of genomic and large-scale clinical information to enable precision medicine as the standard of care. GeneDx is at the forefront of transforming healthcare through its industry-leading exome and genome testing and interpretation services, fueled by the world's largest rare disease data sets. For more information, please visit www.genedx.com.

Summary
We are looking for a Data Engineer to join our growing Unified Data Warehouse team. This role is ideal for someone with a strong foundation in data engineering principles and data warehousing concepts who is eager to build scalable, high-performance data systems. You will be responsible for developing, maintaining, and optimizing our data pipelines and infrastructure, collaborating closely with analysts, data scientists, and stakeholders across the organization. If you are passionate about data, solving complex problems, and building systems that scale, this role is for you.

Responsibilities
• Design, build, and maintain scalable ETL/ELT pipelines for structured and unstructured data.
• Contribute to and maintain the enterprise data model – the source of truth in our Snowflake warehouse.
• Write and optimize complex SQL queries (including window functions, temp tables, and query performance tuning) to support analytics and reporting needs.
• Take part in designing and maintaining a centralized model layer.
• Support data warehousing solutions via Snowflake + dbt.
• Develop automation scripts in Bash, Python, or other programming languages.
• Manage cloud environments (AWS, OCI) in collaboration with infrastructure teams.
• Maintain and optimize a Kubernetes (EKS) cluster for scalable workloads.
• Implement and maintain infrastructure-as-code using tools like Terraform, YAML, and Argo for reproducible and reliable deployments.
• Debug and troubleshoot data pipelines and data quality issues across systems.
• Collaborate with stakeholders of varying technical backgrounds to translate business requirements into scalable technical solutions.
• Be an active contributor to our ETL/ELT framework. We contribute features, fixes, and improvements almost daily, and everyone is encouraged and empowered to propose improvements and optimizations to the framework.
• Contribute to best practices for data modeling, governance, and quality control.
• Explore and recommend AI tools and modern data solutions for efficiency and automation.

Requirements
• Strong understanding of data engineering concepts and data warehousing fundamentals.
• Advanced SQL skills, including debugging and performance tuning.
• Proficiency in at least one general-purpose programming language (e.g., Python, Java, Scala). We use Python.
• Familiarity with Kimball (dimensional) modeling.
• Basic scripting knowledge (Bash) for automation and operational workflows.
• Familiarity with cloud platforms (AWS, GCP, or OCI).
• Solid communication and collaboration skills to work effectively with technical and non-technical stakeholders.
• Familiarity with Git.

Nice-to-haves
• Experience with distributed computing frameworks such as Dask (preferred) or Spark.
• Hands-on experience managing and deploying workloads in Kubernetes.
• Exposure to infrastructure-as-code (Terraform, Helm, Argo, etc.).
• Experience with any of the popular workflow orchestration systems (Airflow, Dagster, Argo Workflows, etc.).
• Experience implementing Change Data Capture (CDC) pipelines.
• Strong debugging and problem-solving skills for troubleshooting complex data issues.
• Knowledge of AI tools and when to apply them in a data engineering context.

Benefits
• Paid Time Off (PTO)
• Health, Dental, Vision and Life insurance
• 401k Retirement Savings Plan
• Employee Discounts
• Voluntary benefits