Senior Director of Platform Engineering

DeepSee delivers an open and flexible agentic platform to accelerate AI adoption for financial services in front, middle, and back-office operations. Our cloud-based platform seamlessly integrates with existing bank architectures, whether they're just starting their AI transformation journey or looking to enhance existing in-house capabilities with agentic AI solutions. With DeepSee's pre-trained and pre-configured agents, banking and capital markets firms can automate and orchestrate manual, repetitive tasks, freeing domain experts for strategic work, reducing risk, and streamlining operations to drive greater efficiency.

We are looking for a Senior Director of Platform Engineering to lead our backend, frontend, infrastructure, and MLOps/DevOps/CI/CD teams. You'll scale our Kubernetes platform across AKS, EKS, and on-prem, ensure high availability and performance, and evolve our agentic AI and MCP-based integrations for bank-grade reliability. You'll partner closely with the Chief Architect and our Product team to deliver a secure, observable, auditable platform for regulated clients.

Job Responsibilities:
• Own and drive the platform roadmap and strategy for multi-cloud/on-prem Kubernetes (AKS, EKS, vanilla K8s), compute, data, networking, ML serving, and high availability/performance.
• Lead, build, and develop multiple teams (Backend, Frontend, Infrastructure, MLOps/DevOps), including leadership, career ladders, and operational rhythms.
• Scale Kubernetes reliably: capacity planning, autoscaling (HPA/VPA/Cluster Autoscaler/KEDA), and cost controls for mixed CPU/GPU workloads.
• Advance and mature GitOps, IaC, and observability practices (Argo CD, Terraform, Helm, OpenTelemetry, Datadog, Prometheus), including rollout strategies, standardization, monitoring, incident response, and post-mortems.
• Advance MLOps for LLMs/SLMs/ML/DL (KServe, MLflow pipelines, model governance, inference patterns, GPU scheduling, canary rollouts).
• Evolve and operate eventing and stateful architecture at scale (Kafka/ZooKeeper/KRaft, Postgres, S3/Blob, protobuf, schema evolution/versioning, resilient data planes).
• Contribute directly as a technical leader through coding, reviews, and debugging distributed systems.
• Partner closely with the Chief Architect, Principal AI, Product, and other leads to deliver secure, observable, auditable solutions for regulated banking, supporting agentic AI and workflow automation.

Must Haves:
• Significant leadership experience: 10 years on distributed platforms and 5 years leading multi-disciplinary platform teams.
• Deep, hands-on Kubernetes expertise (networking, security, tenancy, upgrades; AKS/EKS operations).
• Proven hands-on expertise with GitOps, IaC, change management, rollout safety, and production observability (Argo CD, Terraform, Helm, OpenTelemetry, Datadog/Prometheus, SLOs/on-call).
• Advanced MLOps experience (KServe, MLflow, model registry/governance, GPU scheduling, cost tuning, canary rollouts, safe rollouts).
• Experience designing and operating event streaming, stateful data, and resilient architecture at scale (Kafka/ZooKeeper/KRaft, Postgres, S3/Blob, protobuf, schema/versioning).
• Deep proficiency in core languages (Java, Python, Go) and cloud SDKs, plus strong communication of architecture to executive-level stakeholders and clients.
• Regulated FinServ experience (SOC 2/ISO 27001, SR 11-7, SEC/FINRA, model governance, OpenTelemetry, trace-driven perf, KServe ModelMesh or similar tools).

Nice to Haves:
• Hands-on skills with most listed technologies: Kubernetes (vanilla, AKS, EKS), Docker, Argo CD, Helm, Terraform, Kafka/ZooKeeper/KRaft, KServe, MLflow, OpenTelemetry, Datadog, Prometheus, protobuf, HPA, VPA, Karpenter or Cluster Autoscaler, LightRAG, Milvus, Postgres, S3/Blob, Redis, Airflow/dbt, Java, Python, Go.
• Experience working alongside a variety of engineering leaders and principal engineers (Chief Architect, CISO, Principal Knowledge Graph Engineer, AI Engineer, Lead BE, Principal FE, Product).
• Platform-as-a-product advocacy and developer experience focus, CNCF platform engineering guidance.

Finally, it is important that you align with our Stuff That Matters:
• Knowledge Over Noise: We prioritize actionable insights
• One Team, One Dream: We collaborate seamlessly across functions
• Be a Seeker: We constantly pursue innovation and learning
• Stay Human: We keep our solutions people-centric
• Act Boldly: We take calculated risks to drive progress
• Believe: We're passionate about our mission
• Own It: We take responsibility for our work and its impact

Why DeepSee.ai?
• Competitive compensation package including equity, with remote work options
• 100% company-paid premiums on health, dental, and vision insurance
• Opportunity to work on cutting-edge AI technology with real impact
• Collaborative and innovative work environment

Join us in shaping the future of AI-powered automation and make a significant impact in a rapidly growing startup. If you're a hands-on problem solver who thrives in fast-paced environments and is excited about leveraging AI to solve complex problems, we want to hear from you!