OpenAI

Engineering Manager, Cloud Infrastructure Automation

Location: San Francisco
Type: Full Time
Salary: USD 293,000 – 385,000
Level: Senior
Role: Engineering Manager
Posted: Jan 24, 2026

The role

Summary

OpenAI is seeking a Senior Engineering Manager to lead its Cloud Infrastructure teams, which are responsible for Kubernetes-based platform operations at massive scale. This leadership role involves building high-performing teams, setting technical direction, and delivering infrastructure primitives that support OpenAI's production AI systems, which are used by millions of people globally. The position requires deep expertise in Kubernetes, distributed systems, and infrastructure automation, combined with the leadership skills to manage platform engineering teams in a fast-paced, high-scale environment.

What you'll do

Team Leadership: Build, lead, and grow high-performing infrastructure engineering teams focused on cloud platform development
Platform Evolution: Own the evolution of OpenAI's Kubernetes platform, including cluster lifecycle, upgrades, configuration standards, and safety mechanisms
Reliability Engineering: Set and enforce platform-level reliability goals (SLIs/SLOs), ensuring reliability is designed into the system architecture
Infrastructure Automation: Drive infrastructure automation across provisioning, upgrades, remediation, and fleet consistency using Terraform and internal tooling
Operational Excellence: Reduce operational toil and incident frequency through better abstractions, guardrails, and self-healing systems
Technical Direction: Establish clear ownership boundaries, technical direction, and execution discipline for platform engineering initiatives
Cross-Team Collaboration: Partner closely with adjacent infrastructure, security, and product teams to ensure platform scalability and reliability
Production System Management: Maintain direct responsibility for production systems operating at extreme scale with high availability requirements

What we look for

Technical

Kubernetes Expertise: Deep hands-on understanding of Kubernetes at scale and distributed systems architecture
Infrastructure-as-Code: Proficiency with Terraform and infrastructure automation tools
Service Mesh: Experience with service mesh technologies like Istio and Envoy
Observability: Knowledge of metrics, logging, and distributed tracing systems
Cloud Networking: Understanding of global networking, load balancing, and CDN technologies
Production Operations: Experience operating production infrastructure with strict reliability, latency, and security requirements

Education

Bachelor's Degree: Bachelor's degree in Computer Science, Engineering, or a related technical field preferred
Advanced Degree: Master's degree in a relevant field or equivalent industry experience

Experience

Management Experience: Significant experience managing infrastructure or platform engineering teams
Senior Engineer Hiring: Strong track record of hiring, developing, and retaining senior engineers
Technical Leadership: Ability to balance technical depth with organizational leadership and long-term strategy
High-Scale Operations: Experience operating large-scale production systems with high reliability requirements
Fast-Paced Environment: Comfortable operating in ambiguous, fast-moving environments and creating clarity for others

Skills

Required skills

Kubernetes: Expert-level knowledge of Kubernetes architecture, operations, and scaling
Team Management: Proven ability to build, lead, and grow high-performing engineering teams
Infrastructure Automation: Hands-on experience with Terraform and infrastructure-as-code practices
Distributed Systems: Deep understanding of distributed systems design and reliability patterns
Production Operations: Experience managing large-scale production infrastructure with strict SLAs
Technical Leadership: Ability to set technical direction and drive execution across engineering teams

Nice to have

Service Mesh: Experience with Istio, Envoy, or similar service mesh technologies
Cloud Platforms: Multi-cloud experience with AWS, GCP, or Azure infrastructure
Observability Tools: Knowledge of Prometheus, Grafana, and distributed tracing systems
AI/ML Infrastructure: Understanding of infrastructure requirements for AI and machine learning workloads
Security Practices: Experience with infrastructure security and compliance requirements
GitOps: Familiarity with GitOps workflows and continuous deployment practices

Compensation & benefits

Salary: USD 293,000 – 385,000 (annual)
Stock options: Available

Benefits

Equity Compensation: Significant equity package in OpenAI with high growth potential
Health Insurance: Comprehensive health, dental, and vision insurance coverage
Flexible Work Arrangement: Hybrid work model allowing for remote and in-office collaboration
Professional Development: Opportunities to work on cutting-edge AI infrastructure and learn from industry experts
Equal Opportunity: Inclusive workplace committed to diversity and equal employment opportunities
Reasonable Accommodations: Support for employees with disabilities and accommodation requests


Interview process

  1. Initial Screening: Phone or video call with the recruiting team to discuss background and role fit
  2. Technical Discussion: In-depth technical interview covering Kubernetes, distributed systems, and infrastructure architecture
  3. Leadership Interview: Discussion of management philosophy, team building experience, and leadership scenarios
  4. System Design: Whiteboard or virtual session designing large-scale infrastructure solutions
  5. Panel Interview: Meeting with multiple team members and stakeholders to assess cultural fit and collaboration
  6. Final Round: Executive interview covering strategic thinking and long-term vision for platform engineering
