Engineering Manager, Cloud Infrastructure Automation

OpenAI

Location

San Francisco

Type

Full Time

Salary

USD 293,000 – 385,000

Level

Senior

Role

Engineering Manager

Posted

Jan 24, 2026

The role

Summary

OpenAI is seeking a Senior Engineering Manager to lead its Cloud Infrastructure teams, which are responsible for Kubernetes-based platform operations at massive scale. This leadership role involves building high-performing teams, setting technical direction, and delivering infrastructure primitives that support OpenAI's production AI systems, which are used by millions of people worldwide. The position requires deep expertise in Kubernetes, distributed systems, and infrastructure automation, combined with strong leadership skills to manage platform engineering teams in a fast-paced, high-scale environment.

What you'll do

Team Leadership: Build, lead, and grow high-performing infrastructure engineering teams focused on cloud platform development

Platform Evolution: Own the evolution of OpenAI's Kubernetes platform, including cluster lifecycle, upgrades, configuration standards, and safety mechanisms

Reliability Engineering: Set and enforce platform-level reliability goals (SLIs/SLOs), ensuring reliability is designed into the system architecture

Infrastructure Automation: Drive infrastructure automation across provisioning, upgrades, remediation, and fleet consistency using Terraform and internal tooling

Operational Excellence: Reduce operational toil and incident frequency through better abstractions, guardrails, and self-healing systems

Technical Direction: Establish clear ownership boundaries, technical direction, and execution discipline for platform engineering initiatives

Cross-Team Collaboration: Partner closely with adjacent infrastructure, security, and product teams to ensure platform scalability and reliability

Production System Management: Maintain direct responsibility for production systems operating at extreme scale with high availability requirements

What we look for

Technical

Kubernetes Expertise: Deep hands-on understanding of Kubernetes at scale and distributed systems architecture

Infrastructure-as-Code: Proficiency with Terraform and infrastructure automation tools

Service Mesh: Experience with service mesh technologies like Istio and Envoy

Observability: Knowledge of metrics, logging, and distributed tracing systems

Cloud Networking: Understanding of global networking, load balancing, and CDN technologies

Production Operations: Experience operating production infrastructure with strict reliability, latency, and security requirements

Education

Bachelor's Degree: Degree in Computer Science, Engineering, or a related technical field preferred

Advanced Degree: Master's degree in a relevant field, or equivalent industry experience

Experience

Management Experience: Significant experience managing infrastructure or platform engineering teams

Senior Engineer Hiring: Strong track record of hiring, developing, and retaining senior engineers

Technical Leadership: Ability to balance technical depth with organizational leadership and long-term strategy

High-Scale Operations: Experience operating large-scale production systems with high reliability requirements

Fast-Paced Environment: Comfortable operating in ambiguous, fast-moving environments and creating clarity for others

Skills

Required skills

Kubernetes: Expert-level knowledge of Kubernetes architecture, operations, and scaling

Team Management: Proven ability to build, lead, and grow high-performing engineering teams

Infrastructure Automation: Hands-on experience with Terraform and infrastructure-as-code practices

Distributed Systems: Deep understanding of distributed systems design and reliability patterns

Production Operations: Experience managing large-scale production infrastructure with strict SLAs

Technical Leadership: Ability to set technical direction and drive execution across engineering teams

Nice to have

Service Mesh: Experience with Istio, Envoy, or similar service mesh technologies

Cloud Platforms: Multi-cloud experience with AWS, GCP, or Azure infrastructure

Observability Tools: Knowledge of Prometheus, Grafana, and distributed tracing systems

AI/ML Infrastructure: Understanding of infrastructure requirements for AI and machine learning workloads

Security Practices: Experience with infrastructure security and compliance requirements

GitOps: Familiarity with GitOps workflows and continuous deployment practices

Compensation & benefits

Salary

USD 293,000 – 385,000 (annual)

Stock options

Available

Benefits

Equity Compensation

Significant equity package in OpenAI with high growth potential

Health Insurance

Comprehensive health, dental, and vision insurance coverage

Flexible Work Arrangement

Hybrid work model allowing for remote and in-office collaboration

Professional Development

Opportunities to work on cutting-edge AI infrastructure and learn from industry experts

Equal Opportunity

Inclusive workplace committed to diversity and equal employment opportunities

Reasonable Accommodations

Support for employees with disabilities and accommodation requests

Interview process

1. Initial Screening: Phone or video call with the recruiting team to discuss background and role fit
2. Technical Discussion: In-depth technical interview covering Kubernetes, distributed systems, and infrastructure architecture
3. Leadership Interview: Discussion of management philosophy, team-building experience, and leadership scenarios
4. System Design: Whiteboard or virtual session designing large-scale infrastructure solutions
5. Panel Interview: Meeting with multiple team members and stakeholders to assess cultural fit and collaboration
6. Final Round: Executive interview covering strategic thinking and long-term vision for platform engineering

More Jobs at OpenAI

Engineering Manager, Online Data Systems

San Francisco

Manager

Software Engineer, Delivery / CD

San Francisco

Senior

Engineering Manager ChatGPT Infra

London, UK

Manager

Software Engineer, Ads Monetization, Revenue Platform

San Francisco

Senior

iOS Engineer, ChatGPT Mobile Infrastructure

San Francisco

Staff

OpenAI

OpenAI is an American artificial intelligence research organization developing advanced AI models such as GPT. Focused on ensuring that AI benefits humanity, it creates tools for natural language processing and generative AI applications.

San Francisco, California, United States · Founded 2015 · openai.com

Tech Stack

Languages

Go, Python, Bash/Shell

Frameworks

Kubernetes, Istio, Envoy

Databases

etcd, Prometheus

Tools

Terraform, Cloudflare, Docker, Helm, GitOps Tools

Other

Observability Stack, Service Mesh, Cloud Platforms
