OpenAI

Software Engineer, Infrastructure Reliability

Location: San Francisco
Type: Full Time
Salary: USD 255,000 – 405,000
Level: Senior
Role: Infrastructure Reliability Engineer
Posted: Mar 19, 2026

The role

Summary

OpenAI is seeking a highly skilled Software Engineer for its Infrastructure Reliability team in San Francisco, responsible for scaling and hardening critical infrastructure that powers cutting-edge AI systems like ChatGPT. The role focuses on designing, building, and operating reliable, performant, and secure distributed systems at global scale.

What you'll do

System Design and Operation: Design, build, and operate reliable and high-performance systems used across OpenAI's engineering teams
Performance Optimization: Identify and resolve performance bottlenecks, ensuring infrastructure can scale to support the next order of magnitude of growth
Automation and Tooling: Continuously improve automation processes and internal tooling to enhance developer experience and reduce manual work
Incident Management: Contribute to incident response, conduct postmortems, and develop best practices for system reliability and scalability

What we look for

Technical

Distributed Systems: Deep understanding of distributed systems principles and proven track record in building scalable, reliable systems
Cloud Infrastructure: Strong proficiency in cloud platforms (AWS, GCP, Azure) and Infrastructure as Code tools like Terraform

Education

Technical Degree: Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

Industry Experience: 4+ years of relevant industry experience, with 2+ years leading large-scale, complex projects
Reliability Engineering: Proven experience as a reliability engineer or production engineer in fast-paced, rapidly scaling environments

Skills

Required skills

Kubernetes: Experience operating Kubernetes at scale and building cloud platform abstractions
Linux: Comfortable working in Linux environments and with modern infrastructure tools
Observability: Proficiency with observability tools like Datadog, Prometheus, Grafana, and ELK stack

Nice to have

Microservices: Experience with microservices architecture and service mesh technologies
Security: Knowledge of cloud security best practices and infrastructure security

Compensation & benefits

Salary: USD 255,000 – 405,000 (annual)
Stock options: Available

Benefits

Competitive Compensation: Salary range of $255K-$405K with significant equity compensation
Healthcare: Comprehensive medical, dental, and vision insurance
Retirement Planning: 401(k) plan with company matching
Professional Development: Continuous learning opportunities, conference attendance, and skill development programs
Work-Life Balance: Flexible work arrangements and generous paid time off


Interview process

  1. Initial Screening: Phone or video call with recruiting team to discuss background and role fit
  2. Technical Assessment: Online coding challenge or take-home project focusing on infrastructure and systems design
  3. Technical Interviews: Multiple rounds of technical interviews with infrastructure and engineering team members
  4. System Design Interview: In-depth interview exploring candidate's approach to designing scalable, reliable systems
  5. Cultural Fit Interview: Discussion with team members to assess collaboration and alignment with OpenAI's mission
  6. Final Discussion: Meeting with hiring manager to discuss role expectations and potential team contributions
