OpenAI

Software Engineer, Cloud Infrastructure

Location

San Francisco

Type

Full Time

Salary

USD 230,000 – 490,000

Level

Senior

Role

Backend Engineer

Posted

Aug 4, 2025

The role

Summary

OpenAI is seeking a Senior Software Engineer to join its Cloud Infrastructure team, responsible for building and maintaining the core infrastructure that powers ChatGPT and its API services. The role involves designing scalable Kubernetes-based platforms and cloud abstractions and ensuring the infrastructure can handle massive scale while maintaining reliability and security standards.

What you'll do

Infrastructure Platform Design: Design and build scalable development and production platforms that power OpenAI's products including ChatGPT and API services
Scale Architecture Planning: Ensure infrastructure architecture can scale to the next order of magnitude to support massive AI workloads
Kubernetes Operations: Operate and maintain large-scale Kubernetes clusters supporting critical AI model inference and training workloads
Cloud Abstraction Development: Build and maintain cloud infrastructure abstractions that enable rapid product deployment across multiple cloud providers
System Reliability Engineering: Maintain high availability and reliability standards for production systems supporting millions of users
Incident Response: Participate in on-call rotation to respond to critical infrastructure incidents and ensure rapid resolution
Security Implementation: Implement and maintain security best practices across all infrastructure components and deployment pipelines
Performance Optimization: Monitor and optimize infrastructure performance to support AI model inference at scale
Cross-team Collaboration: Work closely with research, product, and design teams to enable rapid iteration and deployment of AI technologies
Infrastructure Automation: Develop automation tools and processes to reduce manual operations and improve deployment reliability

What we look for

Technical

Infrastructure Experience: 5+ years of experience building and maintaining core infrastructure systems at scale
Kubernetes Expertise: Extensive experience operating Kubernetes orchestration systems in production environments
Cloud Platform Proficiency: Deep experience building abstractions and working with major cloud platforms (AWS, GCP, Azure)
System Architecture: Strong background in designing scalable, reliable, and secure distributed systems
Infrastructure as Code: Proficiency with infrastructure automation tools like Terraform, Ansible, or similar
Monitoring and Observability: Experience with monitoring systems, logging, and observability tools for large-scale infrastructure
Network Architecture: Understanding of networking concepts including load balancing, CDNs, and service mesh technologies

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
Advanced Technical Education: Master's degree in Computer Science or related field preferred but not required

Experience

Senior Infrastructure Role: 5+ years in senior infrastructure engineering roles with increasing responsibility
Scale Operations: Experience operating infrastructure supporting millions of users or high-throughput applications
DevOps Culture: Background working in DevOps or SRE environments with emphasis on automation and reliability
Startup Environment: Experience working in fast-paced, high-growth technology companies preferred

Skills

Required skills

Kubernetes Administration: Expert-level skills in Kubernetes cluster management, networking, and troubleshooting
Cloud Architecture: Advanced knowledge of cloud-native architecture patterns and multi-cloud strategies
Infrastructure Automation: Proficiency in infrastructure as code tools and CI/CD pipeline development
System Design: Strong system design skills for building scalable and reliable distributed systems
Incident Management: Experience with incident response, troubleshooting, and post-mortem analysis
Security Best Practices: Knowledge of infrastructure security, compliance, and vulnerability management

Nice to have

AI/ML Infrastructure: Experience with infrastructure supporting machine learning workloads and GPU clusters
Service Mesh: Hands-on experience with service mesh technologies like Istio or Linkerd
Observability Platforms: Advanced experience with observability tools like Prometheus, Grafana, and distributed tracing
Multi-Cloud Strategy: Experience designing and implementing multi-cloud infrastructure strategies
Performance Tuning: Skills in performance optimization for high-throughput, low-latency systems
Open Source Contributions: Active contributions to infrastructure-related open source projects

Compensation & benefits

Salary

USD 230,000 – 490,000 (annual)

Stock options

Available

Benefits

Equity Compensation

Competitive equity package with significant upside potential in a rapidly growing AI company

Health Insurance

Comprehensive medical, dental, and vision insurance coverage

Relocation Assistance

Full relocation support for candidates moving to San Francisco

Professional Development

Access to cutting-edge AI research and learning opportunities

Flexible PTO

Generous time off policy to support work-life balance

Retirement Benefits

401(k) plan with company matching

Parental Leave

Comprehensive parental leave policy for new parents

Mental Health Support

Access to mental health resources and counseling services

Learning Budget

Annual budget for conferences, courses, and professional development

Gym/Wellness

Fitness and wellness benefits including gym membership reimbursement


Interview process

  1. Application Review: Initial screening of resume and technical background by recruiting team
  2. Recruiter Phone Screen: 30-minute call to discuss role fit, experience, and answer initial questions
  3. Technical Phone Interview: 60-minute technical discussion focusing on infrastructure design and Kubernetes experience
  4. System Design Interview: 90-minute session designing scalable infrastructure solutions for AI workloads
  5. Technical Deep Dive: Detailed technical interview covering cloud platforms, monitoring, and incident response
  6. Cultural Fit Interview: Discussion with team members about OpenAI's mission, values, and collaborative approach
  7. Final Interview Round: Meetings with senior leadership and potential team members
  8. Reference Checks: Professional reference verification and background check process
