Software Engineer, Delivery / CD

DevOps Engineer · Senior · Full Time

San Francisco · USD 210k – 490k · 4mo ago

Role

What you'll do.

OpenAI is seeking a Software Engineer for its Delivery/CD team to build and operate continuous deployment systems that safely ship infrastructure and product code to production. The role focuses on building the deployment platforms, release pipelines, and rollout safety mechanisms that enable rapid, low-risk deployments across dozens of Kubernetes clusters and global regions.

Responsibilities

Continuous Deployment Infrastructure: Design and build CD systems that safely deploy changes across dozens of Kubernetes clusters and global regions
Progressive Delivery Systems: Develop canary releases, staged rollouts, and automated rollback mechanisms for safe production deployments
Engineering Velocity Optimization: Reduce friction in release pipelines and automate manual operational workflows to improve development speed
Cross-Team Collaboration: Partner with product and infrastructure teams to ensure services are deployable, observable, and resilient at scale
Deployment Methodology Evolution: Implement and advance GitOps, infrastructure-as-code, and progressive delivery patterns across the organization
Automated Health Monitoring: Build systems that evaluate deployment health using metrics, logs, traces, and alerts to detect regressions and trigger rollbacks
AI-Assisted Deployment Workflows: Develop systems supporting agent-assisted or autonomous deployment processes using modern AI tooling
Platform Reliability: Maintain high availability and performance of deployment infrastructure serving hundreds of engineers
Security Integration: Implement security scanning, compliance checks, and vulnerability assessments in deployment pipelines
Documentation and Training: Create comprehensive documentation and training materials for deployment platform adoption across engineering teams

Qualifications

What we look for.

Technical

Kubernetes Expertise
5+ years of experience with Kubernetes-based deployment systems at enterprise scale
Continuous Deployment Platforms
Proven experience building or operating CD platforms serving multiple development teams
GitOps Proficiency
Deep familiarity with GitOps tooling such as ArgoCD, Flux, or similar systems
Infrastructure as Code
Expert-level experience with Terraform, Ansible, or similar IaC tools for cloud infrastructure management
Programming Languages
Strong proficiency in Python, Go, or similar languages for building automation and platform tools
Cloud Platforms
Extensive experience with AWS, GCP, or Azure for large-scale infrastructure deployment
Monitoring and Observability
Experience implementing comprehensive monitoring, logging, and alerting systems for production deployments
Container Technologies
Deep understanding of Docker, container orchestration, and cloud-native architecture patterns

Education

Degree Requirement
Bachelor's degree in Computer Science or Engineering, or equivalent practical experience, preferred

Experience

Platform Engineering Experience
5-8 years in platform engineering, DevOps, or site reliability engineering roles
Large-Scale Systems
Experience operating deployment systems serving 100+ engineers and thousands of deployments daily
Production Operations
Strong background in production incident response, rollback procedures, and operational safety
Cross-Functional Leadership
Experience working with diverse engineering teams to improve deployment practices and developer productivity

Skills

Required

Kubernetes
Expert-level experience with Kubernetes cluster management, networking, and large-scale deployments
Python
Advanced Python programming skills for building automation tools and platform services
GitOps
Deep understanding of GitOps principles and hands-on experience with ArgoCD or Flux
Infrastructure as Code
Proficiency with Terraform for managing cloud infrastructure and Kubernetes resources
CI/CD Pipelines
Experience designing and implementing complex continuous integration and deployment pipelines
Monitoring and Alerting
Skills in implementing comprehensive observability using Prometheus, Grafana, and similar tools
Container Orchestration
Advanced knowledge of Docker, container lifecycle management, and orchestration patterns
Cloud Platforms
Hands-on experience with AWS, GCP, or Azure for production-grade infrastructure

Preferred

Go Programming
Experience with Go for building high-performance infrastructure tools and Kubernetes operators
Service Mesh
Knowledge of Istio, Linkerd, or similar service mesh technologies for advanced traffic management
Helm Charts
Experience creating and maintaining Helm charts for complex application deployments
Machine Learning Operations
Understanding of MLOps practices and deploying machine learning models at scale
Security Tooling
Experience with security scanning tools, vulnerability management, and compliance automation
Incident Response
Background in production incident management, post-mortem processes, and reliability engineering
AI/ML Infrastructure
Interest or experience in AI model deployment, GPU cluster management, and ML infrastructure
Open Source Contributions
Active contributions to CNCF projects, Kubernetes ecosystem, or other relevant open source projects

Tech stack

Languages

Python · Go · YAML

Frameworks

FastAPI

Databases

PostgreSQL · Redis

Tools

Kubernetes · Terraform · ArgoCD · Buildkite · Flux · Helm · Prometheus · Grafana · Istio

Other

Docker · GitOps · AWS/GCP/Azure · OpenTelemetry

Compensation

Pay and benefits.

Base · USD 210,000 – 490,000

Equity · Stock options

Benefits

Equity Compensation
Significant equity package in one of the world's leading AI companies with strong growth potential
Health Insurance
Comprehensive medical, dental, and vision insurance coverage for employees and families
Mental Health Support
Access to mental health resources, counseling services, and wellness programs
Professional Development
Conference attendance, training budget, and opportunities to work with cutting-edge AI technologies
Flexible Work Arrangements
Hybrid work options with modern office facilities in San Francisco
Retirement Benefits
401(k) plan with company matching and comprehensive retirement planning resources
Parental Leave
Generous parental leave policies supporting new parents with extended time off
Technology Stipend
Equipment and technology allowances for optimal home office setup
Learning Budget
Annual budget for books, courses, certifications, and skill development in AI and engineering
Commuter Benefits
Transportation subsidies and parking allowances for San Francisco office

Process

Interview steps.

01
Initial Screening
30-minute phone call with recruiter covering background, motivation, and basic technical fit for the role
02
Technical Phone Screen
45-minute technical interview focusing on Kubernetes, CI/CD concepts, and system design fundamentals
03
System Design Interview
60-minute deep-dive into designing a large-scale continuous deployment system with focus on safety and scalability
04
Technical Deep Dive
90-minute hands-on session covering GitOps implementation, infrastructure as code, and deployment pipeline design
05
Behavioral and Culture Fit
45-minute interview focusing on collaboration, problem-solving approach, and alignment with OpenAI's mission
06
Final Round Panel
Series of conversations with team members and leadership covering technical expertise and team integration
07
Reference Checks
Verification of previous work experience and technical contributions with former colleagues and managers

Full posting

Original listing.

About the Role

The Engineering Acceleration Delivery / Continuous Deployment team builds and operates the systems that safely ship OpenAI’s infrastructure and product code to production.

We own the deployment platform, release pipelines, and rollout safety mechanisms that allow engineers across OpenAI to deploy changes rapidly while minimizing operational risk. Our mission is to make production deployments fast, safe, and increasingly autonomous.

This role sits at the intersection of developer productivity, distributed systems reliability, and large-scale infrastructure orchestration.

In This Role, You Will

Design and build continuous deployment infrastructure that safely rolls out changes across dozens of Kubernetes clusters and global regions.
Develop systems for progressive delivery, including canary releases, staged rollouts, and automated rollback.
Improve engineering velocity by reducing friction in the release pipeline and automating manual operational workflows.
Work with product and infrastructure teams to ensure their services are deployable, observable, and resilient at scale.
Implement and evolve deployment methodologies such as GitOps, infrastructure-as-code, and progressive delivery patterns.
Build systems that automatically evaluate deployment health using metrics, logs, traces, and alerts to detect regressions and trigger safe rollbacks.
Build systems that support agent-assisted or autonomous deployment workflows using modern AI tooling.

Technologies commonly used in this environment include:

Kubernetes for large-scale container orchestration and runtime infrastructure
Python and FastAPI for internal services
Terraform for infrastructure as code
GitOps-based deployment workflows (e.g., ArgoCD, Flux, or similar systems)
Buildkite for CI orchestration

You may be a strong fit if you:

Have worked with Kubernetes-based deployment systems at scale
Have experience building or operating continuous deployment platforms
Are familiar with GitOps tooling such as ArgoCD or Flux
Are excited about building AI-assisted systems and agents that intelligently shepherd software changes from commit to safe production rollout
Care deeply about safe production rollouts and minimizing blast radius
Enjoy building internal platforms that improve developer productivity across the organization

Compensation

$230K – $490K + Offers Equity

About OpenAI

OpenAI is an AI research and deployment company dedicated to ensuring that general-purpose artificial intelligence benefits all of humanity. We push the boundaries of the capabilities of AI systems and seek to safely deploy them to the world through our products. AI is an extremely powerful tool that must be created with safety and human needs at its core, and to achieve our mission, we must encompass and value the many different perspectives, voices, and experiences that form the full spectrum of humanity.

We are an equal opportunity employer, and we do not discriminate on the basis of race, religion, color, national origin, sex, sexual orientation, age, veteran status, disability, genetic information, or other applicable legally protected characteristic.

For additional information, please see OpenAI’s Affirmative Action and Equal Employment Opportunity Policy Statement.

Background checks for applicants will be administered in accordance with applicable law, and qualified applicants with arrest or conviction records will be considered for employment consistent with those laws, including the San Francisco Fair Chance Ordinance, the Los Angeles County Fair Chance Ordinance for Employers, and the California Fair Chance Act, for US-based candidates. For unincorporated Los Angeles County workers: we reasonably believe that criminal history may have a direct, adverse and negative relationship with the following job duties, potentially resulting in the withdrawal of a conditional offer of employment: protect computer hardware entrusted to you from theft, loss or damage; return all computer hardware in your possession (including the data contained therein) upon termination of employment or end of assignment; and maintain the confidentiality of proprietary, confidential, and non-public information. In addition, job duties require access to secure and protected information technology systems and related data security obligations.

To notify OpenAI that you believe this job posting is non-compliant, please submit a report through this form. No response will be provided to inquiries unrelated to job posting compliance.

We are committed to providing reasonable accommodations to applicants with disabilities, and requests can be made via this link.

OpenAI Global Applicant Privacy Policy

At OpenAI, we believe artificial intelligence has the potential to help people solve immense global challenges, and we want the upside of AI to be widely shared. Join us in shaping the future of technology.
