OpenAI

Software Engineer, Observability

Location: San Francisco
Type: Full Time
Salary: USD 255,000 – 405,000
Level: Senior
Role: Backend Engineer
Posted: Feb 19, 2026


The role

Summary

OpenAI is seeking a Software Engineer for its Observability team to build large-scale monitoring and AI-powered debugging tools. The role involves developing infrastructure that handles petabytes of logs and billions of metrics, and creating intelligent systems that help engineers detect and resolve issues autonomously. The position requires expertise in distributed systems and full-stack development, plus experience with observability technologies.

What you'll do

Infrastructure Ownership: Own core observability infrastructure including distributed logging, time series databases, and trace storage systems handling petabytes of data
AI-Native Tool Development: Build AI-powered tools that help engineers detect, understand, and resolve production issues autonomously using machine learning techniques
UI Experience Development: Contribute to user interface experiences including dashboards, notebook-style interfaces, and interactive debugging tools for internal engineers
Cross-Functional Collaboration: Collaborate closely with engineers, researchers, user operations teams, and other departments to build next-generation observability products
System Reliability: Ensure the reliability and performance of OpenAI's production systems through comprehensive observability and monitoring solutions
Data Pipeline Management: Design and maintain data pipelines that ingest and process billions of time series metrics and massive log volumes in real time
Problem-Solving Leadership: Take ownership of unscoped, ambiguous problems and drive them to completion in a fast-paced AI research environment

What we look for

Technical

Distributed Systems Experience: Proven experience operating large-scale distributed systems in production environments, especially logging systems or time series databases
Systems Fundamentals: Strong fundamentals in systems architecture, networking protocols, and cloud infrastructure technologies
Cloud Platform Expertise: Hands-on experience with cloud platforms, particularly AWS, and container orchestration using Kubernetes
Full-Stack Development: Full-stack development capabilities or strong product sensibilities for building user-facing tools and applications
Observability Technologies: Experience with observability systems such as Prometheus, OpenTelemetry, Grafana, or similar monitoring and tracing tools

Education

Computer Science Degree: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field, or equivalent practical experience

Experience

Production Systems: 5+ years of experience building and maintaining production-grade distributed systems at scale
Infrastructure Engineering: Demonstrated experience in infrastructure engineering with a focus on reliability, scalability, and performance optimization
Problem-Solving Ability: Track record of thriving in ambiguous environments and successfully solving complex, unscoped technical problems

Skills

Required skills

Distributed Systems: Deep understanding of distributed system design patterns, consistency models, and scalability challenges
Infrastructure Programming: Proficiency in languages such as Go, Rust, or Python for building scalable infrastructure
Cloud Architecture: Experience designing and implementing cloud-native architectures on AWS, GCP, or Azure
Database Systems: Knowledge of time series databases, distributed storage systems, and query optimization techniques
Monitoring and Observability: Hands-on experience with monitoring tools, metrics collection, distributed tracing, and log aggregation

Nice to have

AI/ML Integration: Experience integrating machine learning models into production systems for intelligent automation
Frontend Development: Experience with React, TypeScript, or similar frontend technologies for building observability dashboards and user interfaces
Open Source Contributions: Contributions to open source observability projects such as Prometheus, Grafana, OpenTelemetry, or similar tools
Performance Optimization: Experience with performance profiling, optimization techniques, and high-throughput data systems
DevOps Practices: Knowledge of CI/CD pipelines, infrastructure as code, and automated deployment strategies

Compensation & benefits

Salary: USD 255,000 – 405,000 (annual)
Stock options: Available

Benefits

Equity Compensation: Stock options and equity participation in one of the world's most valuable AI companies
Comprehensive Health Coverage: Full medical, dental, and vision insurance with premium coverage options
Flexible Time Off: Unlimited PTO policy allowing for work-life balance and personal time
Professional Development: Learning and development budget for conferences, courses, and skill advancement in AI and engineering
AI Research Access: Direct access to cutting-edge AI research and early access to OpenAI's latest models and technologies
Innovation Culture: Opportunity to shape the future of AI observability and work on problems that impact millions of users
Team Collaboration: Work alongside world-class researchers, engineers, and product teams in a collaborative environment


Interview process

  1. Initial Screening: Phone or video call with the talent acquisition team to discuss background, motivation, and basic technical qualifications
  2. Technical Phone Interview: 45–60 minute technical discussion covering distributed systems concepts, observability challenges, and coding fundamentals
  3. System Design Interview: Design session focused on building large-scale observability infrastructure, handling petabyte-scale data, and AI integration patterns
  4. Coding Interview: Live coding session implementing algorithms related to data processing, time series analysis, or infrastructure problems
  5. Cross-Functional Collaboration: Interview with team members from different functions (research, product, ops) to assess collaboration and communication skills
  6. Final Round: Meeting with engineering leadership to discuss technical vision, career goals, and cultural fit with OpenAI's mission
