OpenAI

ChatGPT Performance Engineer

Location: San Francisco
Type: Full Time
Salary: USD 325,000 – 405,000
Level: Senior
Role: Performance Engineer
Posted: Apr 15, 2026

The role

Summary

OpenAI is seeking an experienced Performance Engineer to optimize the performance, reliability, and efficiency of mission-critical systems powering ChatGPT and the OpenAI API. This highly technical individual contributor role requires deep expertise in systems optimization, performance profiling, and distributed systems scaling. You'll work cross-functionally to drive latency, throughput, and cost-efficiency improvements across the entire technology stack, from GPU utilization to networking and application runtime optimization.

What you'll do

Multi-Layer Performance Analysis and Optimization: Conduct comprehensive performance profiling and optimization across application, middleware, runtime, and infrastructure layers including networking, storage, Python runtime, GPU utilization, and beyond. Utilize advanced tracing and observability tools to identify bottlenecks and implement systematic improvements that directly impact system efficiency and user experience.
Observability and Instrumentation Development: Design and implement sophisticated tooling and metrics systems that provide deep observability into system performance across distributed environments. Create dashboards, logging frameworks, and monitoring solutions that enable real-time visibility into critical performance indicators and emerging degradation patterns.
Cross-Functional Collaboration and Architecture Influence: Partner closely with infrastructure, platform, training, and product teams to identify key performance goals, define SLAs/SLOs, and drive systemic improvements. Influence critical architecture and design decisions at scale to prioritize latency, throughput, and efficiency from inception rather than as post-hoc optimizations.
Production Performance Investigation and Resolution: Lead root-cause analysis investigations into high-impact performance regressions, scalability issues, and production incidents affecting mission-critical services like ChatGPT and the OpenAI API. Develop remediation strategies and implement fixes that prevent recurrence while documenting findings for organizational learning.
Performance Testing and SLA/SLO Definition: Design and execute comprehensive performance testing strategies for critical systems operating at scale. Establish and maintain Service Level Agreements (SLAs) and Service Level Objectives (SLOs) around latency and throughput, ensuring all stakeholders have clear performance expectations and measurement frameworks.
Scalability and Efficiency Optimization: Apply deep technical expertise to drive further gains in latency, throughput, and cost-efficiency across mission-critical products. Identify and eliminate performance bottlenecks that prevent optimal resource utilization and system scaling, with particular focus on high-impact opportunities in large-scale distributed systems.

What we look for

Technical

Performance Profiling and Tracing Tools Expertise: Advanced proficiency with performance profiling tools, distributed tracing systems, and APM solutions. Demonstrated ability to identify performance bottlenecks and interpret profiling output across multiple platforms and architectures.
Multi-Layer Stack Optimization Experience: Proven experience optimizing performance across one or more layers, including database query optimization, network protocols and latency reduction, storage I/O patterns, application runtime tuning, garbage collection configuration, Python/Golang internals, CUDA optimization, and GPU utilization maximization.
Operating Systems and Systems-Level Understanding: Strong foundational knowledge of OS internals, CPU scheduling, context switching, the memory management hierarchy (cache, virtual memory, NUMA), and I/O patterns. Ability to analyze system behavior at the kernel level and optimize for efficient resource utilization.
Distributed Systems and Infrastructure Knowledge: Deep understanding of distributed systems architecture, including load balancing, service mesh patterns, replication strategies, and consistency tradeoffs. Experience with large-scale infrastructure systems and the performance implications of distributed design decisions.
Observability Infrastructure Implementation: Experience building or contributing to observability systems at scale, including metrics collection, distributed tracing, logging aggregation, and performance dashboards. Familiarity with open-source and commercial observability platforms.

Education

Computer Science or Related Field: Bachelor's degree in Computer Science, Computer Engineering, or a related technical field. Advanced degrees in systems, networks, or performance engineering are a plus, but equivalent professional experience may substitute.

Experience

7+ Years Software Engineering Experience: Minimum of 7 years of professional software engineering experience with demonstrated expertise in performance optimization, reliability engineering, or systems engineering roles at technology companies with significant scale requirements.
High-Scale Distributed Systems Track Record: Proven track record of optimizing the performance and reliability of high-scale distributed systems handling significant traffic volume or computational load. Experience navigating ambiguity and aligning multiple stakeholders around performance goals and competing priorities.
Benchmark and Performance Testing Background: Demonstrated success contributing to benchmarking frameworks, performance testing infrastructure, or performance-focused optimization initiatives at scale. Experience establishing baselines and measuring improvements quantitatively.

Skills

Required skills

Performance Profiling: Expertise with flame graphs, sampling profilers, and continuous profiling tools to identify hot paths and CPU bottlenecks in production systems.
Distributed Tracing: Proficiency with distributed tracing systems to understand request flows, latency attribution, and dependency analysis across microservices.
Python Performance Optimization: Deep knowledge of Python runtime internals, GIL implications, memory profiling, and optimization techniques specific to Python-based systems.
Systems-Level Debugging: Ability to use kernel-level tools (strace, perf, BPF) to investigate system behavior and identify performance bottlenecks at the OS level.
Infrastructure and Networking: Understanding of networking protocols, TCP/UDP optimization, DNS resolution impacts, and infrastructure-level performance considerations.
Load and Stress Testing: Experience designing and executing load tests, stress tests, and chaos engineering experiments to validate system behavior under real-world conditions.
SQL Query Optimization: Proficiency with database query analysis, index optimization, and query execution plan interpretation for relational databases.
Cross-Functional Communication: Ability to communicate complex technical findings to non-technical stakeholders and align teams around performance goals and tradeoffs.

Nice to have

GPU Performance Optimization: Experience optimizing CUDA-based systems or working with GPU workloads, understanding memory transfers, kernel execution, and efficiency improvements.
Compiled Language Performance: Background with Go, Rust, or C++ performance optimization, including knowledge of compiler flags, SIMD optimization, and low-level tuning.
Machine Learning Systems Knowledge: Understanding of ML inference and training systems, including model serving frameworks, throughput optimization, and latency reduction in ML pipelines.
Open-Source Observability Contributions: Contributions to open-source observability tools such as Prometheus, Grafana, Jaeger, or similar platforms, demonstrating commitment to instrumentation excellence.
Kubernetes and Container Orchestration: Experience optimizing the performance of containerized systems and Kubernetes deployments, including resource allocation and scheduling efficiency.
Large Language Model Optimization: Familiarity with LLM serving infrastructure, inference optimization techniques, token processing efficiency, and real-time API performance considerations.

Compensation & benefits

Salary: USD 325,000 – 405,000 (annual)
Stock options: Available

Benefits

Equity Compensation

Competitive stock options package aligned with company performance and your individual contributions, providing meaningful ownership and upside participation as OpenAI advances its mission.

Comprehensive Health Coverage

Extensive medical, dental, and vision insurance plans covering employees and dependents with low or no out-of-pocket costs for preventative care and specialist services.

Retirement Planning

401(k) plan with company matching contributions to support long-term financial security and retirement savings with tax-advantaged growth opportunities.

Unlimited PTO

Flexible paid time off policy recognizing the importance of work-life balance, wellbeing, and personal recovery time to maintain high performance and job satisfaction.

Professional Development and Learning

Budgets and support for technical conferences, specialized training, certification programs, and continuous learning opportunities to stay current with advancing technologies and industry practices.

Mental Health and Wellness Support

Access to mental health services, counseling, wellness programs, and employee assistance programs supporting overall wellbeing and resilience in demanding technical roles.

Parental Leave

Generous parental leave policies supporting new parents with extended paid leave and flexible return-to-work arrangements.

Home Office Equipment

Support for remote work infrastructure including equipment stipends, ergonomic setups, and technology allowances to create productive work environments.

Commuter and Relocation Benefits

Flexible commuting support in San Francisco area including transit passes or parking benefits, plus relocation assistance for candidates relocating to the Bay Area.


Interview process

1. Initial Technical Screening: Brief conversation with the recruiting team to validate your background, discuss your performance engineering experience, and assess cultural fit with OpenAI's mission-driven approach. This stage confirms baseline qualifications and enthusiasm for the role.
2. Technical Deep-Dive Interview: Comprehensive discussion with engineering leaders covering your hands-on experience with performance profiling tools, specific optimization projects you've led, and your approach to root-cause analysis. Expect questions about architecture trade-offs, instrumentation strategies, and real-world performance challenges you've solved.
3. Systems Design and Problem-Solving: Technical interview focused on your ability to reason about large-scale systems, identify performance bottlenecks, and propose optimization strategies. You may be presented with performance scenarios, latency issues, or scaling challenges to demonstrate your analytical approach and systems thinking.
4. Cross-Functional Collaboration Discussion: Conversation with members of the infrastructure, platform, and product teams to assess your ability to navigate ambiguity, communicate with diverse stakeholders, and influence architectural decisions. OpenAI values collaborative problem-solvers who can align teams around performance goals.
5. Leadership and Impact Assessment: Discussion with senior engineering leadership evaluating your track record of driving measurable business impact through performance improvements. Expect conversations about prioritization under constraints, navigating competing stakeholder interests, and your philosophy on technical rigor and simplicity.
6. Offer and Onboarding Discussion: Final conversation covering compensation, benefits, equity details, and technical onboarding expectations. OpenAI will discuss team structure, current performance priorities, and how you'll ramp up on their systems and culture.


OpenAI

OpenAI is an American artificial intelligence research organization developing advanced AI models like GPT. Focused on ensuring AI benefits humanity, it creates tools for natural language processing and generative AI applications.

San Francisco, California, United States. Founded 2015. openai.com

Tech Stack

Languages
Python, Go, C/C++
Frameworks
PyTorch or TensorFlow, vLLM or similar serving frameworks
Databases
PostgreSQL, Redis, time-series databases
Tools
Flame graphs and perf, distributed tracing platforms, Prometheus and Grafana, BPF and eBPF, load testing frameworks, Kubernetes
Other
GPU architecture and CUDA, microservices architecture patterns, continuous integration and deployment
