OpenAI

Software Engineer, Caching Infrastructure

Location

San Francisco

Type

Full Time

Salary

USD 230,000 – 385,000

Level

Senior

Role

Backend Engineer

Posted

Jul 18, 2025

The role

Summary

OpenAI is seeking a Senior Software Engineer to design and scale the multi-tenant caching infrastructure that powers ChatGPT, the API, and other AI products. The role involves building highly available distributed caching systems on Redis/Memcached and Kubernetes to support inference, identity, and product experiences across OpenAI's platform.

What you'll do

Platform Architecture: Design, build, and operate OpenAI's multi-tenant caching platform used across inference, identity, quota, and product experiences
Strategic Planning: Define the long-term vision and roadmap for caching as a core infrastructure capability, balancing performance, durability, and cost
Cross-Team Collaboration: Partner with infrastructure teams (networking, observability, databases) and product teams to ensure the caching platform meets their needs
Performance Optimization: Optimize cache performance, minimize tail latency, and ensure high availability across diverse use cases
Scalability Engineering: Build autoscaling systems that dynamically adjust to workload demands while maintaining cost efficiency
System Monitoring: Implement comprehensive observability, monitoring, and alerting for distributed caching infrastructure
Capacity Planning: Analyze usage patterns and plan infrastructure capacity to support OpenAI's growing AI model deployment needs
Incident Response: Participate in on-call rotation and lead incident response for caching infrastructure issues
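The platform-architecture and performance responsibilities above revolve around read-through caching. As a rough illustration only (the `FakeCache`, `get_user`, and `load_from_db` names are hypothetical, and a real deployment would use a Redis client such as redis-py against a cluster endpoint), a minimal cache-aside lookup with a TTL might look like:

```python
import time

# In-memory stand-in for a Redis client; mimics redis-py's get/setex.
class FakeCache:
    def __init__(self):
        self._store = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily expire stale entries
            return None
        return value

    def setex(self, key, ttl_seconds, value):
        self._store[key] = (value, time.monotonic() + ttl_seconds)

def get_user(cache, user_id, load_from_db):
    """Cache-aside read: try the cache, fall back to the source of truth, repopulate."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(user_id)  # cache miss: hit the database
    cache.setex(key, 300, value)   # repopulate with a 5-minute TTL
    return value
```

The TTL bounds staleness; in a multi-tenant platform, key prefixes like `user:` are one common way to partition tenants within a shared keyspace.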

What we look for

Technical

Distributed Systems: 5+ years of experience building and scaling distributed systems, with a focus on caching, load balancing, or storage
Redis/Memcached Expertise: Deep expertise with Redis and Memcached, including clustering, durability configurations, client-side connection patterns, and performance tuning
Kubernetes Production: Production experience with Kubernetes, service meshes (Envoy), and autoscaling systems
Performance Engineering: Rigorous thinking about latency, reliability, throughput, and cost in platform design
Network Protocols: Strong understanding of networking fundamentals, TCP/IP, load balancing, and service discovery
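Clustering and load balancing, named in the requirements above, typically rest on consistent hashing: keys map to the first node clockwise from their hash on a ring, so adding or removing a node remaps only a small slice of the keyspace. A minimal sketch (the `HashRing` name and the use of MD5 are illustrative choices, not anything specified in this posting):

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes for smoother key distribution."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()

    @staticmethod
    def _hash(s):
        # MD5 used only for a stable, well-spread hash; not for security.
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        # First ring position at or after the key's hash, wrapping around.
        idx = bisect.bisect(self._ring, (h,)) % len(self._ring)
        return self._ring[idx][1]
```

Virtual nodes (`vnodes`) spread each physical node across many ring positions, which evens out load when the node count is small.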

Education

Computer Science Degree: Bachelor's or Master's degree in Computer Science, Engineering, or equivalent practical experience

Experience

Senior Engineering: 5+ years in senior software engineering roles with increasing responsibility
Infrastructure Scale: Experience building infrastructure serving millions of users or high-throughput AI/ML workloads
Fast-Paced Environment: Ability to thrive in a fast-paced environment, balancing pragmatic engineering with long-term technical excellence

Skills

Required skills

Redis/Memcached: Expert-level knowledge of distributed caching systems
Kubernetes: Production experience with container orchestration
Distributed Systems: Deep understanding of consensus algorithms, consistency models, and fault tolerance
Performance Engineering: Experience with latency optimization and throughput scaling
Service Mesh: Hands-on experience with Envoy, Istio, or similar technologies
Monitoring & Observability: Proficiency with Prometheus, Grafana, and distributed tracing
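Tail latency, which recurs throughout this listing, is usually reported as a high percentile (p99, p999) rather than a mean. One minimal way to compute it from raw samples is the nearest-rank percentile; the function name below is hypothetical, and production observability stacks typically use histogram buckets (as in Prometheus) instead of raw samples:

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile (e.g. p=0.99 for p99) over latency samples."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered)))  # nearest-rank definition
    return ordered[rank - 1]
```

Averages hide the slow requests that dominate user-perceived latency, which is why percentile-based SLOs are the norm for caching layers.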

Nice to have

Go Programming: Strong Go development skills for backend systems
Rust: Experience with Rust for performance-critical components
Cloud Platforms: Experience with AWS, GCP, or Azure infrastructure
AI/ML Infrastructure: Understanding of AI model serving and inference infrastructure
Database Systems: Knowledge of PostgreSQL, ClickHouse, or other database technologies
Network Programming: Low-level networking and protocol implementation experience

Compensation & benefits

Salary

USD 230,000 – 385,000 (annual)

Stock options

Available

Benefits

Equity Compensation

Significant equity package in one of the world's leading AI companies

Health Insurance

Comprehensive medical, dental, and vision coverage

Learning Budget

Professional development and conference attendance budget

Flexible PTO

Unlimited paid time off policy

Parental Leave

Generous parental leave for new parents

Commuter Benefits

Transportation and parking assistance

Wellness Programs

Mental health support and wellness initiatives

AI Research Access

Early access to cutting-edge AI models and research


Interview process

  1. Initial Screening: 30-minute recruiter call covering background, interest in OpenAI, and basic technical experience
  2. Technical Phone Screen: 60-minute technical interview focusing on distributed systems design and caching concepts
  3. System Design Interview: 90-minute session designing a large-scale caching infrastructure with Redis clustering and Kubernetes
  4. Code Implementation: 75-minute coding interview implementing cache algorithms, consistency patterns, or performance optimization
  5. Cross-Team Collaboration: 45-minute behavioral interview with infrastructure team focusing on collaboration and communication
  6. Leadership Discussion: 60-minute final interview with engineering leadership covering vision, technical direction, and culture fit
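The coding stage above mentions implementing cache algorithms; the canonical example of that genre is an LRU (least-recently-used) cache with O(1) get and put. A minimal sketch (this is general interview material, not a prescribed solution from the posting):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache: OrderedDict tracks recency order, giving O(1) get/put."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict least recently used
```

The same recency-ordering idea underlies Redis's `allkeys-lru` eviction policy, though Redis approximates it by sampling rather than maintaining exact order.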

Apply for this position

You'll be redirected to the company's application page