Software Engineer, Reliability

Cursor

Location

San Francisco

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Senior

Role

Site Reliability Engineer

Posted

Mar 23, 2026

The role

Summary

Cursor is seeking a highly skilled Software Engineer specializing in Reliability to improve system performance and resilience across its AI software development platform. The ideal candidate will drive end-to-end reliability improvements, build robust guardrails, and create solutions that keep the system stable while enabling rapid engineering velocity.

What you'll do

Reliability Engineering: Own end-to-end reliability work from user-facing symptoms to root cause identification across services, infrastructure, and vendor dependencies

Resilience Patterns: Design and implement robust fallback mechanisms, routing strategies, and degraded-mode designs for upstream dependency failures

Observability Enhancement: Improve system observability through advanced metrics, logging, tracing, and client telemetry to enable rapid issue diagnosis

Operational Automation: Reduce operational overhead through strategic automation and development of sophisticated tooling

Cross-Team Collaboration: Partner with product and infrastructure teams to drive high-impact reliability improvements and technical outcomes

What we look for

Technical

Programming Languages: Expert-level experience in Go, Node/TypeScript, or Python

Cloud Infrastructure: Deep practical knowledge of AWS, Kubernetes, and/or ECS deployment patterns

Observability Systems: Comprehensive experience with metrics, logs, traces, and alerting practices

Education

Degree: Bachelor's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

Production System Reliability: Proven track record of owning reliability for complex production systems, including incident response and long-term engineering solutions

Cross-Functional Leadership: Demonstrated ability to lead through influence, align teams, and drive technical changes across multiple codebases

Skills

Required skills

System Reliability: Expertise in developing and maintaining high-performance, stable software systems

Incident Management: Strong capabilities in diagnosing and resolving complex system issues

Nice to have

Multi-Region Architecture: Experience with global distribution strategies and multi-region system design

Networking: Knowledge of advanced networking protocols and long-lived connection workloads

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Stock options

Available

Benefits

Innovative Work Environment

Opportunity to work in a flat, talent-dense organization focused on cutting-edge AI technology

Professional Growth

Chance to work across multiple technology layers and solve complex reliability challenges

Interview process

1. Initial Screening: Preliminary evaluation of candidate's fit and background
2. Technical Interviews: 2 to 3 short technical interviews to assess engineering capabilities
3. Onsite Interview: In-office interview involving a small project, team discussions, and comprehensive evaluation