Staff Site Reliability Engineer

Replit

6 months ago

Location

Remote (United States)

Type

Full Time

Salary

USD 220,000 – 325,000

Level

Staff

Role

Site Reliability Engineer

Posted

Oct 27, 2025

The role

Summary

Replit is seeking a Staff Site Reliability Engineer to lead infrastructure reliability and scalability for its platform, which serves millions of developers worldwide. This senior role involves architecting observability solutions, leading incident response, and mentoring engineering teams to embed reliability as a core value. The position requires 8-10 years of SRE experience with deep expertise in Kubernetes, distributed systems, and modern observability platforms.

What you'll do

Architect Observability Solutions: Design, build, and implement comprehensive monitoring, logging, and tracing solutions with real-time dashboards and metrics

Define Reliability Standards: Establish and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs) across engineering teams

Lead Incident Management: Guide high-impact incident response, conduct blameless post-mortems, and implement preventative measures

Drive Infrastructure Automation: Build CI/CD pipelines, infrastructure as code, and self-healing systems to eliminate operational toil

Optimize Kubernetes Performance: Tune the performance of large-scale cloud deployments, focusing on Kubernetes, Docker, and GCP

Debug Distributed Systems: Resolve complex technical problems across the stack and implement long-term architectural improvements

Provide Technical Leadership: Review system designs for reliability, scalability, security, and operational integrity across the company

Mentor Engineering Teams: Educate and guide engineers to make reliability a core value of Replit's engineering culture

What we look for

Technical

Site Reliability Engineering: 8-10 years of experience in SRE, DevOps, Systems Engineering, or Infrastructure Engineering roles

Programming Proficiency: Strong coding skills in Python or Go with ability to write high-quality, well-tested production code

Distributed Systems: Deep understanding of designing, building, scaling, and maintaining production services in service-oriented architectures

Kubernetes Expertise: Extensive experience with container orchestration platforms, specifically Kubernetes and cloud-native technologies

Observability Systems: Proven track record of designing and implementing sophisticated monitoring, logging, and tracing solutions

Incident Management: Strong incident response leadership experience for complex systems with demonstrated critical thinking under pressure

Infrastructure as Code: Experience with tools like Terraform, Pulumi, and configuration management systems

Experience

Senior Technical Leadership: Experience working with and mentoring engineers from junior to principal levels across technical teams

Stack Debugging: Willingness and ability to understand, debug, and improve any layer of the technology stack

Communication Skills: Excellent written and verbal communication with ability to explain complex technical concepts clearly

Skills

Required skills

Python/Go Programming: Strong programming skills in Python or Go for building production systems and internal tools

Kubernetes: Deep experience with container orchestration platforms, specifically Kubernetes and cloud-native technologies

Distributed Systems: Expertise in designing, building, and maintaining large-scale distributed systems and service-oriented architectures

Observability: Proven experience designing and implementing comprehensive monitoring, logging, and tracing solutions

Incident Management: Strong incident response leadership skills with experience managing complex system outages

Infrastructure as Code: Experience with Terraform, Pulumi, and configuration management tools

Nice to have

Google Cloud Platform: Deep experience with GCP services and cloud-native tools for large-scale deployments

Modern Observability Platforms: Expert-level knowledge of Prometheus, Grafana, Datadog, and OpenTelemetry

High-Performance Systems: Experience designing systems capable of handling high throughput and low latency requirements

Startup Environment: Familiarity with rapid-growth startup environments and scaling challenges

Technical Writing: Experience creating company-facing blog posts and training materials

Compensation & benefits

Salary

USD 220,000 – 325,000 (annual)

Stock options

Available

Benefits

Competitive Salary & Equity

Market-competitive compensation package with equity participation

401(k) with 4% Match

Retirement savings plan with company matching contribution up to 4%

Health Insurance

Comprehensive health, dental, vision, and life insurance coverage

Disability Coverage

Short-term and long-term disability insurance protection

Parental Leave

Paid parental, medical, and caregiver leave for family needs

Commuter Benefits

Transportation and commuting expense reimbursement

Wellness Stipend

Monthly allowance for health and wellness activities

Work From Home Setup

Home office setup reimbursement for remote work equipment

Flexible Time Off

Unlimited PTO policy with company holidays

Team Gatherings

Quarterly team building events and company gatherings

More Jobs at Replit

30 other open positions

Staff Software Engineer, Anti-Abuse & Security

Foster City, CA (Hybrid) In office M,W,F

Staff

Staff DevEx Engineer

Foster City, CA (Hybrid) In office M,W,F

Staff

Senior DevEx Engineer

Foster City, CA (Hybrid) In office M,W,F

Senior

Staff Software Engineer, Agent Platform

Foster City, CA (Hybrid) In office M,W,F

Staff

Senior Product Engineer - Product Foundry

Foster City, CA (Hybrid) In office M,W,F

Senior

Replit

Replit is a platform that allows developers to code in the browser.

San Francisco, California, United States

Founded 2015

replit.com

Tech Stack

Languages

Python, Go

Frameworks

OpenTelemetry

Tools

Kubernetes, Docker, Terraform, Pulumi, Prometheus, Grafana, Datadog

Other

Google Cloud Platform, CI/CD Pipelines
