Replit

Site Reliability Engineer

Location

Remote (United States)

Type

Full Time

Salary

USD 160,000 – 250,000

Level

Mid

Role

Site Reliability Engineer

Posted

Mar 6, 2025

The role

Summary

Replit is seeking a Site Reliability Engineer to ensure the reliability, scalability, and performance of its platform, which serves millions of developers worldwide. The role involves implementing observability solutions, automation, and infrastructure as code, as well as establishing SLOs and driving incident management for a cloud-native development environment.

What you'll do

Design Observability Solutions: Develop comprehensive monitoring and alerting systems using modern observability tools, create real-time dashboards, and implement logging strategies
Infrastructure Automation: Architect and implement infrastructure as code solutions using Terraform, Ansible, or Pulumi while maintaining CI/CD pipelines
Establish SLOs and SLIs: Work with product and engineering teams to define Service Level Objectives and Indicators, building systems to track reliability metrics
Incident Management: Lead incident response efforts, conduct post-mortems, develop runbooks, and build tools to reduce Mean Time To Recovery
Performance Optimization: Identify and resolve infrastructure bottlenecks, implement capacity planning, and optimize system efficiency across global regions
Build Self-Healing Systems: Create automated systems that can respond to common failure scenarios and reduce manual intervention

What we look for

Technical

Programming Languages: Strong skills in Python, Go, or similar automation languages
Distributed Systems: Deep understanding of distributed system architecture and patterns
Container Orchestration: Experience with Kubernetes and cloud-native technologies
Infrastructure as Code: Proven experience with IaC and configuration management tools
Monitoring Solutions: Track record of implementing and maintaining observability platforms

Experience

SRE Experience: 4-8 years in Site Reliability Engineering, DevOps, Systems Engineering, or Infrastructure Engineering
Incident Management: Strong experience leading incident response and post-mortem processes

Skills

Required skills

Site Reliability Engineering: Core SRE practices including monitoring, automation, and reliability engineering
Python/Go Programming: Strong programming skills for automation and tooling development
Kubernetes: Container orchestration and cloud-native application management
Infrastructure as Code: Experience with Terraform, Ansible, or similar IaC tools
Incident Response: Leading incident management and post-mortem processes

Nice to have

Google Cloud Platform: Experience with GCP services and cloud infrastructure
Observability Platforms: Knowledge of Prometheus, Grafana, Datadog, and modern monitoring tools
Problem-Solving: Systematic approach to complex operational challenges
Communication: Ability to explain technical concepts to diverse audiences

Compensation & benefits

Salary

USD 160,000 – 250,000 (annual)

Stock options

Available

Benefits

Competitive Compensation

Competitive salary and equity package

Retirement Benefits

401(k) program with 4% company match

Health Insurance

Comprehensive health, dental, vision, and life insurance coverage

Disability Coverage

Short-term and long-term disability insurance

Parental Leave

Paid parental, medical, and caregiver leave policies

Commuter Benefits

Transportation and commuting expense assistance

Wellness Stipend

Monthly wellness allowance for health and fitness

Remote Work Setup

Home office setup reimbursement and autonomous work environment

Flexible Time Off

Unlimited PTO policy plus company holidays

Team Events

Quarterly team gatherings and office amenities


Interview process

  1. Initial Screening: Phone or video call with recruiter to discuss background and role fit
  2. Technical Assessment: Systems design and SRE-focused technical evaluation
  3. Technical Interview: Deep dive into infrastructure, automation, and incident management experience
  4. Team Interview: Cultural fit assessment and collaboration discussion with team members
  5. Final Interview: Leadership interview focusing on values alignment and career goals
