Senior Site Reliability Engineer

Replit (posted 1 month ago)

Location

Remote - Europe

Workplace

Remote

Type

Full Time

Salary

EUR 120,000 – 180,000

Level

Senior

Role

Site Reliability Engineer

Posted

May 20, 2026

The role

Summary

Replit is seeking a Senior Site Reliability Engineer to improve the reliability and scalability of its global development platform. The ideal candidate will design observability solutions, drive infrastructure automation, and optimize system performance for millions of developers worldwide, while working remotely from within Europe.

What you'll do

Observability Design: Develop comprehensive monitoring and alerting systems using modern observability tools, creating real-time dashboards and metrics for system health and performance tracking

Infrastructure Automation: Architect and implement infrastructure automation solutions using Infrastructure as Code tools, designing CI/CD pipelines and creating self-healing systems

Service Level Management: Define and implement Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in collaboration with product and engineering teams

Incident Response: Lead incident management efforts, conduct thorough post-mortems, develop runbooks, and implement improvements to reduce Mean Time To Recovery (MTTR)

Performance Optimization: Identify and resolve infrastructure performance bottlenecks, implement capacity planning strategies, and optimize resource utilization across global regions

What we look for

Technical

Programming Languages: Strong programming skills in automation languages like Python or Go

Container Orchestration: Experience with Kubernetes and cloud-native technologies

Infrastructure as Code: Proficiency in configuration management and infrastructure automation tools

Education

Computer Science/Engineering: Bachelor's degree in Computer Science, Software Engineering, or equivalent practical experience

Experience

SRE Experience: 4-8 years of experience in Site Reliability Engineering, DevOps, Systems Engineering, or Infrastructure Engineering

Skills

Required skills

Distributed Systems: Deep understanding of distributed system architectures and challenges

Monitoring Tools: Proven track record implementing and maintaining observability solutions

Incident Management: Strong skills in leading incident response and resolution

Nice to have

Cloud Platform: Experience with Google Cloud Platform (GCP) services and tools

Observability Platforms: Knowledge of platforms like Prometheus, Grafana, or Datadog

Compensation & benefits

Salary

EUR 120,000 – 180,000 (annual)

Stock options

Available

Benefits

Competitive Compensation

Attractive salary with equity compensation package

Health Insurance

Comprehensive health, dental, vision, and life insurance coverage

Leave Policies

Paid parental, medical, and caregiver leave

Flexible Work

Remote work arrangement with flexible time off and holidays

Wellness Benefits

Monthly wellness stipend and autonomous work environment

Interview process

1. Initial Screening: HR review of application and background
2. Technical Phone Screen: Discussion of technical experience and SRE background with hiring manager
3. Technical Interview: In-depth technical assessment of SRE skills, system design, and problem-solving abilities
4. Systems Design Challenge: Practical assessment of infrastructure design and reliability engineering capabilities
5. Final Interview: Meeting with team members and leadership to assess cultural fit and team alignment