Senior Site Reliability Engineer

BetterUp · 3 months ago

Location

Austin, TX

Type

Full Time

Salary

USD 147,000 – 205,000

Level

Senior

Role

Site Reliability Engineer

Posted

Dec 17, 2025

The role

Summary

BetterUp is seeking a Senior Site Reliability Engineer to use AI-powered tools and cloud technologies to transform and optimize its production infrastructure. The ideal candidate will build and manage cloud environments on AWS and Kubernetes, with a focus on automation, reliability, and continuous improvement at an innovative, AI-forward technology company.

What you'll do

Cloud Infrastructure Management: Build and operate cloud infrastructure on AWS using Terraform to codify and version-control the entire environment

Kubernetes Operations: Manage and scale Kubernetes clusters, ensuring high availability and performance of BetterUp's platform

Observability and Monitoring: Design intelligent alerting systems and implement comprehensive observability solutions to proactively monitor production systems

AI-Powered Reliability: Leverage AI tools for advanced log analysis, anomaly detection, and predictive maintenance of infrastructure

Automation and Incident Response: Develop and implement automated incident response workflows and create self-healing infrastructure solutions

What we look for

Technical

Cloud Platforms: Deep experience with AWS cloud infrastructure and services

Infrastructure as Code: Advanced Terraform skills for managing complex, multi-environment infrastructure

Containerization: Hands-on Kubernetes experience including deployment, scaling, debugging, and securing clusters

Education

Technical Degree: Bachelor's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

SRE Background: 4+ years of experience in Site Reliability Engineering or infrastructure-focused roles

Distributed Systems: Strong debugging skills and comfort navigating complex distributed system architectures

Skills

Required skills

AWS: Comprehensive experience with Amazon Web Services cloud platform

Kubernetes: Proficient in managing, deploying, and securing Kubernetes clusters

Terraform: Expert-level infrastructure as code skills using Terraform

Observability Tools: Familiarity with modern observability stacks like Datadog, Prometheus, and OpenTelemetry

Nice to have

AI Integration: Experience using AI tools like GitHub Copilot, AI assistants, and LLM-based development tools

Incident Management: Strong communication skills for explaining technical incidents to both technical and non-technical stakeholders

Compensation & benefits

Salary

USD 147,000 – 205,000 (annual)

Benefits

Personal Coaching

Access to BetterUp coaching for the employee and a friend/family member

Health Insurance

Comprehensive medical, dental, and vision insurance

Flexible PTO

Flexible paid time off policy

Learning Stipend

Annual learning and development stipend

Volunteer Days

5 paid volunteer days per year

Retirement Plan

401(k) with self-contribution options

Company Breaks

Company-wide Summer and Winter breaks

Interview process

1. Initial Screening: Preliminary interview to assess candidate's background and alignment with role requirements
2. Technical Interview: In-depth technical discussion exploring SRE skills, cloud infrastructure expertise, and AI integration capabilities
3. AI Capability Showcase: Opportunity for candidates to demonstrate AI tool usage and innovative problem-solving approaches