Articul8

Senior Site Reliability Engineer (SRE) - (Dublin, CA)

Location

Dublin, CA (HQ)

Type

Full Time

Salary

USD 180,000 – 240,000

Level

Senior

Role

Site Reliability Engineer

Posted

Jan 5, 2026

The role

Summary

Articul8 AI is seeking a highly skilled Senior Site Reliability Engineer to optimize and maintain its Generative AI SaaS platform. The ideal candidate will apply advanced cloud infrastructure, automation, and reliability engineering practices to keep AI systems high-performing, scalable, and secure in a dynamic enterprise environment.

What you'll do

Infrastructure Architecture: Architect and maintain scalable, highly available infrastructure for the Generative AI platform, focusing on performance and reliability.
Monitoring and Observability: Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.
Automation and Efficiency: Automate deployment, scaling, and management of cloud-native infrastructure to reduce operational toil and improve system efficiency.
Service Level Management: Define, measure, and continuously improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver exceptional service quality.
Incident Response: Participate in on-call rotations, provide rapid incident response, conduct thorough post-mortems, and drive continuous improvement initiatives.

What we look for

Technical

Cloud Platforms: Advanced proficiency with cloud platforms such as AWS, GCP, or Azure, with hands-on infrastructure management experience
Programming Languages: Proficiency in at least one programming/scripting language like Python, Go, or Bash for infrastructure automation and tooling
Infrastructure as Code: Experienced with infrastructure as code tools including Terraform, CloudFormation, and similar provisioning technologies
Containerization: Expert-level knowledge of containerization technologies, including Docker and Kubernetes orchestration
Monitoring Tools: Proficient with monitoring and observability tools such as Prometheus, Grafana, and ELK stack

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or related technical field, or equivalent practical experience

Experience

Site Reliability Engineering: 8+ years of experience in DevOps, Site Reliability Engineering, or equivalent infrastructure and reliability roles

Skills

Required skills

Cloud Infrastructure: Deep understanding of cloud infrastructure design, deployment, and management
System Reliability: Proven ability to design and maintain highly available, scalable system architectures
Incident Management: Strong incident response and problem-solving skills, with the ability to troubleshoot complex distributed systems

Nice to have

AI Systems Experience: Previous experience supporting AI/ML systems in production environments
GPU Infrastructure: Knowledge of GPU infrastructure management and optimization techniques
Distributed Systems: Familiarity with distributed systems architecture and high-performance computing principles

Compensation & benefits

Salary

USD 180,000 – 240,000 (annual)

Stock options

Available

Benefits

Health Insurance

Comprehensive medical, dental, and vision coverage for employees and dependents

Retirement Planning

401(k) with company matching to support long-term financial goals

Equity Compensation

Stock options or equity grants to provide ownership in the company's future

Professional Development

Budget for conferences, training, and continuous learning opportunities in cutting-edge AI and infrastructure technologies

Flexible Work Arrangements

Potential for hybrid or remote work options with competitive work-life balance


Interview process

  1. Initial Screening: Technical resume review and initial phone/video screening with recruiting team
  2. Technical Assessment: Comprehensive technical assessment focusing on SRE skills, system design, and infrastructure knowledge
  3. Technical Interviews: Multiple technical interviews with SRE team members, covering system reliability, cloud infrastructure, and problem-solving scenarios
  4. Hiring Manager Interview: In-depth discussion with engineering leadership about role expectations, team dynamics, and career growth opportunities
  5. Final Interview: Potential on-site or virtual final interview to assess cultural fit and overall team alignment
