Senior Site Reliability Engineer (SRE) - (Dublin, CA)

Articul8

Location: Dublin, CA (HQ)

Type: Full Time

Salary: USD 180,000 – 240,000

Level: Senior

Role: Site Reliability Engineer

Posted: Jan 5, 2026

The role

Summary

Articul8 AI is seeking a highly skilled Senior Site Reliability Engineer to optimize and maintain its Generative AI SaaS platform. The ideal candidate will apply advanced cloud infrastructure, automation, and reliability engineering practices to keep AI systems high-performing, scalable, and secure in a dynamic enterprise environment.

What you'll do

Infrastructure Architecture: Architect and maintain scalable, highly available infrastructure for the Generative AI platform, focusing on performance and reliability.

Monitoring and Observability: Design and implement robust monitoring, alerting, and observability solutions to proactively ensure system health and performance.

Automation and Efficiency: Automate deployment, scaling, and management of cloud-native infrastructure to reduce operational toil and improve system efficiency.

Service Level Management: Define, measure, and continuously improve Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to deliver exceptional service quality.

Incident Response: Participate in on-call rotations, provide rapid incident response, conduct thorough post-mortems, and drive continuous improvement initiatives.

What we look for

Technical

Cloud Platforms: Advanced proficiency with cloud platforms such as AWS, GCP, or Azure, with hands-on infrastructure management experience

Programming Languages: Proficiency in at least one programming/scripting language like Python, Go, or Bash for infrastructure automation and tooling

Infrastructure as Code: Experienced with infrastructure as code tools including Terraform, CloudFormation, and similar provisioning technologies

Containerization: Expert-level knowledge of containerization technologies, including Docker and Kubernetes orchestration

Monitoring Tools: Proficient with monitoring and observability tools such as Prometheus, Grafana, and the ELK stack

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or related technical field, or equivalent practical experience

Experience

Site Reliability Engineering: 8+ years of experience in DevOps, Site Reliability Engineering, or equivalent infrastructure and reliability roles

Skills

Required skills

Cloud Infrastructure: Deep understanding of cloud infrastructure design, deployment, and management

System Reliability: Proven ability to design and maintain highly available, scalable system architectures

Incident Management: Strong incident response and problem-solving skills, with the ability to troubleshoot complex distributed systems

Nice to have

AI Systems Experience: Previous experience supporting AI/ML systems in production environments

GPU Infrastructure: Knowledge of GPU infrastructure management and optimization techniques

Distributed Systems: Familiarity with distributed systems architecture and high-performance computing principles

Compensation & benefits

Salary: USD 180,000 – 240,000 (annual)

Stock options: Available

Benefits

Health Insurance: Comprehensive medical, dental, and vision coverage for employees and dependents

Retirement Planning: 401(k) with company matching to support long-term financial goals

Equity Compensation: Stock options or equity grants to provide ownership in the company's future

Professional Development: Budget for conferences, training, and continuous learning opportunities in cutting-edge AI and infrastructure technologies

Flexible Work Arrangements: Potential for hybrid or remote work options with competitive work-life balance

Interview process

1. Initial Screening: Technical resume review and initial phone/video screening with recruiting team

2. Technical Assessment: Comprehensive technical assessment focusing on SRE skills, system design, and infrastructure knowledge

3. Technical Interviews: Multiple technical interviews with SRE team members, covering system reliability, cloud infrastructure, and problem-solving scenarios

4. Hiring Manager Interview: In-depth discussion with engineering leadership about role expectations, team dynamics, and career growth opportunities

5. Final Interview: Potential on-site or virtual final interview to assess cultural fit and overall team alignment

More Jobs at Articul8

8 other open positions

Staff/Senior Software Engineer (Backend) - (Dublin, CA)

Dublin, CA (HQ)

Senior

Backend Engineer - (Python) Brazil

Brazil/Remote

Senior

Infrastructure Engineer (Brazil)

Brazil/Remote

Senior

Infrastructure Engineer - (Dublin, CA)

Dublin, CA (HQ)

Senior

Senior Site Reliability Engineer (SRE) - Chaos Engineering (Brazil)

Brazil/Remote

Senior

Articul8

Articul8 is an artificial intelligence company that specializes in enterprise communication and collaboration solutions. The company develops AI-powered platforms designed to enhance workplace productivity and employee engagement through intelligent automation, content generation, and communication tools. Articul8 operates in the enterprise software market, serving organizations that seek to streamline their internal communications and leverage AI to improve operational efficiency.

articul8.ai

Tech Stack

Languages: Python, Go, Bash

Frameworks: Kubernetes

Databases: SQL Databases, NoSQL Databases

Tools: Terraform, Prometheus, Grafana

Other: CI/CD
