Articul8

Senior Site Reliability Engineer (SRE) - Chaos Engineering (Brazil)

Location: Brazil/Remote
Workplace: Remote
Type: Full Time
Salary: BRL 120,000 – 180,000
Level: Senior
Role: Site Reliability Engineer
Posted: Jan 5, 2026


The role

Summary

Articul8 AI is seeking a Senior Site Reliability Engineer specializing in chaos engineering to ensure the reliability and performance of its generative AI SaaS platform. The ideal candidate will architect scalable infrastructure, implement robust monitoring, and drive system resilience through advanced chaos engineering techniques in a cloud-native environment.

What you'll do

Infrastructure Architecture: Design and maintain scalable, highly available cloud-native infrastructure for the generative AI platform
Monitoring and Observability: Implement comprehensive monitoring, alerting, and observability solutions to ensure proactive system health
Automation: Automate deployment, scaling, and management of cloud infrastructure to reduce operational overhead
Performance Optimization: Optimize infrastructure for performance, scalability, and cost-effectiveness of AI workloads
Incident Management: Lead incident response, conduct post-mortems, and drive continuous improvement initiatives
Chaos Engineering: Design and execute chaos experiments to validate system resilience and identify potential failure points

What we look for

Technical

Cloud Expertise: Comprehensive knowledge of cloud platforms and infrastructure
Containerization: Advanced skills in Docker and Kubernetes
Programming: Strong programming skills in Python, Go, or Bash

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or related field

Experience

SRE Experience: Minimum 5 years in DevOps, Site Reliability Engineering, or similar roles
Chaos Engineering: Proven experience with chaos engineering tools and methodologies

Skills

Required skills

Cloud Platforms: Expertise in AWS, GCP, or Azure cloud infrastructure
Programming: Proficiency in Python, Go, or Bash scripting
Infrastructure as Code: Experience with Terraform, CloudFormation
Containerization: Advanced Docker and Kubernetes skills
Monitoring Tools: Proficient with Prometheus, Grafana, ELK stack
Chaos Engineering: Experience with Chaos Monkey, Gremlin, resilience testing

Nice to have

AI/ML Systems: Experience supporting production AI/ML infrastructure
GPU Infrastructure: Knowledge of GPU management and optimization
Database Systems: Familiarity with SQL and NoSQL databases
Cloud Certifications: Professional certifications in cloud platforms

Compensation & benefits

Salary

BRL 120,000 – 180,000 (annual)

Benefits

Remote Work: Fully remote work arrangement with flexible working hours
Cutting-Edge Technology: Opportunity to work with advanced generative AI technologies
Professional Growth: Continuous learning and development in AI and cloud infrastructure
Innovative Environment: Work with a forward-thinking AI company at the forefront of enterprise solutions


Interview process

  1. Initial Screening: Review of resume and initial qualifications
  2. Technical Phone Screen: Discussion of technical background and SRE experience
  3. Technical Interview: Deep dive into infrastructure, chaos engineering, and system design
  4. Practical Assessment: Hands-on technical challenge simulating real-world SRE scenarios
  5. Final Interview: Meeting with engineering leadership to assess cultural fit and strategic alignment
