Senior Site Reliability Engineer (SRE) - Chaos Engineering (Brazil)

Articul8

2 months ago

Location

Brazil/Remote

Workplace

Remote

Type

Full Time

Salary

BRL 120,000 – 180,000

Level

Senior

Role

Site Reliability Engineer

Posted

Jan 5, 2026

The role

Summary

Articul8 AI is seeking a Senior Site Reliability Engineer specializing in chaos engineering to ensure the reliability and performance of its generative AI SaaS platform. The ideal candidate will architect scalable infrastructure, implement robust monitoring, and drive system resilience through advanced chaos engineering techniques in a cloud-native environment.

What you'll do

Infrastructure Architecture: Design and maintain scalable, highly available cloud-native infrastructure for the generative AI platform

Monitoring and Observability: Implement comprehensive monitoring, alerting, and observability solutions to ensure proactive system health

Automation: Automate deployment, scaling, and management of cloud infrastructure to reduce operational overhead

Performance Optimization: Optimize infrastructure for performance, scalability, and cost-effectiveness of AI workloads

Incident Management: Lead incident response, conduct post-mortems, and drive continuous improvement initiatives

Chaos Engineering: Design and execute chaos experiments to validate system resilience and identify potential failure points

What we look for

Technical

Cloud Expertise: Comprehensive knowledge of cloud platforms and infrastructure

Containerization: Advanced skills in Docker and Kubernetes

Programming: Strong programming skills in Python, Go, or Bash

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or related field

Experience

SRE Experience: Minimum 5 years in DevOps, Site Reliability Engineering, or similar roles

Chaos Engineering: Proven experience with chaos engineering tools and methodologies

Skills

Required skills

Cloud Platforms: Expertise in AWS, GCP, or Azure cloud infrastructure

Programming: Proficiency in Python, Go, or Bash scripting

Infrastructure as Code: Experience with Terraform, CloudFormation

Containerization: Advanced Docker and Kubernetes skills

Monitoring Tools: Proficient with Prometheus, Grafana, and the ELK stack

Chaos Engineering: Experience with Chaos Monkey, Gremlin, and resilience testing

Nice to have

AI/ML Systems: Experience supporting production AI/ML infrastructure

GPU Infrastructure: Knowledge of GPU management and optimization

Database Systems: Familiarity with SQL and NoSQL databases

Cloud Certifications: Professional certifications in cloud platforms

Compensation & benefits

Salary

BRL 120,000 – 180,000 (annual)

Benefits

Remote Work

Fully remote work arrangement with flexible working hours

Cutting-Edge Technology

Opportunity to work with advanced generative AI technologies

Professional Growth

Continuous learning and development in AI and cloud infrastructure

Innovative Environment

Work with a forward-thinking AI company at the forefront of enterprise solutions

Interview process

1. Initial Screening: Review of resume and initial qualifications
2. Technical Phone Screen: Discussion of technical background and SRE experience
3. Technical Interview: Deep dive into infrastructure, chaos engineering, and system design
4. Practical Assessment: Hands-on technical challenge simulating real-world SRE scenarios
5. Final Interview: Meeting with engineering leadership to assess cultural fit and strategic alignment

Articul8

Articul8 is an artificial intelligence company that specializes in enterprise communication and collaboration solutions. The company develops AI-powered platforms designed to enhance workplace productivity and employee engagement through intelligent automation, content generation, and communication tools. Articul8 operates in the enterprise software market, serving organizations that seek to streamline their internal communications and leverage AI to improve operational efficiency.

articul8.ai

Tech Stack

Languages

Python, Go, Bash

Frameworks

Kubernetes, Terraform

Databases

Elasticsearch, Prometheus

Tools

Docker, Grafana, Chaos Monkey

Other

CI/CD, Generative AI
