Staff Software Engineer I - SRE

Confluent (1 month ago)

Location

Remote, India

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Staff

Role

Staff Software Engineer

Posted

May 27, 2026

The role

Summary

Confluent is seeking a Staff Software Engineer specializing in Site Reliability Engineering (SRE) to drive proactive reliability improvements for its cloud-native data streaming platform. The ideal candidate combines deep technical expertise in multi-cloud environments with strategic incident management and team enablement, with a focus on preventing and mitigating large-scale system failures.

What you'll do

Proactive Reliability Engineering: Analyze systemic failure patterns, design preventative improvements, define SLO/SLA frameworks, and build tooling to reduce incident response overhead

Incident Management Program: Own incident response standards, serve as Incident Commander, develop training programs, and coach teams through post-mortems

Customer Root Cause Analysis: Edit and review customer-facing incident documents, ensure technical accuracy, and drive clear communication of incident details and prevention strategies

Cross-Team Leadership: Partner with engineering leaders to elevate reliability practices and provide expert guidance across the organization

What we look for

Technical

Cloud Platforms: Expertise in at least one of AWS, GCP, or Azure; multi-cloud experience preferred

Incident Management Tools: Proficiency with tools like Rootly, PagerDuty, and similar platforms

Distributed Systems: Deep understanding of distributed systems, failure modes, and event streaming (Kafka expertise preferred)

Education

Technical Degree: Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

SRE Experience: 10+ years in Site Reliability Engineering, incident management, or reliability engineering

Large Organization Experience: Proven track record in managing reliability programs in 500+ engineer organizations

Skills

Required skills

Observability: Advanced skills in metrics, logging, and tracing for complex system diagnostics

Kubernetes: Experience with container orchestration and infrastructure management

CI/CD: Deep understanding of continuous integration and deployment pipelines

Nice to have

AI-Assisted Workflows: Experience with modern AI tools for documentation and incident analysis

Event Streaming: Expertise in Apache Kafka or similar event streaming technologies

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Benefits

Global Team

Part of a follow-the-sun coverage model with sustainable working hours

Professional Development

Opportunities to drive org-wide process improvements and lead technical initiatives

Inclusive Culture

Commitment to diversity, equity, and belonging across the organization

Interview process

1. Initial Screening: HR phone screen to assess basic qualifications and cultural fit
2. Technical Interview: Deep dive into SRE expertise, system design, and reliability engineering concepts
3. On-Call Scenario Assessment: Evaluation of incident management and problem-solving skills through realistic scenarios
4. Final Leadership Interview: Meeting with senior engineering leaders to discuss cross-team collaboration and strategic thinking