Confluent

Staff Software Engineer I - SRE

Confluent, 1 month ago
Location

Remote, India

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Staff

Role

Staff Software Engineer

Posted

May 27, 2026

The role

Summary

Confluent is seeking a Staff Software Engineer specializing in Site Reliability Engineering (SRE) to drive proactive reliability improvements for its cloud-native data streaming platform. The ideal candidate combines deep technical expertise in multi-cloud environments with strategic incident management and team enablement, with a focus on preventing and mitigating large-scale system failures.

What you'll do

Proactive Reliability Engineering: Analyze systemic failure patterns, design preventative improvements, define SLO/SLA frameworks, and build tooling to reduce incident response overhead
Incident Management Program: Own incident response standards, serve as Incident Commander, develop training programs, and coach teams through post-mortems
Customer Root Cause Analysis: Edit and review customer-facing incident documents, ensure technical accuracy, and drive clear communication of incident details and prevention strategies
Cross-Team Leadership: Partner with engineering leaders to elevate reliability practices and provide expert guidance across the organization

What we look for

Technical

Cloud Platforms: Expertise in at least one of AWS, GCP, or Azure; multi-cloud experience preferred
Incident Management Tools: Proficiency with tools like Rootly, PagerDuty, and similar platforms
Distributed Systems: Deep understanding of distributed systems, failure modes, and event streaming (Kafka expertise preferred)

Education

Technical Degree: Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

SRE Experience: 10+ years in Site Reliability Engineering, incident management, or reliability engineering
Large Organization Experience: Proven track record of managing reliability programs in organizations with 500+ engineers

Skills

Required skills

Observability: Advanced skills in metrics, logging, and tracing for complex system diagnostics
Kubernetes: Experience with container orchestration and infrastructure management
CI/CD: Deep understanding of continuous integration and deployment pipelines

Nice to have

AI-Assisted Workflows: Experience with modern AI tools for documentation and incident analysis
Event Streaming: Expertise in Apache Kafka or similar event streaming technologies

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Benefits

Global Team

Part of a follow-the-sun coverage model with sustainable working hours

Professional Development

Opportunities to drive org-wide process improvements and lead technical initiatives

Inclusive Culture

Commitment to diversity, equity, and belonging across the organization


Interview process

  1. Initial Screening: HR phone screen to assess basic qualifications and cultural fit
  2. Technical Interview: Deep dive into SRE expertise, system design, and reliability engineering concepts
  3. On-Call Scenario Assessment: Evaluation of incident management and problem-solving skills through realistic scenarios
  4. Final Leadership Interview: Meeting with senior engineering leaders to discuss cross-team collaboration and strategic thinking