Senior Software Engineer, Accelerated Delivery

Snowflake

Location

US-CA-Menlo Park

Type

Full Time

Salary

USD 200,000 – 287,500

Level

Senior

Role

Backend Engineer

Posted

Jul 1, 2026

The role

Summary

Join Snowflake's Release Engineering team as a Senior Software Engineer to design and build large-scale continuous deployment infrastructure for multi-cloud production environments. This role combines platform engineering, distributed systems reliability, and DevOps expertise to create safe, scalable, and efficient software delivery systems augmented by AI-driven automation. You'll partner with engineering teams across Snowflake to eliminate deployment friction, implement progressive delivery patterns, and build self-service developer tooling while maintaining operational safety at global scale.

What you'll do

Design and Build Continuous Deployment Infrastructure: Design and architect continuous deployment and rollout infrastructure capable of safely shipping changes across Snowflake's large-scale, multi-cloud production environment. Focus on creating systems that provide reliability, observability, and auditability for every production deployment, supporting rapid iteration while minimizing operational risk.

Implement Progressive Delivery Capabilities: Build and evolve platform capabilities for progressive delivery patterns including staged rollouts, canary deployments, automated health checks, intelligent rollback controls, and blast radius minimization guardrails. Design mechanisms that enable teams to validate changes safely in production before full rollout.

Eliminate Release Pipeline Friction: Improve engineering velocity by identifying and removing friction from release pipelines, replacing error-prone manual workflows with durable, maintainable platform abstractions, robust automation, and self-documenting processes that reduce cognitive load on engineering teams.

Build Internal Release Orchestration Platforms: Develop large-scale release orchestration platforms supporting application rollouts on Kubernetes, production change workflows, and cross-service coordination. Create abstractions that allow teams to adopt consistent deployment patterns without requiring deep release engineering expertise.

Develop Observability and Health Evaluation Systems: Build systems that evaluate rollout health using metrics, logs, alerts, and operational signals from observability platforms like Prometheus, Datadog, or Grafana. Implement automated regression detection and safe mitigation paths that trigger intelligent rollback decisions with minimal blast radius.

Partner with Product and Infrastructure Teams: Collaborate with product and infrastructure teams to design platform capabilities that make services easier to deploy, validate, observe, and operate. Drive adoption of release best practices through well-designed developer experiences and self-service tooling that reduces operational toil.

Implement Advanced Deployment Methodologies: Research, design, and implement deployment methodologies including GitOps-inspired workflows, infrastructure-as-code practices, policy-driven automation, and progressive delivery patterns tailored to Snowflake's multi-cloud environment and operational requirements.

Build AI-Assisted and Autonomous Workflows: Design and implement AI-assisted, agent-driven, and increasingly autonomous release workflows that enhance rollout intelligence, improve developer productivity, and strengthen deployment safety through intelligent automation and predictive analysis.

Create Self-Service Developer Tooling: Develop self-service developer tools and platforms that enable teams across Snowflake to adopt safe deployment patterns, manage their own rollouts, and participate in deployment decisions without requiring deep release engineering expertise or tribal knowledge.

Build Automation and Operational Guardrails: Design and implement guardrails and automation systems that reduce operational toil, make production change workflows more consistent and resilient, enforce best practices automatically, and ensure the right operational path remains the easiest path.

What we look for

Technical

Continuous Deployment Platform Experience: Proven experience building or operating continuous deployment, release engineering, or production change platforms at scale, with demonstrated expertise in managing safe deployments across large, complex distributed systems.

Kubernetes and Container Orchestration: Strong hands-on experience with Kubernetes-based systems and understanding of how to safely orchestrate and validate changes across distributed production environments, including knowledge of Helm, service meshes, and deployment strategies.

Systems Programming Languages: Strong software engineering skills in at least one systems language such as Golang, Java, or C++, with the ability to build performant, reliable infrastructure components and understand low-level system behavior.

Scripting and Automation: Proficiency in scripting languages such as Python or Bash for infrastructure automation, creating runbooks, implementing CI/CD pipeline logic, and building operational tools that improve deployment efficiency.

Distributed Systems Design: Deep understanding of distributed systems principles including consistency models, failure modes, eventual consistency, replication strategies, and architectural patterns that apply to large-scale deployment infrastructure.

Infrastructure Automation and IaC: Experience with infrastructure automation tools, infrastructure-as-code practices, configuration management, and managing large-scale deployments across multiple cloud providers and on-premises environments.

CI/CD Pipeline Design: Expertise in designing, building, and maintaining sophisticated CI/CD pipelines that support automated testing, building, and deployment processes for large-scale systems with high uptime requirements.

Multi-Cloud Infrastructure: Experience working with multi-cloud environments and understanding how to abstract away cloud-specific details while leveraging platform-specific optimizations and managed services effectively.

Observability and Monitoring: Hands-on experience with observability platforms such as Prometheus, Datadog, Grafana, or similar tools for metrics collection, distributed tracing, log aggregation, and making data-driven deployment decisions.

Safe Production Rollout Practices: Deep commitment to and proven experience implementing safe production change practices including blast radius minimization, automated rollback mechanisms, canary deployments, and progressive delivery patterns.

Education

Computer Science or Related Field: Bachelor's degree in Computer Science, Computer Engineering, or related discipline, or equivalent professional experience demonstrating strong foundational knowledge of distributed systems, algorithms, and software architecture principles.

Experience

Large-Scale Systems Operation: 5+ years of experience operating or building systems at significant scale, dealing with distributed system challenges, failure scenarios, and infrastructure decisions affecting high-availability production environments serving thousands of users.

Platform Engineering Leadership: Demonstrated ability to design platform abstractions that improve developer productivity across large engineering organizations, with experience translating complex operational requirements into elegant, maintainable systems.

DevOps and Release Engineering: Substantial experience in DevOps practices, release engineering methodologies, or site reliability engineering roles with a focus on improving deployment safety, reducing operational overhead, and scaling delivery capabilities.

Open Source Infrastructure Projects: Familiarity with open-source infrastructure and deployment projects such as Kubernetes, GitOps tools, progressive delivery platforms, or similar systems that demonstrate engagement with modern infrastructure challenges.

AI and Automation in Operations: Experience or demonstrated interest in applying AI, machine learning, and intelligent automation to operational workflows, deployment decisions, and autonomous systems that improve efficiency and reduce human error.

Skills

Required skills

Golang or Java: Strong proficiency in Golang or Java for building reliable, high-performance infrastructure and platform components that handle complex deployment orchestration and system reliability challenges.

Python or Bash: Proficiency in Python or Bash scripting for infrastructure automation, operational tooling, CI/CD pipeline logic, and building the glue that connects various deployment system components.

Kubernetes: Substantial hands-on experience with Kubernetes architecture, API resources, deployment strategies, and operational best practices for managing containerized applications at scale in production.

Distributed Systems Fundamentals: Solid understanding of distributed systems concepts including consensus algorithms, eventual consistency, failure handling, state management, and architectural tradeoffs relevant to deployment infrastructure.

CI/CD Platforms: Experience with CI/CD platforms and tools such as GitHub Actions, GitLab CI, Jenkins, or similar systems for building automated delivery pipelines that support safe production deployments.

Infrastructure Observability: Ability to instrument systems with metrics, logs, and traces, and use observability platforms to understand system behavior, diagnose issues, and make data-driven deployment decisions.

Production Incident Management: Demonstrated experience handling production incidents, performing root cause analysis, designing safeguards to prevent recurrence, and improving system resilience based on operational learnings.

Nice to have

Canary Deployment Patterns: Experience designing or implementing canary deployment systems, progressive delivery frameworks, or similar mechanisms that reduce risk during production rollouts.

GitOps Workflows: Familiarity with GitOps principles and tools like Flux or ArgoCD that use Git repositories as the single source of truth for infrastructure and deployment configurations.

Service Mesh Technologies: Experience with service mesh technologies such as Istio, Linkerd, or Envoy that provide traffic management, observability, and security capabilities for distributed microservice systems.

Infrastructure as Code Tools: Expertise with IaC tools such as Terraform, Pulumi, CloudFormation, or similar platforms for declaratively managing cloud infrastructure and deployment configurations at scale.

Prometheus and Grafana: Hands-on experience with Prometheus for metrics collection and Grafana for visualization, or similar observability stacks for building comprehensive deployment health dashboards and alerting systems.

Policy as Code: Familiarity with policy-as-code frameworks such as OPA/Rego, Kyverno, or similar tools that enforce deployment rules, security policies, and operational guardrails automatically.

ML and Anomaly Detection: Interest or experience in applying machine learning or anomaly detection techniques to operational data for early regression detection, predictive rollout health analysis, or autonomous deployment decisions.

Multi-Cloud Orchestration: Experience managing deployments across multiple cloud providers (AWS, GCP, Azure) or hybrid cloud environments with understanding of cloud-agnostic abstractions and cross-cloud tooling.

Agentic AI Systems: Experience or demonstrated understanding of agentic AI systems, autonomous workflows, or AI-driven operational decision making that could enhance release engineering automation.

Database Systems Knowledge: Understanding of data warehousing, analytics platforms, or distributed database systems that helps in understanding Snowflake's product and deployment requirements.