Snowflake

Senior Software Engineer — LLM Post-Training Platform

Location

US-WA-Bellevue

Type

Full Time

Salary

USD 200,000 – 287,500

Level

Senior

Role

Senior Software Engineer

Posted

Jun 11, 2026


The role

Summary

Snowflake is seeking a Senior Software Engineer to join its ML Platform team, focusing on Cortex Training, the company's LLM post-training platform. The role involves scaling distributed systems for GPU compute, designing APIs, and productionizing advanced ML infrastructure to enable enterprise-scale AI workloads.

What you'll do

Full Stack Design: Design and build across the full stack, from public training APIs and SDK to the control plane and GPU data plane
Distributed Systems Scaling: Scale multi-tenant GPU scheduling, placement, and capacity-aware routing across regional GPU pools with built-in fault tolerance
Performance Optimization: Drive end-to-end performance at scale, maintaining fast training, inference, and RL loops while keeping GPUs saturated under heavy concurrent load
Research Productionization: Partner with Snowflake Research to transform state-of-the-art training and inference techniques into reliable, scalable enterprise-ready components

What we look for

Technical

ML Systems Experience: 5+ years of experience building and shipping production machine learning systems
Distributed Systems Expertise: Strong foundation in designing scalable, fault-tolerant services and operating them on Kubernetes in production
GPU and LLM Infrastructure: Proficiency with tools such as PyTorch, DeepSpeed/FSDP, Ray, CUDA/NCCL, and vLLM, plus the ability to debug across data, infrastructure, and GPU layers

Education

Minimum Educational Requirement: BS in Computer Science or related field
Advanced Degree Preference: MS or PhD is a plus

Experience

System Reliability: Demonstrated ability to harden complex systems for reliability, throughput, and cost efficiency
LLM Post-Training: Hands-on experience with LLM post-training and modeling is highly desirable

Skills

Required skills

Distributed Computing: Expert-level understanding of distributed system design and implementation
GPU Computing: Advanced knowledge of GPU infrastructure and parallel computing techniques
Machine Learning Infrastructure: Deep expertise in building scalable ML platforms and services

Nice to have

LLM Post-Training: Advanced experience with large language model fine-tuning and adaptation techniques
Research Translation: Ability to convert cutting-edge research into production-ready engineering solutions

Compensation & benefits

Salary

USD 200,000 – 287,500 (annual)

Stock options

Available

Benefits

Health Insurance

Comprehensive medical, dental, and vision coverage

Stock Options

Equity compensation to align employee interests with company growth

Professional Development

Ongoing learning opportunities, conference attendance, and skill development programs

401(k) Plan

Retirement savings plan with company matching


Interview process

  1. Initial Screening: Phone or video call with recruiting team to assess initial fit and background
  2. Technical Interview: In-depth technical assessment focusing on distributed systems, ML infrastructure, and GPU computing expertise
  3. System Design Challenge: Evaluate candidate's ability to design scalable ML platforms and solve complex distributed computing problems
  4. Team Fit Interview: Discussion with potential team members to assess collaboration and alignment with Snowflake's innovative culture
  5. Final Interview: Meeting with hiring manager to discuss role expectations and candidate's potential impact
