Prime Intellect

Member of Technical Staff - GPU Infrastructure

Location

San Francisco

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Senior

Role

Solutions Architect - GPU Infrastructure

Posted

Mar 6, 2026

The role

Summary

Prime Intellect is seeking a highly skilled Member of Technical Staff to design and deploy cutting-edge GPU infrastructure for AI and machine learning workloads. The role focuses on creating robust, scalable compute solutions that enable advanced AI model training and deployment across research, startup, and enterprise environments.

What you'll do

Customer Architecture Design: Partner with clients to design optimal GPU cluster architectures, create technical proposals for clusters ranging from 100 to 10,000+ GPUs, and develop deployment strategies for LLM training, inference, and HPC workloads.
Infrastructure Deployment: Deploy and configure orchestration systems like SLURM and Kubernetes, implement high-performance networking, optimize GPU utilization, and configure parallel filesystems for maximum performance.
Production Operations: Serve as the primary technical escalation point for customer infrastructure issues, diagnose complex problems across the full technology stack, implement monitoring systems, and provide 24/7 on-call support for critical deployments.

What we look for

Technical

GPU Infrastructure Expertise: At least 3 years of hands-on experience with GPU clusters and HPC environments
Orchestration Systems: Deep expertise with SLURM and Kubernetes in production GPU settings
Networking Knowledge: Proven experience with InfiniBand configuration and troubleshooting

Education

Computer Science: Bachelor's degree in Computer Science, Engineering, or a related technical field preferred

Experience

Systems Programming: Demonstrated proficiency in Python, Bash, and systems-level programming
Infrastructure Automation: Experience with infrastructure automation tools such as Ansible and Terraform

Skills

Required skills

GPU Architecture: Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack
Infrastructure Automation: Proficiency with Ansible, Terraform, and other infrastructure automation tools
Programming Languages: Advanced skills in Python, Bash, and systems programming

Nice to have

Large-Scale Deployment: Experience with deployments of 1,000+ GPUs
Certifications: NVIDIA DGX, HGX, or SuperPOD certification
Distributed Training: Knowledge of distributed training frameworks such as PyTorch FSDP, DeepSpeed, and Megatron-LM

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Stock options

Available

Benefits

Equity Compensation

Stock options in an early-stage AI infrastructure company with significant funding

Cutting-Edge Technology

Work on pioneering AI infrastructure with potential for significant industry impact

Professional Growth

Direct collaboration with a world-class engineering team and exposure to frontier AI technologies


Interview process

1. Initial Screening: Resume review and preliminary phone/video interview to assess technical background and experience
2. Technical Interview: In-depth technical discussion focusing on GPU infrastructure, HPC environments, and systems architecture
3. Systems Design Challenge: Practical assessment that involves designing a GPU cluster architecture and solving a complex infrastructure problem
4. Final Interview: Meeting with technical leadership to discuss cultural fit, career growth, and alignment with company mission
