Member of Technical Staff - GPU Infrastructure

Prime Intellect, 3 months ago

Location

San Francisco

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Senior

Role

Solutions Architect - GPU Infrastructure

Posted

Mar 6, 2026

The role

Summary

Prime Intellect is seeking a highly skilled Member of Technical Staff to design and deploy cutting-edge GPU infrastructure for AI and machine learning workloads. The role focuses on creating robust, scalable compute solutions that enable advanced AI model training and deployment across research, startup, and enterprise environments.

What you'll do

Customer Architecture Design: Partner with clients to design optimal GPU cluster architectures, create technical proposals for clusters ranging from 100 to 10,000+ GPUs, and develop deployment strategies for LLM training, inference, and HPC workloads.

Infrastructure Deployment: Deploy and configure orchestration systems like SLURM and Kubernetes, implement high-performance networking, optimize GPU utilization, and configure parallel filesystems for maximum performance.

Production Operations: Serve as the primary technical escalation point for customer infrastructure issues, diagnose complex problems across the full technology stack, implement monitoring systems, and provide 24/7 on-call support for critical deployments.

What we look for

Technical

GPU Infrastructure Expertise: At least 3 years of hands-on experience with GPU clusters and HPC environments

Orchestration Systems: Deep expertise with SLURM and Kubernetes in production GPU settings

Networking Knowledge: Proven experience with InfiniBand configuration and troubleshooting

Education

Computer Science: Bachelor's degree in Computer Science, Engineering, or a related technical field preferred

Experience

Systems Programming: Demonstrated proficiency in Python, Bash, and systems-level programming

Infrastructure Automation: Experience with infrastructure automation tools such as Ansible and Terraform

Skills

Required skills

GPU Architecture: Strong understanding of NVIDIA GPU architecture, CUDA ecosystem, and driver stack

Infrastructure Automation: Proficiency with Ansible, Terraform, and other infrastructure automation tools

Programming Languages: Advanced skills in Python, Bash, and systems programming

Nice to have

Large-Scale Deployment: Experience with 1000+ GPU deployments

Certifications: NVIDIA DGX, HGX, or SuperPOD certification

Distributed Training: Knowledge of distributed training frameworks such as PyTorch FSDP, DeepSpeed, and Megatron-LM

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Stock options

Available

Benefits

Equity Compensation

Stock options in an early-stage AI infrastructure company with significant funding

Cutting-Edge Technology

Work on pioneering AI infrastructure with potential for significant industry impact

Professional Growth

Direct collaboration with a world-class engineering team and exposure to frontier AI technologies

Interview process

1. Initial Screening: Resume review and a preliminary phone or video interview to assess technical background and experience
2. Technical Interview: In-depth technical discussion focusing on GPU infrastructure, HPC environments, and systems architecture
3. Systems Design Challenge: Practical assessment in which you design a GPU cluster architecture and solve a complex infrastructure problem
4. Final Interview: Meeting with technical leadership to discuss cultural fit, career growth, and alignment with the company mission

More Jobs at Prime Intellect

3 other open positions

Member of Technical Staff - Full Stack Software Engineer

San Francisco

Mid

Member of Technical Staff - Training Platform

San Francisco

Senior

Member of Technical Staff - Sandbox Platform

San Francisco

Senior

Prime Intellect

Prime Intellect is an artificial intelligence company dedicated to creating agentic AI solutions specifically designed for software engineers. The company's primary focus is on developing tools that help engineers navigate complex codebases, identify issues, and fix bugs more efficiently. Their mission is to empower developers to build and innovate at an accelerated pace, operating within the software development and AI tools market.

San Francisco, USA

Founded 2023

primeintellect.ai

Tech Stack

Languages

Python, Bash

Frameworks

Kubernetes, SLURM

Databases

Not Specified

Tools

Ansible, Terraform, Docker

Other

CUDA, InfiniBand
