Cohere

Software Engineer, Internal Infrastructure (Europe & UK)

Location

United Kingdom

Workplace

Remote

Type

Full Time

Salary

GBP 80,000 – 150,000

Level

Senior

Role

Software Engineer

Posted

Oct 8, 2025


The role

Summary

Cohere is seeking a Software Engineer for its Internal Infrastructure team in the UK to build and operate Kubernetes GPU superclusters across multiple clouds. The ideal candidate will support AI model training infrastructure, working closely with research teams to develop scalable, high-performance systems that accelerate AI model development.

What you'll do

Kubernetes Infrastructure Management: Build and operate Kubernetes compute superclusters across multiple cloud providers, ensuring optimal performance and reliability for AI workloads
Cloud Provider Collaboration: Partner with cloud providers to optimize infrastructure costs, performance, and reliability specifically for AI and machine learning workloads
Research Team Support: Work closely with AI research teams to understand and address infrastructure needs, improving stability, performance, and efficiency of novel model training techniques
System Design: Design and build resilient, scalable systems for training AI models with intuitive user interfaces that empower researchers to self-serve and troubleshoot
Team Best Practices: Encourage software engineering best practices across the company and participate in team processes including knowledge sharing, code reviews, and on-call rotations

What we look for

Technical

Kubernetes Expertise: Extensive experience running Kubernetes clusters at scale, including Infrastructure as Code
Programming Languages: Strong programming skills in Go or Python
Cloud Native Infrastructure: Proven ability to scale and troubleshoot Cloud Native infrastructure

Education

Computer Science or Related Field: Bachelor's or Master's degree in Computer Science, Software Engineering, or equivalent practical experience

Experience

Infrastructure Scaling: Demonstrated experience in managing large-scale distributed computing environments
Open Source Contribution: Preference for contributing to existing Open Source solutions over building from scratch

Skills

Required skills

Kubernetes: Advanced knowledge of Kubernetes cluster management and configuration
Cloud Infrastructure: Deep understanding of multi-cloud infrastructure design and implementation
Programming: Proficiency in Go or Python for infrastructure automation

Nice to have

ML Infrastructure: Experience with machine learning training infrastructure and GPU workloads
Linux Systems: Expertise in low-level Linux system support and troubleshooting
RDMA Networking: Familiarity with RDMA (Remote Direct Memory Access) networking

Compensation & benefits

Salary

GBP 80,000 – 150,000 (annual)

Benefits

Health Benefits

Comprehensive health and dental coverage with additional mental health budget

Parental Leave

100% salary top-up for up to 6 months of parental leave

Personal Enrichment

Budget for arts, culture, fitness, well-being, and workspace improvements

Vacation

6 weeks (30 working days) of annual vacation

Work Flexibility

Remote-flexible work arrangement with global office locations and co-working stipend

Lunch Benefits

Weekly lunch stipend, in-office lunches and snacks


Interview process

  1. Initial Screening: Phone or video call with recruiter to discuss background and role fit
  2. Technical Assessment: Coding challenge or technical interview focusing on Kubernetes, infrastructure design, and programming skills
  3. Team Interview: Multiple interviews with team members from the Internal Infrastructure team
  4. System Design Interview: In-depth discussion of infrastructure scalability, cloud architecture, and AI workload optimization
  5. Final Executive Interview: Meeting with senior leadership to assess overall cultural and strategic fit
