Cohere

Software Engineer, Internal Infrastructure (North America)

Cohere, posted 5 months ago
Location

Toronto

Type

Full Time

Salary

USD 130,000 – 220,000

Level

Senior

Role

Software Engineer

Posted

Oct 8, 2025

The role

Summary

Cohere is seeking a Software Engineer for its Internal Infrastructure team in Toronto, focusing on building and operating Kubernetes GPU superclusters that support cutting-edge AI model development. The role involves creating scalable, resilient infrastructure systems that empower AI researchers and accelerate the development of frontier AI models across multiple cloud environments.

What you'll do

Kubernetes Infrastructure: Build and operate Kubernetes compute superclusters across multiple cloud providers
Cloud Optimization: Partner with cloud providers to optimize infrastructure costs, performance, and reliability for AI workloads
Research Support: Collaborate with research teams to understand and improve infrastructure for novel model training techniques
System Design: Design resilient, scalable systems for AI model training with intuitive user interfaces
Team Collaboration: Encourage software best practices and participate in knowledge sharing, code reviews, and on-call rotations

What we look for

Technical

Kubernetes Expertise: Proven experience managing Kubernetes clusters at enterprise scale
Programming Languages: Proficiency in Go or Python
Cloud Infrastructure: Experience with multi-cloud infrastructure deployment and management

Education

Computer Science: Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field preferred

Experience

Infrastructure Engineering: 3-7 years of experience in cloud infrastructure, distributed systems, or related technical domains
Open Source Contribution: Demonstrated preference for contributing to and leveraging open-source solutions

Skills

Required skills

Kubernetes: Deep experience running Kubernetes clusters at scale
Cloud Native Infrastructure: Expertise in scaling and troubleshooting cloud infrastructure
Programming: Strong programming skills in Go or Python
Infrastructure as Code: Experience with infrastructure deployment and management

Nice to have

ML Infrastructure: Previous experience with machine learning training infrastructure
GPU Workloads: Familiarity with GPU computing and RDMA networking
Linux Systems: Expertise in low-level Linux system support and troubleshooting

Compensation & benefits

Salary

USD 130,000 – 220,000 (annual)

Stock options

Available

Benefits

Health Benefits

Comprehensive health and dental coverage with additional mental health budget

Parental Leave

100% salary top-up for up to 6 months of parental leave

Vacation

6 weeks (30 working days) of annual vacation

Lunch Stipend

Weekly lunch stipend and in-office meals

Personal Enrichment

Budget for arts, culture, fitness, and workspace improvement

Work Flexibility

Remote-flexible with co-working stipend and offices in multiple global locations


Interview process

  1. Initial Screening: Review of resume and initial qualifications
  2. Technical Phone Screen: Preliminary interview focusing on technical background and experience
  3. Technical Interviews: Multiple rounds of in-depth technical interviews assessing infrastructure and cloud computing expertise
  4. System Design Challenge: Evaluation of candidate's ability to design scalable, resilient infrastructure systems
  5. Final Interview: Meeting with team leadership to assess cultural fit and long-term potential
