Wealthsimple Technologies

Staff Software Developer, Production Engineering

Location: Remote (Canada)

Workplace: Remote

Type: Full Time

Salary: CAD 180,000 – 240,000

Level: Staff

Role: Staff Software Developer

Posted: Jun 19, 2026


The role

Summary

As a Staff Software Developer in Production Engineering at Wealthsimple, you'll drive reliability and operational excellence across Canada's leading fintech platform serving 4 million+ users. This newly created role combines technical leadership with cross-team influence, focusing on preventing incidents, building AI-assisted incident response tooling, and establishing platform-wide reliability standards. You'll work at the intersection of platform and product engineering, translating complex distributed systems problems into scalable solutions while shaping how engineering practices evolve across the organization.

What you'll do

Incident Prevention and Platform Guardrails: Design and drive adoption of engineering standards, sensible defaults, and architectural guardrails that reduce failure likelihood across distributed services. Work proactively to identify and eliminate single points of failure in critical financial flows serving millions of users.
AI-Assisted Incident Response Tooling: Build and enhance in-house incident response products leveraging artificial intelligence to reduce time to mitigation. Contribute to the development of intelligent tooling that accelerates root cause analysis and automated remediation across the platform.
Load Testing and Capacity Planning: Own comprehensive load test investigations on critical financial flows, translating findings into concrete reliability improvements. Establish capacity planning practices that prevent performance degradation under peak usage scenarios affecting millions of transactions.
Cross-Team Technical Leadership: Serve as a technical influencer across platform and product engineering teams without direct authority. Participate in architecture reviews, readiness assessments, and coaching sessions to drive adoption of scalable reliability practices and operational standards.
Failure Pattern Analysis and Platform Solutions: Identify recurring failure patterns across services and design platform-level fixes that prevent similar issues from emerging in other parts of the infrastructure. Transform individual incident learnings into systematic improvements benefiting the entire engineering organization.
Reliability Synchronization and Strategic Planning: Contribute to regular reliability syncs with product engineering teams, aligning on incident themes, critical-flow risk assessment, and prioritization of highest-leverage reliability initiatives. Communicate findings to both technical teams and senior leadership.

What we look for

Technical

Backend Systems and Distributed Architecture: Deep proficiency in backend systems design, service-oriented architecture, and distributed systems patterns. Ability to diagnose complex failure modes, understand trade-offs in scalability, and design resilient systems handling high transaction volumes.
Container Orchestration and Deployment: Strong familiarity with Kubernetes for container orchestration, Helm for package management, and Argo for deployment automation. Experience with modern GitOps practices and continuous deployment pipelines in production environments.
Incident Response and Observability: Expertise in incident management processes, root cause analysis methodologies, and observability practices. Proficiency with monitoring systems, distributed tracing, and metrics collection to maintain visibility across complex infrastructure.
Scalable System Design: Demonstrated experience designing systems for high availability, handling millions of transactions, and maintaining sub-second latency requirements critical to financial applications.

Education

Computer Science or Related Field: Bachelor's degree in Computer Science, Software Engineering, or equivalent professional experience demonstrating deep systems knowledge and strong fundamentals in distributed computing.

Experience

8+ Years Software Engineering Experience: Minimum eight years of professional software engineering experience with substantial focus on platform architecture, infrastructure systems, or Site Reliability Engineering (SRE) practices in production environments.
Proven Reliability and Scale Track Record: Demonstrated history of improving system reliability at scale, including reducing incident frequency, designing guardrails, implementing operational standards across multiple engineering teams, and managing systems serving millions of concurrent users.
Distributed Systems and Load Testing Expertise: Hands-on experience with load testing frameworks, capacity planning methodologies, and the ability to diagnose complex failure modes across distributed service meshes. Strong background translating performance findings into engineering improvements.
Cross-Team Influence Without Authority: Proven ability to drive adoption of engineering standards and practices across multiple teams while operating as a technical influencer without direct managerial authority. Track record of building credibility through quality thinking and clear recommendations.

Skills

Required skills

Distributed Systems Design: Ability to architect and diagnose issues in complex distributed systems with multiple failure modes. Experience with service mesh technologies, load balancing, and eventual consistency patterns.
Backend Development Proficiency: Strong programming skills in one or more backend languages, with deep understanding of performance optimization, concurrency patterns, and memory management in production systems.
Infrastructure as Code: Proficiency with Infrastructure as Code principles and tools, particularly Terraform, CloudFormation, or similar technologies for managing production infrastructure reliably and repeatably.
System Reliability Engineering: Core SRE practices including defining Service Level Indicators (SLIs), Service Level Objectives (SLOs), error budgets, and building systems with measurable reliability targets aligned to business requirements.
Technical Communication: Excellent written and verbal communication skills with ability to present complex technical findings, recommendations, and architectural proposals to both engineering teams and non-technical senior leadership.

Nice to have

AI-Assisted Tooling Experience: Experience building or working with artificial intelligence and machine learning tools for incident detection, root cause analysis, or anomaly detection in monitoring systems. Curiosity about emerging AI applications in reliability engineering.
Financial Systems Experience: Background working on payment systems, financial transactions, or fintech platforms where correctness, auditability, and regulatory compliance directly impact millions of users.
Open Source Contributions: Active involvement with open-source infrastructure projects such as Kubernetes, Prometheus, Grafana, or similar technologies. Demonstrates commitment to community-driven reliability innovation.
Performance Tuning and Optimization: Proven experience identifying and eliminating performance bottlenecks in large-scale systems, optimizing database queries, and improving system latency under high transaction loads.

Compensation & benefits

Salary

CAD 180,000 – 240,000 (annual)

Benefits

Comprehensive Health and Life Insurance

Top-tier health benefits covering medical, dental, and vision care, plus life insurance protection for financial security and peace of mind.

Retirement Savings with Employer Match

Long-term group savings program through Wealthsimple for Business with employer contribution matching, helping you build wealth alongside your salary.

Generous Vacation and Wellness Time

20 vacation days annually plus 4 dedicated wellness days, combined with unlimited sick and mental health days to support work-life balance and employee wellbeing.

Global Work Flexibility

Ability to work outside Canada for up to 90 days per calendar year, enabling remote work flexibility and the opportunity to maintain client relationships or attend conferences globally.

Employee Resource Groups and Community

Active employee resource groups including Rainbow for 2SLGBTQ employees, Women of Wealthsimple, and Black at Wealthsimple, fostering inclusive community and professional development opportunities.

Hybrid Work Environment

Flexible hybrid work arrangement with access to offices across North America. Over 1,500 employees collaborate across multiple locations, combining remote flexibility with in-person collaboration.


Interview process

  1. Initial Application Review: Your resume, portfolio, and background will be evaluated against requirements. Wealthsimple may use AI-assisted tools to support this initial screening, though all decisions involve human judgment.
  2. Technical Screening Conversation: An initial conversation with a team member to discuss your experience with distributed systems, production reliability challenges, and your approach to system design at scale.
  3. Deep Technical Interview: Comprehensive technical discussion covering system design scenarios, incident investigation methodologies, and your experience improving reliability across large teams. Expect questions about handling failure modes in distributed services.
  4. Architecture and Design Review: Collaborative discussion on how you'd approach specific reliability challenges at Wealthsimple. You may review existing system designs or propose solutions to platform problems.
  5. Stakeholder and Team Alignment: Conversations with platform engineering leadership and product team representatives to assess technical influence capability, communication style, and alignment on reliability priorities.
  6. Final Executive Discussion: Potential conversation with senior engineering or product leadership to discuss vision for production reliability, strategic technical direction, and organizational impact of the role.
