Airwallex

Senior ML Platform Engineer, AI Platform

Location

SG - Singapore

Type

Full Time

Level

Senior

Role

ML Engineer

Posted

Nov 19, 2025


The role

Summary

Airwallex is seeking a Senior ML Platform Engineer to build next-generation machine learning infrastructure for its new AI team. The role involves designing and maintaining MLOps platforms using Kubernetes and cloud services, implementing CI/CD/CT pipelines, and building high-performance model serving infrastructure. The ideal candidate has 5+ years of backend development experience with 2+ years focused on AI/ML platforms, expertise in Python and distributed systems, and experience with MLOps practices, including automated deployment pipelines and production lifecycle management.

What you'll do

Platform Development: Design, build, and maintain an end-to-end MLOps platform using Kubernetes and cloud services
Infrastructure as Code: Use Terraform to manage, provision, and scale ML-related infrastructure securely and efficiently
Pipeline Automation: Implement and optimize CI/CD/CT pipelines for model training, testing, packaging, and deployment using Argo and Kubeflow Pipelines
Model Serving Infrastructure: Build highly available, low-latency, and high-throughput model serving infrastructure
Observability Implementation: Implement robust monitoring, alerting, and logging solutions to track infrastructure health, model performance, and data/model drift
ML Tooling Support: Evaluate, integrate, and support ML tools such as Feature Stores and distributed model training pipelines
Security & Compliance: Ensure platform security, implement RBAC, and manage secrets for sensitive data and production environments
Cross-functional Collaboration: Work closely with Data Scientists and ML Engineers to understand needs and provide technical guidance on scaling best practices
LLM Platform Development: Contribute to the evolution of a unified AI Platform covering both traditional ML and growing LLM capabilities
Performance Optimization: Optimize model serving solutions for low-latency, high-throughput production environments
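The observability bullet above mentions tracking data/model drift. As a purely illustrative sketch (not Airwallex's actual tooling), one common drift metric is the Population Stability Index over binned feature distributions, shown here in plain Python with hypothetical bin counts:

```python
import math

def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    A common rule of thumb: PSI below 0.1 suggests little shift, while
    above 0.25 signals significant drift worth alerting on.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert raw bin counts to proportions; eps guards against empty bins.
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

# Identical distributions produce a PSI of zero (no drift).
assert psi([50, 30, 20], [50, 30, 20]) == 0.0
# A clearly shifted distribution exceeds the 0.25 alert threshold.
assert psi([50, 30, 20], [20, 30, 50]) > 0.25
```

In a production platform, a check like this would typically run on a schedule against serving logs and feed the monitoring/alerting stack rather than be evaluated inline.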

What we look for

Technical

Backend Development: 5+ years of experience in backend software development
MLOps Expertise: 2+ years focused on AI/ML platform or MLOps infrastructure
Model Serving: Proven experience designing and implementing low-latency model serving solutions
Python Proficiency: Strong programming skills in Python for ML platform development
Distributed Systems: Experience designing and developing large-scale distributed, high-concurrency, low-latency systems
Code Quality: Ability to write high-quality, maintainable code
Production MLOps: Deep expertise in MLOps practices, including automated deployment pipelines and production lifecycle management

Education

Bachelor's Degree: Relevant degree in Computer Science, Mathematics, or related technical fields

Experience

Communication Skills: Excellent communication and mentoring abilities for cross-functional collaboration
Infrastructure Management: Experience with cloud infrastructure and Kubernetes for ML workloads
LLM Optimization (preferred): Working knowledge of LLM serving optimization and GPU resource management
Distributed Computing (preferred): Familiarity with distributed compute/training frameworks such as Ray and Spark

Skills

Required skills

Python: Primary programming language for MLOps development
Kubernetes: Container orchestration for ML platform infrastructure
Terraform: Infrastructure as Code for ML infrastructure management
MLOps Practices: Automated deployment pipelines and production lifecycle management
Distributed Systems: Large-scale, high-performance system design and implementation
CI/CD/CT Pipelines: Continuous integration, delivery, and training automation
Model Serving: Low-latency, high-throughput model deployment solutions

Nice to have

Ray: Distributed computing framework for ML workloads
Apache Spark: Big data processing and distributed computing
Kubeflow: ML workflow orchestration on Kubernetes
vLLM: LLM serving optimization
Triton Inference Server: AI model serving platform
GPU Management: Resource optimization for ML training and inference
Feature Stores: ML feature management and serving
Cloud Platforms: AWS, GCP, or Azure for ML infrastructure
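Several of the nice-to-have items above (Kubernetes, vLLM, GPU management) typically come together in model serving deployments. As a purely illustrative sketch, a vLLM server might run on Kubernetes under a Deployment like the one below, built here as a Python dict; the model name and replica count are hypothetical placeholders, not anything specified in this posting:

```python
import json

# Hypothetical Kubernetes Deployment for an LLM serving workload.
# Image, model, and replica count are illustrative placeholders only.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "llm-server"},
    "spec": {
        "replicas": 2,
        "selector": {"matchLabels": {"app": "llm-server"}},
        "template": {
            "metadata": {"labels": {"app": "llm-server"}},
            "spec": {
                "containers": [{
                    "name": "vllm",
                    "image": "vllm/vllm-openai:latest",
                    "args": ["--model", "some-org/some-model"],
                    "ports": [{"containerPort": 8000}],
                    # GPU scheduling: request one NVIDIA GPU per replica.
                    "resources": {"limits": {"nvidia.com/gpu": 1}},
                }]
            },
        },
    },
}

# Serialize; a real platform would render this as YAML via CI/CD tooling.
manifest = json.dumps(deployment, indent=2)
```

In practice, manifests like this are usually generated and applied through Terraform, Helm, or GitOps pipelines rather than hand-written, which is where the Infrastructure-as-Code skills above come in.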

Compensation & benefits

Benefits

Global Team

Work with over 2,000 innovative people across 26 offices worldwide

Career Growth

Accelerated learning and true ownership in a high-growth fintech environment

Cutting-edge Technology

Work on next-generation AI and ML platforms with modern tech stack

Impact-driven Work

Build solutions that serve over 200,000 businesses globally including major brands

Innovation Culture

Join a brand-new AI team driving innovation in financial technology

Equal Opportunity

Inclusive workplace that values diversity and provides equal opportunities


Interview process

  1. Application Review: Initial screening of resume and technical background, focusing on MLOps experience
  2. Technical Phone Screen: Discussion of MLOps concepts, system design, and Python programming skills
  3. System Design Interview: Design an ML platform architecture, discussing scalability, performance, and reliability
  4. Coding Interview: Python coding assessment focused on infrastructure automation and ML pipeline development
  5. Technical Deep Dive: In-depth discussion of previous MLOps projects, Kubernetes experience, and platform engineering
  6. Team Fit Interview: Cultural fit assessment and discussion with potential team members and stakeholders
  7. Final Interview: Leadership interview focusing on collaboration skills, mentoring abilities, and long-term vision

Apply for this position



Airwallex

Airwallex is a Singapore-based financial technology company specializing in cross-border payments and financial services for businesses.

Singapore · Founded 2015 · airwallex.com

Tech Stack

Languages: Python, SQL
Frameworks: Kubeflow Pipelines, Argo Workflows, Ray, Apache Spark
Databases: Feature Store, Model Registry
Tools: Kubernetes, Terraform, Docker, vLLM, Triton Inference Server, TGI, Prometheus, Grafana
Other: AWS/GCP/Azure, GPU Management, CI/CD/CT Pipelines, RBAC