Cursor

Software Engineer, Data Infrastructure

Location

SF / NY

Type

Full Time

Salary

USD 180,000 – 280,000

Level

Senior

Role

Backend Engineer

Posted

Jan 18, 2026


The role

Summary

Cursor is seeking a Software Engineer for Data Infrastructure to own the full lifecycle of the data systems that power its AI-powered coding tool. This role involves building and maintaining scalable data pipelines, telemetry systems, and storage infrastructure that support daily product releases and model improvements. The ideal candidate has deep experience with Spark, distributed systems, and production data infrastructure at scale.

What you'll do

Data Pipeline Architecture: Design and implement scalable data pipelines that process telemetry, prompts, completions, and agent run data from daily product releases
System Migration and Redesign: Evaluate existing systems and determine when to patch versus rebuild, shipping replacements while maintaining operational continuity
Privacy-Compliant Data Handling: Implement data retention and usage policies that respect Privacy Mode settings and organizational configurations
Performance Optimization: Debug and resolve performance issues across client instrumentation, streaming systems, storage layers, and model-facing workflows
Schema Evolution Management: Design and implement schema validation and evolution strategies to prevent silent degradation across multiple data consumers
Cost Management: Implement data retention policies, compression strategies, and storage optimization to control infrastructure costs
Instrumentation Gap Resolution: Identify and resolve telemetry gaps, implement monitoring contracts, and build dashboards for early detection of issues
Cross-Team Collaboration: Work closely with product and model teams to understand data requirements and prioritize infrastructure improvements by business impact

What we look for

Technical

Apache Spark Expertise: Deep production experience with Spark (Databricks or open-source), including optimization and troubleshooting at scale
Distributed Systems Experience: Hands-on ownership of large-scale data pipelines and storage systems with a proven scalability track record
Ray Data Production Experience: Real-world experience deploying and managing Ray Data for distributed data processing workloads
Performance Debugging Skills: Ability to diagnose and resolve performance issues across multiple layers, including compute, storage, networking, and application code
Data Modeling Expertise: Strong understanding of data modeling principles, schema design, and long-term system maintainability

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience in software development

Experience

Large-Scale Systems: 5+ years building and operating production data infrastructure systems at significant scale
Data Pipeline Ownership: End-to-end ownership experience including design, implementation, deployment, and ongoing operations
System Architecture Decisions: Proven track record of making sound technical decisions about when to refactor versus rebuild existing systems

Skills

Required skills

Apache Spark: Expert-level proficiency in Spark for large-scale data processing, including performance tuning and optimization
Ray Data: Production experience with the Ray Data framework for distributed ML data processing workflows
Python/Scala: Strong programming skills in languages commonly used for data infrastructure development
Distributed Systems: Deep understanding of distributed computing principles, consistency models, and fault tolerance
Data Pipeline Design: Ability to architect robust, scalable data pipelines with proper error handling and monitoring
Performance Debugging: Systematic approach to identifying and resolving performance bottlenecks across complex distributed systems

Nice to have

ClickHouse: Experience running or scaling ClickHouse for real-time analytics workloads
dbt: Familiarity with dbt for data transformation and analytics engineering workflows
Dagster: Experience with Dagster or similar orchestration tools for data pipeline management
Cloud Platforms: Experience with AWS, GCP, or Azure for cloud-based data infrastructure deployment
Kubernetes: Container orchestration experience for deploying and managing data processing workloads

Compensation & benefits

Salary

USD 180,000 – 280,000 (annual)

Stock options

Available

Benefits

In-Person Work Environment

Cozy offices in North Beach, San Francisco and Manhattan, New York with well-stocked libraries

Equity Package

Competitive equity compensation as part of total compensation package

Professional Development

Work with cutting-edge AI technology and contribute to groundbreaking automation tools

Collaborative Culture

Small, talent-dense team with flat organization structure encouraging spirited debate and creative problem-solving

Office Amenities

Well-appointed offices with comprehensive libraries and comfortable work environments


Interview process

  1. Initial Screen: Application review and initial fit assessment based on technical background and experience
  2. Technical Interviews: 2-3 short technical interviews focusing on data infrastructure, distributed systems, and problem-solving approach
  3. Onsite Interview: Full-day onsite visit including hands-on project work, technical discussions, and team meetings to assess cultural fit and technical depth

Apply for this position



About Cursor

Built to make you extraordinarily productive, Cursor is the best way to build software with AI.

San Francisco, California, United States · Founded 2021 · cursor.com

Tech Stack

Languages
Python, Scala, SQL
Frameworks
Apache Spark, Ray Data, dbt, Dagster
Databases
ClickHouse, PostgreSQL, Redis
Tools
Databricks, Kubernetes, Docker, Apache Kafka, Terraform, Grafana
Other
AWS, Apache Parquet, Protocol Buffers