Cursor

Software Engineer, Data Infrastructure

Location

SF / NY

Type

Full Time

Salary

USD 180,000 – 280,000

Level

Senior

Role

Backend Engineer

Posted

Jan 18, 2026


The role

Summary

Cursor is seeking a Software Engineer for Data Infrastructure to own the full lifecycle of the data systems that power its AI-powered coding tool. This role involves building and maintaining scalable data pipelines, telemetry systems, and storage infrastructure that support daily product releases and model improvements. The ideal candidate has deep experience with Spark, distributed systems, and production data infrastructure at scale.

What you'll do

Data Pipeline Architecture: Design and implement scalable data pipelines that process telemetry, prompts, completions, and agent run data from daily product releases
System Migration and Redesign: Evaluate existing systems and determine when to patch versus rebuild, shipping replacements while maintaining operational continuity
Privacy-Compliant Data Handling: Implement data retention and usage policies that respect Privacy Mode settings and organizational configurations
Performance Optimization: Debug and resolve performance issues across client instrumentation, streaming systems, storage layers, and model-facing workflows
Schema Evolution Management: Design and implement schema validation and evolution strategies to prevent silent degradation across multiple data consumers
Cost Management: Implement data retention policies, compression strategies, and storage optimization to control infrastructure costs
Instrumentation Gap Resolution: Identify and resolve telemetry gaps, implement monitoring contracts, and build dashboards for early detection of issues
Cross-Team Collaboration: Work closely with product and model teams to understand data requirements and prioritize infrastructure improvements by business impact

What we look for

Technical

Apache Spark Expertise: Deep production experience with Spark (Databricks or open-source), including optimization and troubleshooting at scale
Distributed Systems Experience: Hands-on ownership of large-scale data pipelines and storage systems with a proven scalability track record
Ray Data Production Experience: Real-world experience deploying and managing Ray Data for distributed data processing workloads
Performance Debugging Skills: Ability to diagnose and resolve performance issues across multiple layers, including compute, storage, networking, and application code
Data Modeling Expertise: Strong understanding of data modeling principles, schema design, and long-term system maintainability

Education

Computer Science Degree: Bachelor's degree in Computer Science, Engineering, or equivalent practical experience in software development

Experience

Large-Scale Systems: 5+ years building and operating production data infrastructure systems at significant scale
Data Pipeline Ownership: End-to-end ownership experience including design, implementation, deployment, and ongoing operations
System Architecture Decisions: Proven track record of making sound technical decisions about when to refactor versus rebuild existing systems

Skills

Required skills

Apache Spark: Expert-level proficiency in Spark for large-scale data processing, including performance tuning and optimization
Ray Data: Production experience with the Ray Data framework for distributed ML data processing workflows
Python/Scala: Strong programming skills in languages commonly used for data infrastructure development
Distributed Systems: Deep understanding of distributed computing principles, consistency models, and fault tolerance
Data Pipeline Design: Ability to architect robust, scalable data pipelines with proper error handling and monitoring
Performance Debugging: Systematic approach to identifying and resolving performance bottlenecks across complex distributed systems

Nice to have

ClickHouse: Experience running or scaling ClickHouse for real-time analytics workloads
dbt: Familiarity with dbt for data transformation and analytics engineering workflows
Dagster: Experience with Dagster or similar orchestration tools for data pipeline management
Cloud Platforms: Experience with AWS, GCP, or Azure for cloud-based data infrastructure deployment
Kubernetes: Container orchestration experience for deploying and managing data processing workloads

Compensation & benefits

Salary

USD 180,000 – 280,000 (annual)

Stock options

Available

Benefits

In-Person Work Environment

Cozy offices in North Beach, San Francisco and Manhattan, New York with well-stocked libraries

Equity Package

Competitive equity compensation as part of total compensation package

Professional Development

Work with cutting-edge AI technology and contribute to groundbreaking automation tools

Collaborative Culture

Small, talent-dense team with flat organization structure encouraging spirited debate and creative problem-solving

Office Amenities

Well-appointed offices with comprehensive libraries and comfortable work environments


Interview process

  1. Initial Screen: Application review and initial fit assessment based on technical background and experience
  2. Technical Interviews: 2-3 short technical interviews focusing on data infrastructure, distributed systems, and problem-solving approach
  3. Onsite Interview: Full-day onsite visit including hands-on project work, technical discussions, and team meetings to assess cultural fit and technical depth

Apply for this position



About Cursor

Built to make you extraordinarily productive, Cursor is the best way to build software with AI.

San Francisco, California, United States · Founded 2021 · cursor.com

Tech Stack

Languages
Python, Scala, SQL
Frameworks
Apache Spark, Ray Data, dbt, Dagster
Databases
ClickHouse, PostgreSQL, Redis
Tools
Databricks, Kubernetes, Docker, Apache Kafka, Terraform, Grafana
Other
AWS, Apache Parquet, Protocol Buffers