Software Engineer, Agent Evaluation and Quality

Cursor

Location

San Francisco

Type

Full Time

Salary

USD 180,000 – 250,000

Level

Senior

Role

Software Engineer

Posted

Apr 13, 2026

The role

Summary

Cursor is seeking a Software Engineer for its Agent Evaluation and Quality team to build AI evaluation infrastructure. The ideal candidate will design measurement systems, develop feedback loops, and create tooling that improves AI agent reliability and performance across the company's products.

What you'll do

AI Evaluation System Design: Create comprehensive AI evaluation systems including curated datasets, offline replay mechanisms, scoring frameworks, regression alerts, and performance dashboards.

Feedback Loop Development: Design and implement robust feedback collection mechanisms to gather, clean, and interpret user signals that inform model and system improvements.

Analysis Tooling: Develop advanced debugging and analysis workflows to identify agent behavior patterns, investigate failure modes, and surface actionable insights.

Quality Measurement: Establish operational quality metrics, define performance thresholds, create alerting mechanisms, and develop triage strategies for AI agent reliability.

What we look for

Technical

AI Evaluation Systems: Proven experience in building and operating evaluation systems for AI, experimentation platforms, ranking/relevance, or search quality metrics

Data Analysis: Strong data analysis skills with the ability to transform abstract quality concepts into concrete metrics and decision-making pipelines

Software Engineering: Solid software engineering fundamentals with expertise in building and shipping robust production systems

Education

Computer Science: Bachelor's or Master's degree in Computer Science, Software Engineering, or related technical field preferred

Experience

AI/ML Systems: Demonstrated experience working with AI and machine learning evaluation frameworks

Production Systems: Track record of developing and maintaining large-scale software infrastructure

Skills

Required skills

AI Evaluation: Expertise in designing comprehensive AI quality measurement systems

Data Pipeline Development: Ability to create robust data collection and processing pipelines

System Design: Strong skills in designing scalable and reliable software infrastructure

Nice to have

Machine Learning Research: Familiarity with the latest AI research trends and emerging technologies

Distributed Systems: Experience with large-scale distributed computing architectures

Compensation & benefits

Salary

USD 180,000 – 250,000 (annual)

Stock options

Available

Benefits

Health Insurance

Comprehensive medical, dental, and vision coverage

Equity

Competitive stock option package for an early-stage startup

Professional Development

Continuous learning opportunities and conference attendance support

Interview process

1. Initial Screening: Brief introductory call to assess initial fit and background
2. Technical Interviews: 2-3 focused technical interviews exploring problem-solving and technical expertise
3. Onsite Project: In-office project where candidates work on a small technical challenge and meet the team