Software Engineer, Data Transformation

Snowflake

Location

DE-Berlin-Trion Building

Type

Full Time

Salary

USD 120,000 – 180,000

Level

Junior

Role

Data Engineer

Posted

Jun 19, 2026

The role

Summary

Join Snowflake's Data Platform team as a Software Engineer specializing in Data Transformation, designing and operating large-scale distributed data systems that power the core platform. This role is ideal for talented engineers with strong foundations in distributed systems, algorithms, and data structures who are passionate about building scalable, high-throughput data processing infrastructure. You'll collaborate across product, infrastructure, and data science teams to contribute to architectural decisions and technical roadmaps while writing production-quality code that processes data at enterprise scale.

What you'll do

Design and implement scalable, high-throughput data processing systems: Architect and build distributed data systems capable of handling massive volumes of data with minimal latency, ensuring optimal performance and reliability across Snowflake's cloud infrastructure.

Build and maintain real-time and batch data pipelines: Develop robust data pipeline frameworks supporting both streaming and batch processing workloads, implementing fault-tolerant mechanisms and monitoring systems to ensure data integrity and on-time delivery.

Collaborate cross-functionally with product, infrastructure, and data science teams: Work closely with product managers, infrastructure engineers, and data science teams to understand requirements, align on technical approaches, and deliver solutions that meet diverse stakeholder needs.

Contribute to architectural decisions and technical roadmaps: Participate actively in system design reviews, propose improvements to data platform architecture, and help shape the technical direction of Snowflake's data transformation capabilities.

Write clean, tested, production-quality code at scale: Follow software engineering best practices including comprehensive testing, code reviews, documentation, and adherence to performance standards while building systems that serve millions of users and petabytes of data.

What we look for

Technical

Distributed Systems Fundamentals: Deep understanding of distributed computing concepts including consistency models, fault tolerance, consensus algorithms, and trade-offs between availability, consistency, and partition tolerance (CAP theorem).

Algorithms and Data Structures: Strong grasp of fundamental and advanced algorithms, data structure design, complexity analysis, and ability to optimize solutions for performance at scale in production environments.

Programming Proficiency: Expert-level proficiency in at least one of Java, Scala, Python, or C++, with demonstrated ability to write performant, maintainable code across large-scale systems.

Large-Scale Data Systems Experience: Practical experience with distributed data processing frameworks, cloud platforms (AWS, GCP, Azure), and understanding of database internals, indexing strategies, and query optimization techniques.

Data Pipeline Architecture: Experience designing and implementing data ingestion, transformation, and export pipelines with focus on reliability, scalability, and monitoring in production environments.

Education

Bachelor's Degree in Computer Science or Engineering: Four-year degree in Computer Science, Software Engineering, Computer Engineering, or closely related field providing foundational knowledge in core computing principles.

Master's or Doctoral Degree (Preferred): Advanced degree (MS or PhD) in Computer Science, Engineering, or related discipline, demonstrating deeper expertise in specialized areas such as distributed systems, databases, or machine learning.

Equivalent Professional Experience: Demonstrated mastery through equivalent software engineering experience in data systems, cloud infrastructure, or related domains, particularly for candidates without a formal degree.

Experience

Cloud Platform Proficiency: Hands-on experience with major cloud providers' data services and infrastructure, including understanding of elastic scaling, cloud-native architecture patterns, and cost optimization strategies.

Stream and Batch Processing: Practical experience with Apache Spark, Apache Flink, or similar distributed processing frameworks, demonstrating ability to implement and optimize both real-time and batch data workloads.

Data Infrastructure Tools: Familiarity with Kafka, Apache Pulsar, or other message streaming platforms; experience with data lakehouse architectures (Delta Lake, Apache Iceberg); knowledge of SQL engines and query optimization.

AI-Augmented Development Practices: Experience leveraging large language models, GitHub Copilot, or AI coding assistants to accelerate development workflows, prototype solutions efficiently, and improve code quality through AI-assisted development.

Production Software Development: Track record of delivering production-grade systems with emphasis on reliability, monitoring, logging, performance optimization, and ability to debug complex distributed systems issues.

Skills

Required skills

Distributed Systems Design: Ability to design systems that operate across multiple nodes with considerations for consistency, availability, fault tolerance, and scalability.

Data Processing Frameworks: Proficiency with Apache Spark, Apache Flink, or equivalent distributed data processing platforms for implementing scalable ETL and real-time analytics pipelines.

Systems Programming: Strong command of lower-level system concepts including memory management, concurrency, threading models, and performance profiling in languages like Java, C++, or Scala.

Software Engineering Fundamentals: Mastery of software design patterns, SOLID principles, testing methodologies (unit, integration, performance testing), version control, and code review processes.

Cloud Infrastructure: Understanding of cloud-native architecture, containerization, orchestration platforms like Kubernetes, and deployment strategies for distributed systems.

Nice to have

Machine Learning Infrastructure: Experience building systems that support ML workflows including feature engineering pipelines, model serving infrastructure, and integration with ML frameworks.

Database Internals: Knowledge of database design, query planning and optimization, indexing strategies, transaction processing, and understanding of modern cloud data warehouses.

DevOps and Observability: Familiarity with CI/CD pipelines, infrastructure-as-code, monitoring systems, distributed tracing, and logging platforms for maintaining operational excellence.

Performance Optimization: Demonstrated expertise in profiling distributed systems, identifying bottlenecks, optimizing network I/O, memory usage, and CPU utilization for large-scale workloads.

Open Source Contributions: Active involvement in open source projects, particularly in data infrastructure, distributed systems, or cloud-native technologies, demonstrating community engagement and technical leadership.

Compensation & benefits

Salary

USD 120,000 – 180,000 (annual)

Stock options

Available

Benefits

Health and Wellness Coverage

Comprehensive medical, dental, and vision insurance plans providing access to preventative care, specialist services, and prescription medications with employer contributions.

Equity and Stock Options

Stock options and equity grants aligned with company performance, providing long-term wealth creation opportunities and direct stake in Snowflake's success.

Retirement Planning

401(k) savings plan with company matching contributions, enabling long-term retirement security and financial planning for eligible employees.

Professional Development

Budget for conferences, online courses, certifications, and training programs supporting continuous learning and career advancement in software engineering.

Flexible Work Arrangements

Flexible work hours and remote work options enabling work-life balance and allowing engineers to optimize their productivity and personal circumstances.

Paid Time Off

Generous paid vacation, sick leave, and personal days supporting employee wellbeing and time for rest, recovery, and personal pursuits.

Competitive Parental Leave

Comprehensive parental leave policies supporting employees during family planning, adoption, and early childcare needs.

Mental Health and Wellness Programs

Access to mental health services, counseling, fitness benefits, wellness programs, and employee assistance programs supporting holistic wellbeing.

Interview process

1. Initial Phone Screen: Conversation with recruiter to discuss your background, experience with distributed systems and data platforms, career goals, and high-level fit for the Data Transformation Engineering role.
2. Technical Phone Interview: Technical discussion with a data platform engineer focusing on distributed systems concepts, algorithms, data structure optimization, and problem-solving approach for scalable data processing challenges.
3. System Design Interview: Collaborative session where you design a large-scale data processing system, discussing architectural trade-offs, scalability considerations, fault tolerance mechanisms, and integration with existing infrastructure.
4. Coding Assessment: Live coding interview evaluating your proficiency in Java, Scala, Python, or C++, focusing on writing clean, efficient code for data processing problems with emphasis on performance and maintainability.
5. Cross-Functional Collaboration Discussion: Interview with engineers from infrastructure, product, or data science teams assessing your ability to collaborate effectively, communicate complex technical concepts, and navigate diverse stakeholder needs.
6. Hiring Manager Interview: Final conversation with the Data Platform team's hiring manager covering career expectations, motivation for data systems, alignment with Snowflake's culture, and discussion of team dynamics and growth opportunities.