System Design

132 articles tagged with System Design.

How LinkedIn Rebuilt Service Discovery to Scale to Millions of Services

LinkedIn rebuilt service discovery using Kafka and Observer, enabling scalable, push-based updates with lower latency and higher availability.

Linkedin, System Design

Rohit Lakhotia · May 25, 2026

How Snowflake Reduced Query Time by 20% (Without You Doing Anything)

Snowflake reduces query time by 20% via continuous engine optimizations, improving real workloads automatically without user changes.

Snowflake, System Design

Rohit Lakhotia · May 18, 2026

How GitHub Uses CodeQL to Secure Code at Scale

GitHub uses CodeQL to scan code as data, detect vulnerabilities, and secure thousands of repos automatically at scale.

Github, System Design

Rohit Lakhotia · May 11, 2026

How Snowflake Improved Performance by 27% (Without Users Noticing)

Snowflake boosts performance by 27% via backend optimizations in ingestion, planning, and execution, delivering faster queries and lower cost automatically.

Snowflake, System Design

Rohit Lakhotia · May 4, 2026

What are SOLID Principles?

Learn the SOLID principles in software engineering, explained with examples, to write clean, maintainable, and scalable code. A practical guide for developers.

Concepts, System Design

Rohit Lakhotia · Apr 29, 2026

How Nomad by HashiCorp Reduced Scheduler Load by 90%

Nomad reduces scheduler load by canceling redundant evaluations, improving system performance and speeding up recovery during failures.

System Design, Hashicorp

Rohit Lakhotia · Apr 27, 2026

How Slack Built Accessibility Checks into Its Testing Pipeline

Slack added Axe-based accessibility checks to Playwright tests, balancing automation with reliability, better reports, and easy developer workflows.

Slack, System Design

Rohit Lakhotia · Apr 20, 2026

How GitHub Redesigned CLI Accessibility Without a Rulebook

GitHub makes its CLI accessible by improving prompts, colors, and output, helping screen reader and low-vision users and making terminals usable for all devs.

Github, System Design

Rohit Lakhotia · Apr 13, 2026

How Slack Built Secure Enterprise Search?

Slack enables secure enterprise search using real-time fetch, RAG, ACL & OAuth, no data storage, always permission-aware & private across tools.

Slack, System Design

Rohit Lakhotia · Apr 6, 2026

How Airbnb Migrated a Petabyte Without Users Noticing

Airbnb rebuilt Mussel into a cloud-native KV store and migrated 1PB+ data using Apache Kafka with zero downtime.

Airbnb, System Design

Rohit Lakhotia · Mar 30, 2026

API Gateway vs Load Balancer

Discover the differences between an API gateway and a load balancer and find out which is best for your system's performance and security needs.

Cloud Infrastructure, Distributed System, Concepts, System Architecture, System Design, Message Queues

Rohit Lakhotia · Mar 25, 2026

How Slack cut their E2E Build Time by 80%?

Slack cut E2E time 80% by skipping redundant frontend builds and reusing cached assets, saving compute, storage, and hours.

Slack, System Design

Rohit Lakhotia · Mar 23, 2026

What are Immutable Data Structures?

Explore immutable data structures and why they matter in modern coding. Discover how they enhance reliability, simplify concurrency, and prevent bugs.

Concepts, System Design

Rohit Lakhotia · Mar 18, 2026

How Shopify Made Commerce Data Queryable Without SQL

ShopifyQL Notebooks lets merchants explore business data without SQL, using commerce-focused models built for clarity, speed, and action.

Shopify, System Design

Rohit Lakhotia · Mar 16, 2026

What is Backpressure?

Learn what backpressure is in distributed systems, why it’s vital for stability, and key strategies to prevent overloads in large-scale systems.

Distributed System, System Architecture, System Design

Rohit Lakhotia · Mar 11, 2026

How Slack Automatically Stops Suspicious Activity in Real Time

Slack’s AER detects suspicious activity and automatically terminates user sessions, shrinking response time from hours to minutes.

Slack, System Design

Rohit Lakhotia · Mar 9, 2026

How Shopify Built Super-Fast Search at C++ Speed

Shopify built RankFlow to run ML-powered search at C++ speed, letting data scientists iterate fast without sacrificing latency or scale.

Shopify, System Design

Rohit Lakhotia · Mar 2, 2026

Why Spotify’s Shuffle Never Felt Random (and What They Did About It)

Spotify kept Shuffle random but made it feel fair by choosing the least repetitive random order, so songs feel fresher without breaking true randomness.

Spotify, System Design

Rohit Lakhotia · Feb 23, 2026

Replication vs Redundancy. What's the Difference?

Learn the key differences between replication and redundancy to optimize your data protection strategies. Discover which method suits your needs best.

Devops, Concepts, System Architecture, System Design

Rohit Lakhotia · Feb 18, 2026

How Dropbox Dash Uses a Feature Store for Real-Time AI

Dropbox Dash uses a hybrid feature store to deliver fast, fresh signals, keeping AI search accurate, low-latency, and reliable at scale.

System Design, Dropbox

Rohit Lakhotia · Feb 16, 2026

What Is MLOps?

Learn what MLOps is: bridging the gap between DevOps and machine learning. Explore its lifecycle, tools, and best practices for scaling AI effectively.

Devops, Concepts, System Design

Rohit Lakhotia · Feb 11, 2026

How Spotify Scaled Content Annotations to Millions (Without Losing Quality)

Spotify built a scalable annotation platform by combining human experts, smart tools, and strong infrastructure to power high-quality ML training data

Spotify, System Design

Rohit Lakhotia · Feb 9, 2026

How Dropbox Dash Uses Context Engineering to Build Smarter AI

Dropbox Dash evolved into agentic AI by engineering context: fewer tools, relevant data, and specialized agents, making AI faster and smarter at work.

System Design, Dropbox

Rohit Lakhotia · Feb 2, 2026

What is Publish-Subscribe Pattern?

What is the publish-subscribe pattern? Learn how pub/sub decouples components, with real-world examples and benefits for scalable systems.

Devops, Distributed System, Concepts, System Architecture, System Design, Message Queues

Rohit Lakhotia · Jan 28, 2026

How LinkedIn Uses Machine Learning to Moderate Content at Scale

LinkedIn uses ML to prioritize content more intelligently, not to replace humans but to help reviewers act faster, scale better, and keep the platform safe without losing judgment or nuance.

Linkedin, System Design

Rohit Lakhotia · Jan 26, 2026

What is Configuration Drift?

What is Configuration Drift? Learn causes, risks, and best practices to detect, prevent, and fix drift with IaC and GitOps.

Networking, Devops, Concepts, System Architecture, System Design, Security

Rohit Lakhotia · Jan 21, 2026

How Instagram Improved HDR Video on iOS With Dolby Vision

Dolby Vision first hurt Reels due to load delays from metadata. Compression fixed it, boosting watch time and enabling rollout on Instagram iOS

Instagram, System Design

Rohit Lakhotia · Jan 19, 2026

What is Lazy Loading vs Eager Loading?

Lazy loading vs eager loading explained with practical examples. Learn when to apply each approach for performance and resource efficiency.

Beginner, Concepts, System Architecture, System Design

Rohit Lakhotia · Jan 14, 2026

How LinkedIn Rebuilt its Profile Highlights System

LinkedIn rebuilt Profile Highlights into a plug-in platform, enabling faster experiments, independent teams, better performance, and ~50% lower costs.

Linkedin, System Design

Rohit Lakhotia · Jan 12, 2026

How LinkedIn Reduced Latency and Cost by Merging Two Critical Systems

LinkedIn merged identity midtier and data services, cutting network hops to reduce latency, memory use, and cost while keeping APIs unchanged.

Linkedin, System Design

Rohit Lakhotia · Jan 5, 2026

How Lyft Built an In-App Messaging Without Annoying Riders

Lyft built in-app messaging by starting with simple banners and scaling into a smart, context-aware system that delivers timely messages without annoying riders.

System Design, Lyft

Rohit Lakhotia · Dec 29, 2025

How Airbnb builds Products 10x Faster Using GraphQL and Apollo

Airbnb ships faster by using GraphQL and Apollo to power backend-driven UI, automatic types, and tooling that lets engineers focus on building features

Airbnb, System Design

Rohit Lakhotia · Dec 22, 2025

How Zomato Improved their Android App Startup Time by Over 20% Using Baseline Profiles

Zomato cut Android app startup time by 20% using Baseline Profiles, pre-optimizing key code paths for faster launches and a smoother, consistent user experience.

Zomato, System Design

Rohit Lakhotia · Dec 15, 2025

What Happens During a Database Migration?

Discover what happens during a database migration. This practical guide covers planning, execution, validation, and strategies for a smooth transition.

Database, Concepts, System Design

Rohit Lakhotia · Dec 10, 2025

How Airbnb Measures the Lifetime Value of a Listing

Airbnb’s LTV framework shows which listings drive value, supports hosts, and adapts to market changes for smarter, data-driven decisions.

Airbnb, System Design

Rohit Lakhotia · Dec 8, 2025

What Is CQRS?

What is CQRS? This guide explains the CQRS pattern with simple analogies and practical examples to help you build scalable and high-performance applications.

Networking, Distributed System, Concepts, System Architecture, System Design

Rohit Lakhotia · Dec 3, 2025

How Lyft Rebuilt its Iconic Dashboard Emblem and its entire IoT Platform along with it?

Lyft’s Glow is more than an emblem: it’s a unified IoT platform with secure provisioning, real-time control, device shadowing, and safe OTA updates.

System Design, Lyft

Rohit Lakhotia · Dec 1, 2025

How LinkedIn Made the “My Network” Tab Faster, Smoother, and More Flexible

LinkedIn sped up My Network by unifying APIs, adding pagination, and using a backend-driven render model, cutting latency and improving the overall UX.

Linkedin, System Design

Rohit Lakhotia · Nov 24, 2025

What Is the N+1 Query Problem?

What is the N+1 query problem? Learn how it slows apps, why it happens, and practical fixes with code examples to speed up performance.

Concepts, System Architecture, System Design

Rohit Lakhotia · Nov 19, 2025

How Swiggy Cut QA Regression Time by 66% Using Automated Event Testing

Swiggy built ARD Automator to automate mobile event verification using contracts and validators, cutting QA time by 66% and boosting accuracy.

Swiggy, System Design

Rohit Lakhotia · Nov 17, 2025

How Razorpay Uses Terraform to Simplify and Scale Infrastructure Management

Razorpay leverages Terraform + Atlantis to automate, secure, and scale infrastructure with GitOps workflows and modular IaC practices.

System Design, Razorpay

Rohit Lakhotia · Nov 10, 2025

What is Token Bucket Algorithm?

Discover how the token bucket algorithm rate-limits requests, allowing short bursts while capping the sustained rate to keep your system stable.

Devops, Concepts, System Design

Rohit Lakhotia · Nov 5, 2025

How LinkedIn Cut Build Times from 30 Minutes to 10 Seconds

LinkedIn’s RDev lets engineers code in the cloud with pre-built containers, cutting setup from 30 mins to 10 secs while keeping CI consistent.

Linkedin, System Design

Rohit Lakhotia · Nov 3, 2025

How Swiggy Scaled and Maintained Postgres

Swiggy scaled Postgres by cleaning unused indexes, controlling auto-vacuum, and using pg_repack for online maintenance and better performance.

Swiggy, System Design

Rohit Lakhotia · Oct 27, 2025

How Razorpay prepared for Chrome’s Third-Party Cookie Deprecation

Razorpay uses partitioned cookies (CHIPS) to tackle Chrome’s 3P cookie phaseout, cutting drop-offs while ensuring a smooth, reliable checkout.

System Design, Razorpay

Rohit Lakhotia · Oct 20, 2025

How LinkedIn Built a Faster, Safer, and Smarter HDFS Ecosystem

LinkedIn scaled HDFS with HA, Observer nodes, encryption & Wormhole, boosting speed, reliability & secure data access for massive growth.

Linkedin, System Design

Rohit Lakhotia · Oct 13, 2025

How Salesforce Reinvented Task Execution for the Cloud Era

Salesforce built a cloud-native task execution system in Hyperforce, replacing SSH with secure, scalable, multi-cloud automation using recipes & workers.

Salesforce, System Design

Rohit Lakhotia · Oct 6, 2025

How Zomato Handles 100 Million Daily Search Queries

Zomato fixed search scale issues by moving from Field Cache to DocValues and using nested docs, cutting costs, OOM errors & boosting speed.

Zomato, System Design

Rohit Lakhotia · Sep 29, 2025

How Salesforce migrated 200,000 Machines from CentOS 7 to RHEL 9

Using automation for zero downtime, stronger security & faster parallel upgrades, Salesforce successfully migrated 200,000 machines from CentOS 7 to RHEL 9

Salesforce, System Design

Rohit Lakhotia · Sep 22, 2025

Edge Computing vs Fog Computing: Making the Right Choice

When comparing edge computing vs. fog computing, the main difference comes down to a simple question: where does the data processing happen?

Devops, Beginner, Concepts, System Architecture, System Design

Rohit Lakhotia · Sep 20, 2025

How Swiggy Improved Video Performance with Smart Caching

Swiggy boosted video cache hits & cut costs by clustering widths with K-means, reducing redundant processing while keeping playback seamless.

Swiggy, System Design

Rohit Lakhotia · Sep 15, 2025

Circuit Breaker vs Retry in Microservices

When building resilient systems, the debate of circuit breaker vs retry is about choosing the right tool for the right kind of failure. A Retry pattern is...

Beginner, Concepts, System Design

Rohit Lakhotia · Sep 13, 2025

Sharding vs Partitioning: What's the Difference?

Partitioning splits data within one database for faster retrieval, while sharding spreads data across multiple databases to handle scale and traffic.

Distributed System, Beginner, Concepts, System Design

Rohit Lakhotia · Sep 10, 2025

Latency vs Throughput: A Guide for System Performance

When you hear engineers talk about latency vs throughput, they are discussing two sides of the same coin: speed versus capacity.

Distributed System, Beginner, Concepts, System Design

Rohit Lakhotia · Sep 9, 2025

Write-Through, Write-Back & Write-Around in Cache: A Practical Guide

Your app writes data every second, but how it writes can change everything. Write-Through, Write-Back & Write-Around hide big trade-offs.

Database, Concepts, System Design

Rohit Lakhotia · Sep 8, 2025

How Hyperforce Edge Networking Scaled to 20 Million Domains With Less Than 30GB of RAM

Scaling from 3M to 20M+ domains, Salesforce Hyperforce Edge cut memory to under 30GB with a new storage design, boosting speed, reliability & security.

Salesforce, System Design

Rohit Lakhotia · Sep 8, 2025

What Is an Application Server? Role & Importance

Ever wondered what happens behind the curtain when you log into an app, book a flight, or add something to your online shopping cart? That seamless, interactive experience is powered by an unseen engine...

Networking, Distributed System, System Architecture, System Design

Rohit Lakhotia · Sep 1, 2025

How Razorpay Capital Detects Duplicate or Fraudulent Merchants

Razorpay scaled payments to billions of transactions by re-engineering its core systems, ensuring speed, security & reliability at scale.

System Design, Razorpay

Rohit Lakhotia · Sep 1, 2025

Performance and Scalability in Web Applications

Ever wondered why some apps stay smooth at 100 users but crash at 10k? That is where performance meets scalability.

Cdn, Distributed System, Beginner, Concepts, System Architecture, System Design

Rohit Lakhotia · Aug 28, 2025

Data Management in Applications

Whether you’re building a simple note-taking app, a social media platform, or a large-scale e-commerce system, your application’s success depends on how well...

Database, Concepts, System Design

Rohit Lakhotia · Aug 28, 2025

Authentication & Access Control

You sign in to your bank account and can only view your balance. The bank manager logs in and can approve loans. Same system, different powers, but how does the app decide?

Devops, Distributed System, Concepts, System Architecture, System Design, Security

Rohit Lakhotia · Aug 28, 2025

System Design Tutorial

When applications grow beyond a handful of users, writing code alone isn’t enough. To scale, stay reliable, and support complex features, software needs strong...

Cdn, Distributed System, Beginner, Concepts, System Architecture, System Design

Rohit Lakhotia · Aug 28, 2025

How Salesforce Migrated 760+ Kafka Nodes Handling 1M Messages per Second with Zero Downtime

Salesforce upgraded 760+ Kafka nodes handling 1M+ msg/sec with zero downtime, scaling Marketing Cloud seamlessly for the future.

Salesforce, System Design

Rohit Lakhotia · Aug 25, 2025

Vertical vs Horizontal Scaling

Is it better to make one server stronger or add more servers?

Distributed System, System Architecture, System Design

Rohit Lakhotia · Aug 20, 2025

How X (Formerly Twitter) Handles Millions of Tweets Every Second

X scaled from Ruby to Java, microservices, real-time data, and AI to handle millions of tweets, searches, and users with speed and reliability.

Twitter, System Design

Rohit Lakhotia · Aug 18, 2025

How Spotify Powers Music Streaming for Millions

Spotify uses Kafka, microservices, and ML to deliver real-time, personalized music to millions, powered by a fast, scalable cloud backend.

Spotify, System Design

Rohit Lakhotia · Aug 11, 2025

How Meta Powers its Cloud Gaming Infrastructure at Scale

Meta streams games from cloud GPUs to your device with ultra-low latency, using real-time encoding, smart networking, and fast decoding.

Meta, System Design

Rohit Lakhotia · Aug 4, 2025

Understanding API Gateway in Microservices: Key Benefits & Use Cases

Learn how an API gateway in microservices optimizes architecture. Explore core functions, patterns, and best practices to enhance your system.

Concepts, System Design

Rohit Lakhotia · Jul 30, 2025

How Amazon Key Unlocks 100 Million Doors a Year

Amazon Key lets drivers unlock gates for faster deliveries. From serverless to microservices, it now powers 100M+ secure unlocks yearly.

Amazon, System Design

Rohit Lakhotia · Jul 28, 2025

EP 88: How Pinterest Evolved its Architecture to Serve 500 Million Users

Pinterest began as a simple side project and scaled by simplifying tech, embracing microservices, and building strong pipelines and monitoring.

Pinterest, System Design

Rohit Lakhotia · Jul 21, 2025

EP 87: How Uber Handles 40 Million+ Reads Per Second Using an Integrated Cache

Uber serves 40M+ reads/sec by pairing Docstore with a smart Redis cache, using CDC for near-instant updates and clever sharding for scale.

Uber, System Design

Rohit Lakhotia · Jul 14, 2025

EP 86: How Facebook Scales Live Streaming for Millions of Viewers at Once?

Facebook scaled Live streaming for millions by building robust ingestion, delivery, and ISP optimizations, powering events like the UEFA Final.

Meta, System Design

Rohit Lakhotia · Jul 7, 2025

How Uber Eats Scaled Search to Handle Billions of Daily Queries

Uber Eats scaled search by revamping indexing, geo-sharding & ranking, supporting billions of queries daily without compromising latency.

Search, Uber, System Architecture, System Design

Rohit Lakhotia · Jun 30, 2025

EP 84: How Pinterest Built Text-to-SQL to make Data analysis easier

Pinterest built a Text-to-SQL tool using LLMs and RAG to help analysts convert questions into SQL and find the right data faster and easier.

Pinterest, System Design

Rohit Lakhotia · Jun 23, 2025

EP 83: How Pinterest Rebuilt its $3B+ Ads System without any Downtime

Pinterest rebuilt its $3B+ ad system with a graph-based design for better scale, safety & dev speed, launched with zero downtime and big cost wins.

Pinterest, System Design

Rohit Lakhotia · Jun 16, 2025

EP 82: How Pinterest uses LLMs to make your Search Results more Relevant?

Pinterest's AI teacher-student system improved search by 19.7%, understanding user intent beyond keywords for better relevance globally

Pinterest, System Design

Rohit Lakhotia · Jun 9, 2025

EP 81: How Pinterest Built “Holiday Finds” to make Gift Shopping easier?

Pinterest Holiday Finds uses smart recommendations, auto wishlists and a fresh UI to make holiday gifting easy!

Pinterest, System Design

Rohit Lakhotia · Jun 2, 2025

EP 80: How Pinterest improved ABR Video Performance?

Pinterest sped up video playback by embedding manifests in API responses and using Memcache to reduce startup latency.

Pinterest, System Design

Rohit Lakhotia · May 26, 2025

EP 79: How Grab enabled near Real-Time analytics on their Data Lake

Grab used Apache Hudi with Flink and Spark to enable near real-time analytics, ensuring fast ingestion and low-latency queries on their data lake.

System Design

Rohit Lakhotia · May 19, 2025

How Discord’s "Go Live" streaming works

Discord’s “Go Live” streams in real time by capturing, encoding, transmitting, and decoding, adapting quality to your network and device.

Cloud Infrastructure, Networking, Devops, System Design, Discord

Rohit Lakhotia · May 12, 2025

EP 77: How GitHub made Push Processing faster and more Reliable

GitHub sped up and stabilized push processing by splitting one big job into parallel Kafka-triggered tasks with better retries and monitoring.

System Design

Rohit Lakhotia · May 5, 2025

EP 76: How Mixpanel Fixed Their Load Balancing Problem using Power of 2 Choices

Mixpanel fixed Compacter’s load imbalance using Power-of-2-Choices, boosting efficiency and cutting costs by 70% with minimal changes!

System Design

Rohit Lakhotia · Apr 28, 2025

EP 75: How Netflix built a Distributed Counter for Billions of User Interactions

Netflix uses a smart Distributed Counter system to track billions of user actions daily with speed, accuracy, and massive scale.

System Design

Rohit Lakhotia · Apr 21, 2025

How Stripe Scales its APIs using Rate Limiters

Stripe uses token buckets, concurrency limits & load shedders to scale APIs, prevent abuse & keep critical traffic flowing reliably.

Cloud Infrastructure, Stripe, Networking, Devops, System Architecture, System Design

Rohit Lakhotia · Apr 14, 2025

How does UPI work?

UPI enables instant bank-to-bank transfers using just a UPI ID or mobile number, no bank details needed, just your app and secure PIN.

System Design

Rohit Lakhotia · Apr 7, 2025

EP 71: How PayPal Solved the Thundering Herd Problem Efficiently

PayPal’s Braintree fixed the Thundering Herd Problem using Exponential Backoff with Jitter and simplified their architecture for better scaling.

System Architecture, System Design

Rohit Lakhotia · Mar 24, 2025

EP 70: How Wayfair built their Ad Bidding System?

Wayfair built a smart Ad Bidding System using automation, ML, and real-time data to optimize bids, maximize ROI, and scale efficiently.

System Design

Rohit Lakhotia · Mar 17, 2025

EP 69: How Airbnb Rebuilt its Payment System to achieve 150x performance gains?

Airbnb rebuilt its payments system with SOA, a unified read layer & denormalization, boosting scalability, reliability & 150x faster transactions.

Airbnb, System Design

Rohit Lakhotia · Mar 10, 2025

EP 68: How Stripe uses Similarity Clustering to detect fraud

Stripe uses similarity clustering with XGBoost to detect fraud, linking accounts by shared traits to block fraud rings in real-time and reduce false positives.

Stripe, System Design

Rohit Lakhotia · Mar 3, 2025

EP 67: How BBC uses Serverless to handle Millions of visitors

BBC uses AWS Lambda to scale instantly, optimize caching, and reduce cold starts, ensuring fast, cost-efficient performance for millions of visitors.

System Design

Rohit Lakhotia · Feb 24, 2025

EP 66: How Meta distributes Exabytes of Data across the World so fast?

Meta uses Owl, a hybrid system that mixes peer-to-peer caching with smart tracking, making data move faster, smoother, and at scale.

Meta, System Design

Rohit Lakhotia · Feb 17, 2025

EP 65: How Quora Improved its Search System with Qdrant?

Quora moved to Qdrant for faster, scalable embedding search, improving recommendations with real-time updates, bulk loads, and optimized storage.

Search, System Design

Rohit Lakhotia · Feb 10, 2025

How Jira moved from JSON to Protobuf saved them 55% cost and 75% CPU?

Jira cut data size by 80%, reduced Memcached CPU by 75%, and saved 55% in costs by switching from JSON to Protobuf, improving speed and efficiency.

System Design

Rohit Lakhotia · Feb 3, 2025

EP 63: How Quora Optimized their Databases?

Quora optimized databases with caching, MyRocks for storage efficiency, and MySQL sharding to boost performance, cut costs, and handle scale.

Database, System Design

Rohit Lakhotia · Jan 27, 2025

EP 62: How Robinhood prevents Fraud using Graph Algorithms

Robinhood prevents fraud using graph algorithms to analyze user connections, detect patterns, and enable real-time, smarter fraud detection.

System Design

Rohit Lakhotia · Jan 20, 2025

EP 61: How Stripe achieved 99.999% uptime with DocDB (Document Database)

Stripe achieved 99.999% uptime by building DocDB, a custom solution on MongoDB, enabling efficient data migration, scaling, and high availability.

Database, Stripe, System Design

Rohit Lakhotia · Jan 13, 2025

How DoorDash transitioned from Monolith to Microservices

DoorDash used the strangler fig pattern, scream tests, and multi-tenant architecture to smoothly transition from monolith to microservices.

System Architecture, System Design

Rohit Lakhotia · Jan 6, 2025

EP 59: How Reddit designed their Metadata Store to serve 100k req/sec?

Reddit built a high-performance metadata store using Aurora Postgres, range-based partitions, PgBouncer, and JSONB fields, handling 100k req/sec.

Concepts, System Design

Rohit Lakhotia · Dec 30, 2024

EP 58: How Facebook built its Video Delivery System?

Facebook unified Reels, Watch, and Live, optimizing ranking, servers, and mobile to deliver personalized, efficient, and fresh video experiences.

Meta, System Design

Rohit Lakhotia · Dec 23, 2024

EP 57: How Airbnb Processes a Million User Events Every Second?

How Airbnb made 1.9 Billion in 6 months and how Airbnb’s User Signals Platform uses Apache Flink & Lambda Architecture to process millions of events per second for real-time personalization.

Airbnb, System Design

Rohit Lakhotia · Dec 16, 2024

EP 56: How LinkedIn Scaled to 1 billion Users?

By shifting from monoliths to microservices and using tools like Hadoop, Kafka, and Rest.li, LinkedIn scaled to a billion users globally.

Concepts, Linkedin, System Design

Rohit Lakhotia · Dec 9, 2024

EP 55: How did Magic Pocket help Dropbox save millions?

Dropbox scaled its storage with its custom-built system, Magic Pocket, and utilized high-density SMR drives, increasing its gross revenue by 75%.

System Design, Dropbox

Rohit Lakhotia · Dec 2, 2024

EP 54: How Dropbox scaled its storage infrastructure?

Dropbox scaled its storage infrastructure with a custom-built system called Magic Pocket, utilizing high-density SMR drives and advanced data replication for durability and scalability.

System Design, Dropbox

Rohit Lakhotia · Nov 25, 2024

EP 53: How TikTok Optimizes Video Streaming

TikTok boosts streaming by preloading videos, optimizing buffers, and reusing media players, with on-device upscaling and task distribution for smooth playback on all networks.

Distributed System, System Architecture, System Design

Rohit Lakhotia · Nov 18, 2024

EP 52: How GitHub manages continuous integration and deployment

GitHub manages CI/CD by automating testing, building, and deploying code changes, allowing developers to release updates faster and with confidence.

Devops, Concepts, System Design

Rohit Lakhotia · Nov 11, 2024

EP 51: How Instagram handled user growth and scale?

Instagram achieved rapid user growth by maintaining a simple and efficient tech stack built on AWS, Django, and Postgres, while managing traffic with load balancing, caching, and data sharding to handle the increasing demand.

Instagram, System Design

Rohit Lakhotia · Nov 4, 2024

EP 50: How Google search works?

Google Search works by using crawlers to scan and index web pages, then processes your queries to rank and display relevant results in seconds.

Concepts, System Design

Rohit Lakhotia · Oct 28, 2024

EP 49: How Stripe Handles Global Payments Technology

Stripe utilizes a tech stack of Ruby and JavaScript to enable secure, compliant global payments and currency conversion.

Stripe, System Design

Rohit Lakhotia · Oct 21, 2024

EP 48: How Tinder Streams to 75 Million Users with HTTP Live Streaming

Tinder used HTTP Live Streaming (HLS) & AWS CloudFront to deliver Swipe Night videos efficiently, ensuring seamless, adaptive playback.

System Architecture, System Design

Rohit Lakhotia · Oct 14, 2024

How Netflix Secures Content Delivery using Open Connect CDN?

Netflix secures content delivery through its proprietary Open Connect CDN, which caches content on local servers, ensuring low-latency streaming and minimizing network congestion.

Cdn, Netflix, System Design

Rohit Lakhotia · Oct 7, 2024

EP 46: How Uber Manages Real-Time Analytics with Apache Flink

Uber Eats uses real-time data processing with Apache Kafka, Flink, and Pinot to manage order updates, optimize delivery logistics, and provide quick analytics for efficient and accurate food delivery.

Uber, System Design

Rohit Lakhotia · Sep 30, 2024

EP 45: How Slack Maintains Reliability and Uptime

Slack maintains reliability and uptime through automated incident detection, real-time collaboration, proactive monitoring, and a resilient microservices architecture.

Slack, System Design

Rohit Lakhotia · Sep 23, 2024

How Zoom Ensures Low Latency Video Calls

Zoom ensures low latency by using distributed data centers, optimized video encoding, and adaptive bitrate streaming to maintain real-time communication quality.

Zoom, Cloud Infrastructure, Networking, Devops, System Architecture, System Design

Rohit Lakhotia · Sep 16, 2024

EP 43: How Amazon Personalizes Product Recommendations

Amazon personalizes product recommendations using machine learning, collaborative filtering, and user interaction data to tailor suggestions based on individual preferences

Concepts, Amazon, System Design

Rohit Lakhotia · Sep 9, 2024

EP 42: How Pinterest Scales Their Image Search with Elasticsearch

Pinterest scales its image search by using Elasticsearch for fast indexing, real-time search, and advanced machine learning features.

Search, System Design

Rohit Lakhotia · Sep 2, 2024

EP 41: How Facebook Handles Billions of Messages Daily

Facebook manages billions of daily messages using scalable servers, distributed systems and advanced algorithms for efficient processing and real-time delivery.

Devops, Distributed System, Concepts, Meta, System Design

Rohit Lakhotia · Aug 26, 2024

EP 39: How Twitter Manages High Availability with Kubernetes

Twitter achieves high availability with Kubernetes through multi-node deployments, load balancing, and data center redundancy.

Twitter, System Design

Rohit Lakhotia · Aug 12, 2024

EP 38: How Spotify Optimized Their Recommendation System

Spotify optimized recommendations by combining collaborative filtering, content-based filtering, and audio analysis to deliver highly personalized music recommendations.

Spotify, System Design

Rohit Lakhotia · Aug 5, 2024