How to Think in System Design: Mental Models

TL;DR / Key Takeaways

Interviews test your thinking process more than your ability to produce a perfect architecture.
Start with requirements, then quantify load and constraints.
Define the core metrics: throughput, latency, availability, durability.
A good design is a series of explicit trade-offs, not a single right answer.

What System Design Interviews Actually Test

Most interviews are not asking for the most complex system. They are asking whether you can:

Clarify the problem before building a solution.
Translate vague goals into measurable requirements.
Reason about scale using simple math.
Explain trade-offs calmly and clearly.

A strong answer looks like an architect thinking out loud, not a checklist of technologies.

Requirements First: Functional and Non-Functional

Functional requirements define what the system must do. Non-functional requirements define how well it must do it.

graph TD
  R[Requirements] --> F[Functional]
  R --> N[Non-Functional]
  N --> S[Scalability]
  N --> A[Availability]
  N --> L[Latency]
  N --> C[Cost]

Ask questions early:

Who uses it, and how often?
What is the read vs write mix?
What is the acceptable latency and error rate?
What happens when a dependency fails?

The Four Core Metrics

Use these terms precisely:

Throughput: requests per second or transactions per second.
Latency: time to serve a single request, usually reported as percentiles such as p50 (median) and p95.
Availability: percentage of time the system is usable (for example 99.9%).
Durability: probability that committed data is not lost.

A design that optimizes one metric often harms another, which is why every design decision should be stated as an explicit trade-off.
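
To make the availability percentages concrete, it can help to convert them into allowed downtime. The sketch below is illustrative only: the function name is hypothetical and it assumes a 365-day year with downtime spread over the whole year.

```python
def downtime_per_year_hours(availability_pct: float) -> float:
    """Hours of allowed downtime per 365-day year for an availability percentage."""
    hours_per_year = 365 * 24  # 8,760 hours
    return (1 - availability_pct / 100) * hours_per_year


# Compare a few common targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_per_year_hours(target):.2f} hours/year")
```

Under these assumptions, 99.9% leaves roughly 8.8 hours of downtime per year, and each extra nine cuts that by a factor of ten, which is usually where the cost trade-off starts to bite.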

Translate Requirements into Constraints

Turn vague goals into hard constraints you can design against:

Read vs write mix drives caching, storage choice, and scaling direction.
Consistency expectations drive data model and replication strategy.
Data retention and growth define storage and backup needs.
Latency targets define how much work each hop can do.

Write these constraints down early so you can defend trade-offs later.

Latency Budgets and Critical Paths

If the user-facing target is a p95 of 200 ms, split that budget across the components on the critical path. Example:

30 ms network and TLS
20 ms load balancer and routing
50 ms service logic
80 ms database or cache
20 ms for safety margin

Any single dependency that routinely exceeds its budget by more than the safety margin will break the latency target unless another component gives up part of its share.
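
The budget above can be written down and checked mechanically. This is a minimal sketch, not a monitoring tool: the component names, the `over_budget` helper, and the observed numbers are all hypothetical, and it treats the budget as simple addition, which is a planning simplification.

```python
# Per-component latency budget in milliseconds, taken from the example above.
BUDGET_MS = {
    "network_and_tls": 30,
    "load_balancer_and_routing": 20,
    "service_logic": 50,
    "database_or_cache": 80,
    "safety_margin": 20,
}
TARGET_P95_MS = 200


def over_budget(observed_ms: dict, budget_ms: dict) -> dict:
    """Return components whose observed latency exceeds budget, with the overshoot in ms."""
    return {
        name: observed_ms[name] - limit
        for name, limit in budget_ms.items()
        if observed_ms.get(name, 0) > limit
    }


# The parts must add up to the target, otherwise the budget is inconsistent.
assert sum(BUDGET_MS.values()) == TARGET_P95_MS

# Hypothetical measurements: the database is 40 ms over its share.
observed = {"network_and_tls": 28, "service_logic": 45, "database_or_cache": 120}
print(over_budget(observed, BUDGET_MS))
```

Here the database overshoot of 40 ms is larger than the 20 ms safety margin, so the end-to-end target is at risk even though every other component is within budget.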

Back-of-the-Envelope Sizing

Quick math sets the scale before you design.

Example:

10 million daily active users
5 actions per day each
50 million actions per day
50,000,000 / 86,400 seconds = about 580 QPS average

Now add realistic peaks and storage:

Peak factor 5x -> 2,900 QPS peak
1 KB per action -> 50 GB written per day
30 days retention -> 1.5 TB raw data (before replication)

You now know the system must handle roughly 3,000 QPS at peak and store about 1.5 TB of raw data, which grows to several terabytes once replicated.
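
The same arithmetic can be captured in a small helper so the inputs are easy to vary. This is a sketch under stated assumptions: the function and field names are hypothetical, and it uses decimal units (1 KB = 1,000 bytes) to match the rounded figures above.

```python
SECONDS_PER_DAY = 86_400


def estimate_load(dau, actions_per_user, peak_factor, bytes_per_action, retention_days):
    """Back-of-the-envelope QPS and storage estimate (decimal units)."""
    actions_per_day = dau * actions_per_user
    avg_qps = actions_per_day / SECONDS_PER_DAY
    daily_bytes = actions_per_day * bytes_per_action
    return {
        "actions_per_day": actions_per_day,
        "avg_qps": avg_qps,
        "peak_qps": avg_qps * peak_factor,
        "daily_gb": daily_bytes / 1e9,
        "retained_tb": daily_bytes * retention_days / 1e12,  # before replication
    }


# Inputs from the example: 10M DAU, 5 actions each, 5x peak, 1 KB per action, 30 days.
est = estimate_load(10_000_000, 5, 5, 1_000, 30)
for name, value in est.items():
    print(f"{name}: {value:,.1f}")
```

The unrounded peak comes out slightly below 2,900 QPS because the prose rounds the average to 580 before multiplying. At this level of precision the difference does not matter.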

Worked Example: From Requirements to Size

Assume 2,000,000 daily active users and 4 actions per day:

Actions per day = 2,000,000 * 4 = 8,000,000
Average QPS = 8,000,000 / 86,400 = about 93
Peak factor 5x -> about 465 QPS
Read/write split 90/10 -> about 420 reads/s and 47 writes/s
Writes per day = 10% of 8,000,000 = 800,000; each write is 2 KB -> about 1.6 GB/day
30-day retention -> about 48 GB raw
3x replication -> about 144 GB total
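
The storage figure is very sensitive to what fraction of actions actually persist data. The sketch below, with a hypothetical `retained_storage_gb` helper and decimal units (1 KB = 1,000 bytes), compares two readings of this example: every action stored as a 2 KB write, versus only the 10% write share of a 90/10 split.

```python
def retained_storage_gb(actions_per_day, write_fraction, bytes_per_write,
                        retention_days, replication_factor):
    """Replicated storage in GB for the retention window (decimal units)."""
    writes_per_day = actions_per_day * write_fraction
    daily_gb = writes_per_day * bytes_per_write / 1e9
    return daily_gb * retention_days * replication_factor


ACTIONS_PER_DAY = 2_000_000 * 4  # 2M daily active users, 4 actions each

# Compare "every action is a write" with "only 10% of actions are writes".
for write_fraction in (1.0, 0.1):
    total = retained_storage_gb(ACTIONS_PER_DAY, write_fraction, 2_000, 30, 3)
    print(f"write fraction {write_fraction:.0%}: about {total:,.0f} GB replicated")
```

The two readings differ by a factor of ten, so it is worth saying out loud which one you are assuming before quoting a storage number.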

The Assumptions Ledger

Keep a short list of assumptions you are willing to defend:

Peak factor and traffic distribution (steady vs spiky).
Read/write ratio and request fan-out.
Availability target and data loss tolerance.

If any assumption changes, explicitly describe how the architecture changes.
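
One lightweight way to keep such a ledger is a small immutable record, so that a changed assumption is visible as an explicit diff. This is only an illustration: the `Assumptions` class, its fields, and all values are hypothetical.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class Assumptions:
    """Hypothetical ledger of assumptions you are willing to defend."""
    peak_factor: float
    read_fraction: float
    fan_out: int
    availability_pct: float
    max_data_loss_seconds: int


def changed(old: Assumptions, new: Assumptions) -> dict:
    """Map each changed field name to its (old, new) values."""
    return {
        f.name: (getattr(old, f.name), getattr(new, f.name))
        for f in fields(old)
        if getattr(old, f.name) != getattr(new, f.name)
    }


baseline = Assumptions(peak_factor=5, read_fraction=0.9, fan_out=1,
                       availability_pct=99.9, max_data_loss_seconds=60)

# The interviewer says traffic is spiky and downtime is less tolerable.
revised = replace(baseline, peak_factor=20, availability_pct=99.99)
print(changed(baseline, revised))
```

Each entry in the diff is a prompt to explain what changes in the design, for example more headroom or autoscaling for a higher peak factor, and more redundancy for a stricter availability target.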

Common Failure Modes in Interviews

Skipping requirements and jumping to architecture.
Ignoring read vs write patterns.
Not stating assumptions or estimating load.
Treating low latency and high availability as if they cost nothing.

How to Narrate Trade-offs

A good structure:

Clarify the use case and constraints.
State assumptions and estimate load.
Propose a simple baseline design.
Identify bottlenecks and scale points.
Offer alternatives with trade-offs.

This keeps the conversation grounded and shows senior-level thinking.

Checklist for a Strong Start

Define the users and core user actions.
Split functional from non-functional requirements.
Estimate QPS, data size, and storage growth.
Call out the key trade-offs you will make.

A clean mental model is the foundation for every system you design.