🤔⚙️🧩🏗️💡 Everything I know about good system design

🤖 AI Summary

  • 📉 Good design looks underwhelming: nothing goes wrong for a long time.
  • 🧱 Complex systems reflect poor design; working complex systems evolve only from simple working systems.
  • ⚠️ Minimize stateful components because they cannot be automatically repaired when they fail.
  • Contain all writing logic within a single, state-aware service; multiple services must not write to the same table.
  • Design tables to be human-readable, balancing flexibility with application complexity.
  • 🔎 Index tables to match common queries, placing highest-cardinality fields first.
  • ⚡ Get the database to do the work, using JOIN instead of in-memory stitching.
  • 📚 Send read queries to replicas; when you must read a value right after writing it, reuse the in-memory copy instead of re-reading from a lagging replica.
  • ⚙️ Split slow operations: do minimum useful work for the user immediately, queue the rest in background jobs.
  • 🧊 Caching introduces statefulness and is no substitute for first speeding up the underlying operation, for example by adding a database index.
  • 📣 Use events when the sender is indifferent to the consumers, or for high-volume, non-time-sensitive data.
  • 🎯 Focus on “hot paths,” the most critical, data-heavy parts of the system, as they have fewer viable solutions.
  • 🚨 Log aggressively during unhappy paths, recording the specific condition hit.
  • 📈 Monitor basic observability metrics, watching p95/p99 for user-facing latency.
  • 🔑 Use idempotency keys when retrying writes that may or may not have succeeded.
  • 🛡️ Define failure policy (fail open vs. fail closed) based on the specific feature requirement.
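
The idempotency-key point can be made concrete with a small sketch. Everything here is hypothetical (the `PaymentService` class, its `charge` method, and the in-memory dictionaries standing in for a real table); it only illustrates the idea that a retried write carrying the same key should not be applied twice.

```python
import uuid


class PaymentService:
    """Hypothetical write service that de-duplicates requests by idempotency key."""

    def __init__(self):
        self.results = {}  # idempotency key -> result of the first successful attempt
        self.charges = []  # stands in for the table this service owns

    def charge(self, idempotency_key, amount_cents):
        # A key seen before means an earlier attempt already succeeded:
        # return the stored result instead of writing a second time.
        if idempotency_key in self.results:
            return self.results[idempotency_key]
        charge_id = len(self.charges) + 1
        self.charges.append((charge_id, amount_cents))
        result = {"charge_id": charge_id, "amount_cents": amount_cents}
        self.results[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())           # generated once by the client, reused on every retry
first = service.charge(key, 500)
retry = service.charge(key, 500)  # e.g. the first response was lost in transit
```

Here `first` and `retry` are the same result and only one row lands in `charges`. A real service would keep the key in the same durable store as the write itself, which this sketch does not attempt.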

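As a rough illustration of the indexing and JOIN bullets, here is a minimal sketch using Python's built-in `sqlite3` module. The `users` and `orders` tables, their columns, and the sample rows are assumptions made up for this example; the point is only the shape: a composite index that leads with the higher-cardinality column, and a single JOIN doing work that would otherwise be stitched together in application memory.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id),
        status TEXT NOT NULL,
        total_cents INTEGER NOT NULL
    );
    -- user_id has far more distinct values than status, so it goes first.
    CREATE INDEX idx_orders_user_status ON orders (user_id, status);
""")
conn.executemany(
    "INSERT INTO users (id, name) VALUES (?, ?)",
    [(1, "ada"), (2, "grace")],
)
conn.executemany(
    "INSERT INTO orders (user_id, status, total_cents) VALUES (?, ?, ?)",
    [(1, "paid", 1200), (1, "refunded", 300), (2, "paid", 800)],
)

# One JOIN instead of fetching users and orders separately and matching them in Python.
paid_totals = conn.execute(
    """
    SELECT u.name, SUM(o.total_cents)
    FROM users AS u
    JOIN orders AS o ON o.user_id = u.id
    WHERE o.status = 'paid'
    GROUP BY u.id
    ORDER BY u.name
    """
).fetchall()
```

With the sample rows above, `paid_totals` is `[("ada", 1200), ("grace", 800)]`. Whether the planner actually uses the index depends on the database and the data, so treat the column order as a starting point to verify against real query plans.
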
🤔 Evaluation

  • 🆚 This perspective contrasts sharply with advice to adopt complex distributed patterns such as microservices or event sourcing early in a project’s lifecycle, which often prioritizes future scalability over present-day simplicity.
  • ⚖️ A legitimate counterargument is that a hyper-growth startup that skips message queues or a well-sharded database on day one may face a costly re-architecture later, which cuts against the claim that a complex system must evolve from a simple one.
  • 💡 The emphasis on simplicity and “boring” technology aligns with the “worse is better” design philosophy, prioritizing immediate practical value and ease of maintenance over theoretical completeness or advanced features.
  • 📚 Topics to explore for a better understanding:
    • 🔄 The trade-offs between “fail open” and “fail closed” policies in non-security-critical but high-stakes systems (e.g., fraud detection, high-volume ad serving).
    • ⏱️ Specific techniques for measuring and optimizing p99 latency in database queries and service-to-service communication.
    • 🧩 The point at which a simple system must be split into microservices, and whether the overhead of complexity is truly earned before or after catastrophic scale issues appear.
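
On measuring p95/p99 latency, a small sketch shows why the tail is worth watching separately from the average. The `percentile` helper uses the nearest-rank definition (one of several in common use), and the latency samples are invented for illustration.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling of pct/100 * n in integer arithmetic, as a 1-based rank.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


# 100 hypothetical request latencies in milliseconds: mostly fast, three slow outliers.
latencies_ms = [12] * 97 + [40, 900, 2500]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```

For these samples the mean is 46.04 ms, p50 and p95 are both 12 ms, and p99 is 900 ms. The average hides that one request in a hundred takes close to a second, and the p99 makes it visible.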

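The fail-open versus fail-closed choice can be sketched as one parameter on a wrapper around a dependency check. The function names and the pairing of policies with features (a rate limiter failing open, an authorization check failing closed) are assumptions for illustration; the right policy depends on the feature.

```python
def allow_request(check, fail_open):
    """Run a dependency check; if the check itself errors, apply the failure policy."""
    try:
        return check()
    except Exception:
        # The dependency is down, so the policy decides the outcome:
        # fail open -> let the request through, fail closed -> reject it.
        return fail_open


def broken_check():
    # Stands in for a rate limiter or auth service that is unreachable.
    raise ConnectionError("dependency unavailable")


rate_limit_allows = allow_request(broken_check, fail_open=True)
auth_allows = allow_request(broken_check, fail_open=False)
```

With the dependency down, `rate_limit_allows` is `True` and `auth_allows` is `False`. In a real system the `except` branch is also the place to log the specific condition hit, since this is exactly an unhappy path.
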
📚 Book Recommendations