🤔⚙️🧩🏗️💡 Everything I know about good system design

🤖 AI Summary

  • 📉 Good design looks underwhelming: nothing goes wrong for a long time.
  • 🧱 Complex systems reflect poor design; working complex systems evolve only from simple working systems.
  • ⚠️ Minimize stateful components because they cannot be automatically repaired when they fail.
  • ✍️ Contain all writing logic within a single, state-aware service; multiple services must not write to the same table.
  • Design tables to be human-readable, balancing flexibility with application complexity.
  • 🔎 Index tables to match common queries, placing highest-cardinality fields first.
  • ⚡ Get the database to do the work, using JOIN instead of in-memory stitching.
  • 📚 Send read queries to replicas; to work around replication lag after a write, use the value you already hold in memory instead of reading it back from a replica.
  • ⚙️ Split slow operations: do the minimum useful work for the user immediately and queue the rest in background jobs.
  • 🧊 Caching introduces statefulness; first try to speed up the original operation, for example by adding a database index, and never use a cache as a substitute for that.
  • 📣 Use events when the sender is indifferent to the consumers, or for high-volume, non-time-sensitive data.
  • 🎯 Focus on “hot paths,” the most critical, data-heavy parts of the system, as they have fewer viable solutions.
  • 🚨 Log aggressively during unhappy paths, recording the specific condition hit.
  • 📈 Monitor basic observability metrics, watching p95/p99 for user-facing latency.
  • 🔑 Use idempotency keys when retrying writes that may or may not have succeeded.
  • 🛡️ Define failure policy (fail open vs. fail closed) based on the specific feature requirement.
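
To make the idempotency-key bullet concrete, here is a minimal Python sketch. The `PaymentService` class, its `charge` method, and the in-memory dictionary are hypothetical stand-ins and not anything taken from the article. A real service would more likely store the key in the database under a unique constraint, so that the duplicate check and the write happen atomically.

```python
import uuid


class PaymentService:
    """Hypothetical in-memory service that deduplicates writes by idempotency key."""

    def __init__(self):
        self.results_by_key = {}  # idempotency key -> result of the first attempt
        self.charges = []         # stands in for a database table

    def charge(self, idempotency_key, account, amount_cents):
        # A retry that carries a key we have already seen gets the stored
        # result back; the write is not performed a second time.
        if idempotency_key in self.results_by_key:
            return self.results_by_key[idempotency_key]

        charge_id = len(self.charges) + 1
        self.charges.append((charge_id, account, amount_cents))
        result = {"charge_id": charge_id, "status": "charged"}
        self.results_by_key[idempotency_key] = result
        return result


service = PaymentService()
key = str(uuid.uuid4())  # the client generates one key per logical operation

first = service.charge(key, "acct-1", 500)
retry = service.charge(key, "acct-1", 500)  # e.g. the first response timed out
```

Because the retry reuses the same key, the caller can resend safely without knowing whether the first attempt went through: both calls return the same result and only one charge is recorded.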

🤔 Evaluation

  • 🆚 This perspective contrasts sharply with advice to adopt complex distributed patterns, such as microservices or event sourcing, early in a project’s lifecycle; that advice often prioritizes future scalability over present-day simplicity.
  • ⚖️ A legitimate counterargument is that, for a hyper-growth startup, not using message queues or a well-sharded database from day one can lead to a costly re-architecture later, which contradicts the philosophy that a complex system must evolve from a simple one.
  • 💡 The emphasis on simplicity and “boring” technology aligns with the “worse is better” design philosophy, which prioritizes immediate practical value and ease of maintenance over theoretical completeness or advanced features.
  • 📚 Topics to explore for a better understanding:
    • 🔄 The trade-offs between “fail open” and “fail closed” policies in non-security-critical but high-stakes systems (e.g., fraud detection, high-volume ad serving).
    • ⏱️ Specific techniques for measuring and optimizing p99 latency in database queries and service-to-service communication.
    • 🧩 The point at which a simple system must be split into microservices, and whether the overhead of complexity is truly earned before or after catastrophic scale issues appear.

📚 Book Recommendations