Home > Reflections | ⏮️ ⏭️
2024-06-11
🏋 Practice 1
The Problem
5. Longest Palindromic Substring
Given a string
s
, return the longest palindromic substring ins
.
🪞 Reflections
-
This “Medium” level problem was a lot more difficult than I expected it to be.
-
I identified the brute force solution pretty quickly. I recognized that it wasn’t fast enough. I tried to find a faster solution, then gave up at about 25 minutes in. Implementing and testing the brute force solution only took about 6 minutes, which is pretty good.
-
The total time I spent searching for a fast enough solution was too much for an interview.
-
But I am pretty happy that I found a fast enough solution without having to look anything up.
-
How did I find that solution?
-
Well, I tried little optimizations on brute force to begin, which was unfruitful. In general, that does seem to be a theme. If I’m going to find an algorithmically faster solution, it’s probably not going to be a small tweak on brute force (at least, according to my vague recollection of past experience). I kind of knew this, but decided to try it anyway because I wasn’t so sure about the runtime complexity including the little tweaks, and it seemed easy (and fast) enough to be worth trying… plus I wasn’t confident I’d find a truly better solution.
-
Maybe a key insight is that finding a faster solution will usually involve a new way of thinking of the problem or at least some significant insight. It won’t just be an optimization on brute force. So when searching for something algorithmically faster, maybe I should start from scratch.
-
In this case, the key insight was that palindromes always start and end on the same character, thus we can avoid looking for palindromes when the first and last character aren’t the same. Additionally, it was critical to precompute the map of indices so that we wouldn’t have to use linear search when finding the last location of any given character each time we look for it.
-
By avoiding checking for palindromes between indices that don’t contain the same character, we drastically reduced the search space.
-
By building our map up front, we spent linear time to precompute answers we’d otherwise need linear time inside a linear loop for… thus we avoided an O(N^2) overall complexity by precomputing the inner loop. We were able to precompute the inner loop because we knew exactly what we were looking for at each step of the outer loop: the other occurrences of the same character. Because we knew the answer we were looking for in advance, a constant-time lookup data structure (hash map) allowed us to jump straight to the answer.
My Solutions
System Design
I haven’t spent much time practicing system design recently. I found some course material that walks through the design of various real products and services. Before reading the course material, I’ll try to do it myself. Afterword, I’ll read the course material and may make some updates to my notes.
🪞 Reflections
- It feels good to get some practice in system design.
- I was pleasantly surprised by how much similarity there was in the design notes I took before hand and the design presented in the course material.
- I think there are enough universal design principles for it to be worth writing them out on their own.
- I don’t agree with all of the design decisions presented in the course material, but I think that’s okay. For example, the course recommended storing videos in blob storage (as did I) but they recommended storing thumbnails in a separate, proprietary storage system that’s supposed to be particularly good at storing small files. Without seeing evidence of a production problem (which I wouldn’t anticipate) it seems like unnecessary added complexity to not just keep thumbnails in blob storage with videos.
Design YouTube
What is YouTube?
A platform where users can upload and download (watch) videos.
Clarifying questions:
- Who are the users?
- How many users do we expect to have?
- What are the goals of this project? (e.g. profit? revenue? open source?)
- Who are the key stakeholders? (e.g. managers, users, teams, etc)
- How broadly should the service be available? (YouTube is available everywhere in the world)
- What are the key features we want to focus on?
- Video uploading?
- Video watching?
- Search?
- User ratings?
- What about non-functional requirements?
- What kinds of latencies are acceptable?
- What kind of availability guarantees should we have?
- Are there any security concerns?
- Should we consider account management?
- DDOS protections?
Focusing on the most essential elements that embody YouTube, we have a system that
-
Has a web front end where users can
-
Upload videos
-
Watch videos
-
(optional) rate videos
-
(optional) search videos
-
The back end will need
-
video storage (blob storage)
-
a relational database to organize content (this can also maintain video ratings, comments, and other meta-data)
-
it’s probably good to include a search engine (e.g. elastic search for full-text videos searching by name)
-
application servers to decouple web clients from back end architecture, which is also good for security
1. Note: we probably don’t want to funnel video data through our back end servers, as network bandwidth is expensive. Instead, the back end can tell the client where to point its video player. -
What about scalability concerns?
-
Do we need to replicate video data across regional data centers, or is it okay to have a single global blob storage location?
-
We may not need full replication across regions, but we could utilize a CDN to cache frequently accessed content in local regions.
-
There are some interesting usage pattern distributions to consider.
-
Many more users will likely watch videos than upload them.
-
There will likely be a power-law distribution in which videos are watched - a few being watched many times, and a lot never being watched.
-
How can these patterns influence our design decisions?
-
We could, for example, allocate more resources to more popular videos, choosing to replicate that content proactively.
-
There are interesting choices that can be made on how to present content to users.
-
What content should we show brand new users? Most popular content?
-
As users watch, like, dislike, and search content, how should we modify the content we show them by default (i.e. personalization; MLcontent recommendation systems)
-
How can we design search such that users can find content that doesn’t necessarily match video title keywords? (e.g. tags, synonyms, phonetic search)
-
There are interesting design decisions related to video format and quality
-
Video format affects video player compatibility, playback quality, and storage requirements. Standardizing storage formats can help reduce storage cost and delivery delays.
-
Storing video in multiple qualities allows us to offer variable quality and bandwidth for users watching from devices and networks with varying capacity and capabilities.
flowchart
WebClient --> LoadBalancer --> AppServer
AppServer --> RDS
AppServer --> SearchCluster
AppServer --> Encoder --> BlobStorage
WebClient --> CDN --> BlobStorage
🏋 Practice 2
The Problem
Given the
root
node of a binary search tree and two integerslow
andhigh
, return the sum of values of all nodes with a value in the inclusive range[low, high]
.
🪞 Reflections
- 🎉 < 15 minutes
- 🎉 identified recursive solution in planning phase
- 🤔 Technically, there’s a simpler implementation that’s still O(N) that doesn’t conditionally descend on left and right sides. I’m not sure if it was worth adding that optimization (and risking bugs) or not.
- It’s a good thing that I identified the recursive solution early, because implementing the iterative version seems challenging.
- Maybe that means I should implement the iterative solution, too.