Software and Abstraction
I came across a youtube video recently. It was a conversation between an engineer from Amazon and the video creator. They role play the interview of a Software algorithm to print all the left nodes of a Binary Tree. Those who are software developers in product companies and preparing to become a Software Developer at one of the product companies, would have come across such algorithms. After seeing the video, I casually started chatting with my wife. What is the use case of printing all the left nodes of binary tree? Will there any real world application of it?
Let us zoom out and understand what do software developers in companies like Amazon build. Let’s take an example. Think of the software team responsible for showing a Listing page in Amazon.com like this PS5 @ https://www.amazon.com/Console-Horizon-Forbidden-Bundle-PlayStation-5/dp/B0B16656Z2 . If we deconstruct this URL, we will find that id ‘B0B16656Z2’ seems to be a unique identifier for the listing. When this identifier is given, software will have to look for this identifier and fetch the details of PS5 like.. description, seller name, manufacturer, item name etc.
How does the software achieve this?
Data Structure
The data of all the listings is neatly tabulated an stored something like this. Of course, this is extreme over simplification of the how all the details of listing is stored. The details are stored in multiple table like Pricing details, Discount details, videos, images etc.
Algorithm
Now that we know the data is available in a table, we need to search and get all the relevant details, given a listing ID (like B0B16656Z2 in the PS5 example) . How do we do the search? We can do the search linearly by going through each row and see if the row’s listing ID matches the given listing ID. But you can guess, it is not going to be efficient for some one with billions of listings. It’s probably going to take hours to fetch the result and user is not going to wait more than 30 seconds for the page to load. How do we improve the efficiency of this search. By doing something known as indexing. It is like simply linking a number to listing ID like 1->B0B16656Z2, 2->B08G9J44ZN. The indexes are stored in a binary tree like below.
Why a binary tree? It’s easier to search in a binary tree sorted as show above. As the rows increase, the number searches need to be done increase non-linearly. In fact, the proportion is logarithmic. For 4 rows, we need maximum 2 search operations. for 64 rows, we need maximum 6search operations. For 10284 rows, we need maximum 10 search operation. You probably get it now.
Does the software developer really do all of this?
No, this is a solved problem. Indexing is available to use with all the database storages, so that the softwares don’t have to necessarily code them from the scratch? In fact, while building a lot of real world applications, Software developers don’t write everything scratch. They will reuse the open source applications and build the applications for the requirements they are coding for. To draw an analogy, it is building a toy variant from a basic prototype which serves as a blue print. It is no way as simple as it sounds.
But then why do software developers are asked questions like the ones we started the article with?
The worth of true invention is to build something unique from zero. Like building a basic prototype. It requires a lot of abstraction and lateral thinking. We learnt factorisation of number in mathematics. I was excited when I came to know that the some of early authentication mechanisms relied on providing the same factorisation combination of a very large number. There are possibly million ways to factorise a very large number and chance of hacking it is improbable. Learning abstract concepts is like working a various drills in football like curved passing, curved air balls, dribbling which are all not useful until you combine them with in a match to make a beautiful play. The more skills you master, the better you game. The more abstract concepts you learn, the better you are equipped to invent new application.