Trees & Graphs

Trees and Graphs are the first non-linear (not all elements are one after the other) data structure that we see in this guide. The non-linearity of the data structure makes it more difficult to think and solve problems involving them, however, both are extremely useful to many practical problems so you should definitely familiarize with these two data structures before your interview.

We start with the more generic data structure from the two, which is the Graph Data Structure. This data structure consists of some nodes (finite in number which also called vertices) together with connections between these nodes (these connections between the nodes are called edges). A real world example could be an application using Facebook. Imagine that the profile of each person is a node and if two people are friends on Facebook then there is an edge between these two profiles. Using this model, someone could write an app using connections on Facebook.

There are many operations that the Graph data structure could support, some of them for example are; given two nodes return whether they share an edge, given a node return all neighbor nodes, add/remove a node, add/remove and edge etc. However, the operations on Graphs are not as important as the possible representations of Graphs. There are two main possibilities of representing a graph; the first one uses Adjacency Lists and the second one uses an Adjacency Matrix. The idea of using adjacency lists is that for each Node \(A\) we attach a list of all the nodes that are connected to the Node \(A\). The adjacency matrix on the other hand is a square matrix containing only \(0\)’s or \(1\)’s. If the \(i\)-th node is connected with an edge to the \(j\)-th node then at the positions \((i, j)\) and \((j, i)\) there are \(1\)’s, otherwise if this connection does not exist then in these positions there are \(0\)’s. The solution to graph problems heavily depend on the representation of the given graph.

These two representations of Graphs are fundamentally different. An adjacency matrix gives you a good overview of the whole graph and it is perhaps easier to do operations on it. However, if we have \(n\) nodes in the graph the size of the matrix would need to be of size \(O(n^2)\). In some situations this is not practical, for example imagine that you want to represent the internet with a graph such that each site to be a node and a connection between two sites exists if there is at least a link from one to the other one. This would result in an extremely big graph that would be impossible to be represented by an adjacent matrix. We could represent this graph using adjacent lists. The reason that we can do this with this approach is that the information stored about the graph is only for a smaller local area of the graph. However, with this approach we have only a local view of a given graph. With this approach we can save a lot of space representing a given graph if it does not have many edges.

Some basic problems that can be given in an interview on graphs are solved with similar concepts as the problems we will see now on trees. We leave the potential problems for graphs to focus on the bit easier problems on trees which otherwise are quite similar.

The Tree Data Structure is a special case of a graph, a tree is a graph that has an order relationship between its nodes, i.e. there is no node that has a path starting from this node to itself passing from different nodes using edges of the graph. Except one node, each node has an immediate previous node which called ‘parent’ and each node can have nodes immediate after it (one or more) which called ‘children’. The special node that has no parent is called the root and the special nodes that have no children are called leaves. Note that each node could be the root of a ‘sub-tree’, this fact makes us think that a tree is very relative to recursion and we can use this fact to solve problems on trees. A real world example could be the diagram of people in a business represented by a Tree. The node corresponding to the CEO of that company would be the root and all the nodes corresponding to the people who report directly to her would be the children. The rest of the tree would be created by similar logic for any employee of the company. Modeling this relationship with this data structure could be useful for an HR application.

A basic problem on trees is to search for an element in the tree. The approaches to solve this problem can be classified into two categories: search each level of the tree before you move to the next lever, and search each branch of the tree before you move to the next branch. By a level we mean the distance between a node to the root, for example if a tree has as a root the Node \(A\) which has exactly two children, Node \(B\) and Node \(C\). Node \(B\) has also two children nodes \(D\) and \(E\), and Node \(C\) has one child the Node \(F\). Then, the nodes \(B\) and \(C\) would be at the same level. Similarly, nodes \(D\), \(E\), and \(F\) are in the same level (a different level than the one that has nodes \(B\) and \(C\)). This approach is called Breadth First Search (BFS). For the other category of searches on trees the idea is that we go deep into the tree before we move to the next step of the search. In our example of the previous tree, we first search the sub-tree that has the nodes \(B\), \(D\), and \(E\). Then, we would have seen first all the nodes of this sub-tree (nodes \(B\), \(D\), and \(E\)) before we see Node \(C\). In the BFS approach we would have seen first both of the nodes \(B\) and \(C\) before we search any of the \(D\), \(E\), and \(F\). This approach is called Depth First Search (DFS).

The DFS approach could be implemented in many different ways. However, three of them are well known and perhaps they could come up in interview questions; pre-order, in-order, and post-order. To simplify the description of how these implementations work we assume that each node has at most two nodes as children, these type of trees are called Binary Trees. In a pre-order search, we visit the root node first, then recursively do a pre-order search of the the left sub-tree, followed by a recursive pre-order search of the right sub-tree. The order that we would get the nodes of the above example by implementing pre-order would be \(A, B, D, E, C, F\). In an in-order search, we recursively do an in-order search on the left sub-tree, visit the root node, and finally do a recursive in-order search of the right sub-tree. The order that we would get the nodes of the above example by implementing in-order would be \(D, B, E, A, F, C\). In a post-order search, we recursively do a post-order search of the left sub-tree and the right sub-tree followed by a visit to the root node. The order that we would get the nodes of the above example by implementing post-order would be \(D, E, B, F, C, A\). These kind of searches in trees are also called Tree Traversals.