Algorithms - 7. Depth First Search (DFS)

When to use Depth First Search (DFS)

Use DFS when you need to explore all paths or dive deep before backtracking. While most commonly used for tree traversal, it can be adapted for:

Tree traversals (classic use case - inorder, preorder, postorder)
Path finding (checking if path exists between nodes)
Connected components (finding islands in a grid)
Cycle detection (detecting loops in graphs)
Topological sorting (ordering dependencies)
Backtracking problems (generating permutations, solving puzzles)
Any problem where you need to explore as far as possible along each branch before backtracking

Example problem

File System Exploration: Given a folder structure, find all files with a specific extension. Your file system looks like:

root/
├── documents/
│   ├── report.pdf
│   └── notes.txt
├── code/
│   ├── main.js
│   └── utils/
│       └── helper.js
└── images/
    └── logo.png

Find all .js files in the system.

Answer: DFS will explore root → documents → report.pdf → notes.txt (backtrack) → code → main.js (found!) → utils → helper.js (found!) (backtrack) → images → logo.png. Result: [main.js, helper.js]

Single most important insight

DFS commits to exploring one path completely before trying another - by recursively diving deep into each branch until hitting a dead end, then backtracking to explore alternatives, we systematically visit every node exactly once without missing any paths.

Mental note: The algorithm's genius lies in using the call stack as a natural backtracking mechanism. At each node, we simply say: "Process myself, then explore my children completely." The recursion automatically handles returning to unexplored branches when we hit dead ends.

Decision Framework

At every node, you follow this simple order:

Process current node (when and how depends on traversal type)
Recursively explore children (left/right for trees, neighbors for graphs)
Let recursion handle backtracking (automatic via call stack)

The only decision is when to process the current node:

Preorder: Process before exploring children (root → left → right)
Inorder: Process between children (left → root → right)
Postorder: Process after exploring children (left → right → root)

When to Use Each Traversal Type (Simple Examples)

Pre-order (Root → Left → Right)

When it's required: You need to process or visit the parent node before you can process its children - typically when creating copies, printing hierarchies, or when the parent contains information needed by its children.

Simple example: Copying a folder structure - When you copy a folder to a new location, you must:

Create the parent folder first (root)
Then copy/create its sub-folders and files (children)

You can't put files in a folder that doesn't exist yet!

In-order (Left → Root → Right)

When it's required: You need to retrieve data in sorted order from a binary search tree, or when you need to process nodes in their natural sequential order - the left subtree contains smaller values, and the right subtree contains larger values.

Simple example: Printing all employees by salary in a company tree where:

Left branch = employees with lower salaries
Current node = this employee's salary
Right branch = employees with higher salaries

In-order traversal gives you: $50k, $60k, $70k, $80k... perfectly sorted without any sorting algorithm!

Post-order (Left → Right → Root)

When it's required: You need to process or collect information from all children before you can process the parent - typically when deleting, calculating sizes, or when the parent depends on results from its children.

Simple example: Calculating total folder size on your computer:

First, get the size of all files in subfolder A (left child)
Then, get the size of all files in subfolder B (right child)
Finally, add them together to get the parent folder's total size

You can't know a folder's total size until you've measured all its contents!

Code

Finding Files in a Folder Structure

Let's implement DFS to solve a real problem: searching through a project folder to find all JavaScript files. This example shows how DFS explores deep into each folder branch before moving to the next, using a path stack to track where we are and backtrack when needed.

// Practical example: Finding a specific file in a folder structure
// Let's search for all JavaScript files in our project

class FileNode {
  constructor(name, isFolder = false) {
    this.name = name;
    this.isFolder = isFolder;
    this.children = [];
  }

  addChild(child) {
    this.children.push(child);
    return child;
  }
}

// The main DFS algorithm - finding files with a specific extension
function findFilesWithExtension(rootFolder, extension) {
  const foundFiles = [];
  const pathStack = []; // Track current path for full file paths

  function searchFolder(node) {
    // Base case: if null, nothing to explore
    if (node == null) {
      return;
    }

    // Add current item to path
    pathStack.push(node.name);
    const currentPath = pathStack.join("/");

    // DECISION FRAMEWORK: Check if this is what we're looking for
    const isTargetFile = !node.isFolder && node.name.endsWith(extension);
    if (isTargetFile) {
      foundFiles.push(currentPath);
    }

    // Recursively explore all children (go deep!)
    for (const child of node.children) {
      searchFolder(child); // Dive into this child completely
    }

    // Backtrack: Remove current from path when done exploring
    pathStack.pop();
  }

  searchFolder(rootFolder);
  return foundFiles;
}

// Build our example file system
const root = new FileNode("project", true);

// Add src folder with JavaScript files
const src = root.addChild(new FileNode("src", true));
src.addChild(new FileNode("index.js"));
src.addChild(new FileNode("app.js"));

const components = src.addChild(new FileNode("components", true));
components.addChild(new FileNode("Button.js"));
components.addChild(new FileNode("Button.css")); // Not a JS file

// Add docs folder with markdown files
const docs = root.addChild(new FileNode("docs", true));
docs.addChild(new FileNode("README.md"));
docs.addChild(new FileNode("guide.md"));

// Add config files at root
root.addChild(new FileNode("package.json"));
root.addChild(new FileNode("config.js"));

// Find all JavaScript files
const jsFiles = findFilesWithExtension(root, ".js");
console.log("Found JavaScript files:");
console.log(jsFiles);
// Output:
// [
//   'project/src/index.js',
//   'project/src/app.js',
//   'project/src/components/Button.js',
//   'project/config.js'
// ]

// Let's trace through ONE path to see DFS in action:
// 1. Start at 'project'
// 2. Go to first child 'src' (DIVE DEEP!)
// 3. Go to 'src's first child 'index.js' (DEEPER!)
// 4. 'index.js' has no children, process it, backtrack
// 5. Go to 'src's next child 'app.js' (EXPLORE SIBLING)
// 6. 'app.js' has no children, process it, backtrack
// 7. Go to 'src's next child 'components' (DIVE INTO NEW BRANCH)
// ... and so on

Visualizing the Three Traversal Orders

Let's use a simple tree to see how each traversal order visits nodes differently:

With this tree structure, each traversal type will visit the nodes in a different sequence based on when it processes the parent relative to its children.

class SimpleNode {
  constructor(value) {
    this.value = value;
    this.left = null;
    this.right = null;
  }
}

// Pre-order: Process node BEFORE children (Root → Left → Right)
function preOrder(node) {
  if (node == null) return [];

  const result = [];

  function traverse(current) {
    if (current == null) return;

    // DECISION FRAMEWORK: Process current FIRST
    result.push(current.value);
    traverse(current.left); // Then go left
    traverse(current.right); // Then go right
  }

  traverse(node);
  return result;
}

// In-order: Process node BETWEEN children (Left → Root → Right)
function inOrder(node) {
  if (node == null) return [];

  const result = [];

  function traverse(current) {
    if (current == null) return;

    traverse(current.left); // First go left
    // DECISION FRAMEWORK: Process current IN MIDDLE
    result.push(current.value);
    traverse(current.right); // Then go right
  }

  traverse(node);
  return result;
}

// Post-order: Process node AFTER children (Left → Right → Root)
function postOrder(node) {
  if (node == null) return [];

  const result = [];

  function traverse(current) {
    if (current == null) return;

    traverse(current.left); // First go left
    traverse(current.right); // Then go right
    // DECISION FRAMEWORK: Process current LAST
    result.push(current.value);
  }

  traverse(node);
  return result;
}

// Build example tree
const tree = new SimpleNode("A");
tree.left = new SimpleNode("B");
tree.right = new SimpleNode("C");
tree.left.left = new SimpleNode("D");
tree.left.right = new SimpleNode("E");

console.log("\nTraversal Orders:");
console.log("Pre-order  (Root-Left-Right):", preOrder(tree)); // [A, B, D, E, C]
console.log("In-order   (Left-Root-Right):", inOrder(tree)); // [D, B, E, A, C]
console.log("Post-order (Left-Right-Root):", postOrder(tree)); // [D, E, B, C, A]

// Real-world meaning of each traversal:
// Pre-order:  Create a copy of the tree (process parent before children)
// In-order:   Get sorted values from BST (left < root < right)
// Post-order: Delete the tree safely (delete children before parent)

Doubt buster

Common doubt: "Won't DFS get stuck in infinite loops or revisit nodes? What if I have a graph with cycles, or what if the tree is actually huge - won't we run out of memory?"

Why you might think DFS could fail

Consider exploring a social network where people can be friends with each other (creating cycles):

Alice → Bob → Charlie
  ↑             ↓
  └─────────────┘

You might worry:

"If we start at Alice, go to Bob, then Charlie, then back to Alice... we're stuck forever!"
"What if the tree is 1000 levels deep? Won't the call stack overflow?"
"How do we know we won't process the same node multiple times?"

Why this thinking is incomplete

Key realization: DFS has two simple solutions for these concerns:

For cycles: Use a visited set to track explored nodes
For depth: The call stack space is O(height), not O(nodes)

Let's see how we handle graphs with cycles:

function dfsGraph(startNode, graph) {
  const visited = new Set();
  const result = [];

  function explore(node) {
    // Prevent revisiting - this breaks cycles!
    if (visited.has(node)) {
      return;
    }

    visited.add(node);
    result.push(node);

    // Explore all unvisited neighbors
    for (const neighbor of graph[node]) {
      explore(neighbor);
    }
  }

  explore(startNode);
  return result;
}

// Example with our cyclic graph
const socialNetwork = {
  Alice: ["Bob", "Charlie"],
  Bob: ["Charlie"],
  Charlie: ["Alice"], // Creates cycle!
};

console.log(dfsGraph("Alice", socialNetwork));
// ['Alice', 'Bob', 'Charlie'] - no infinite loop!

Concrete example: DFS handles all tree shapes efficiently

Let's trace through different tree shapes to see why DFS is robust:

Extremely deep tree (linked list shape):

1 → 2 → 3 → 4 → ... → 1000

Space complexity: O(1000) for call stack
But we still visit all 1000 nodes exactly once
Modern systems handle thousands of recursive calls easily

Extremely wide tree:

        1
   /  /  |  \  \
  2  3   4   5  6  (1000 children)

Space complexity: O(2) - just current path depth!
We explore child 2 completely, backtrack, then child 3, etc.
Never need to store all siblings simultaneously

Balanced tree:

Space complexity: O(log n) for balanced trees
Most efficient case for recursive exploration

Why O(log n)? In a balanced tree with n nodes, the height is log₂(n). For example, a balanced tree with 1,000,000 nodes has a height of only ~20 levels. Since DFS only keeps the current path in memory (not all nodes), the call stack never exceeds 20 frames, even with a million nodes!

Mathematical Proof: Space Complexity is O(log n)

Let's prove step-by-step why a balanced binary tree with n nodes has height log₂(n):

Step 1: Understand the structure of a balanced binary tree

Each level i has at most 2^i nodes (level 0 has 1, level 1 has 2, level 2 has 4, etc.)
A perfectly balanced tree with height h has exactly 2^(h+1) - 1 nodes

Step 2: Derive the relationship between nodes (n) and height (h)

For a balanced tree with height h:
- Maximum nodes = 1 + 2 + 4 + 8 + ... + 2^h
- This is a geometric series = 2^(h+1) - 1

Therefore: n ≤ 2^(h+1) - 1

Why n ≤ 2^(h+1) - 1? This inequality represents the maximum number of nodes in a binary tree of height h:

Each level i can have at most 2^i nodes
Total nodes = 2^0 + 2^1 + 2^2 + ... + 2^h (sum of all levels)
Using geometric series formula: Sum = a(r^n - 1)/(r - 1) where a=1, r=2, n=h+1
Result: n = (2^(h+1) - 1)/1 = 2^(h+1) - 1

We use ≤ (not =) because a tree might not be completely filled at the last level. For example, both these height-2 trees are valid:

Complete (n = 7):        Incomplete (n = 4):
      1                        1
     / \                      / \
    2   3                    2   3
   / \ / \                  /
  4  5 6  7                4

Continuing the derivation:

          n + 1 ≤ 2^(h+1)
          log₂(n + 1) ≤ h + 1
          h ≥ log₂(n + 1) - 1

Since h must be at least log₂(n), we say h = O(log n)

Step 3: Concrete examples with real numbers

Tree with 7 nodes (perfectly balanced):
      1           Height = 2
     / \          Nodes at each level:
    2   3         Level 0: 1 node
   /|   |\        Level 1: 2 nodes
  4 5   6 7       Level 2: 4 nodes
                  Total: 7 nodes = 2^3 - 1
                  Height: 2 = floor(log₂(7)) ✓

Tree with 1,000,000 nodes:
log₂(1,000,000) ≈ 19.93
Height ≈ 20 levels

Tree with 1,000,000,000 nodes (1 billion):
log₂(1,000,000,000) ≈ 29.9
Height ≈ 30 levels

Step 4: Why this matters for DFS space complexity

DFS only stores the current path from root to the current node on the call stack:

Call stack during DFS traversal:
[root]                    → 1 frame (at root)
[root, left]              → 2 frames (at left child)
[root, left, left.left]   → 3 frames (at left.left child)
...
[root, ..., leaf]         → h+1 frames (at deepest leaf)

Maximum stack frames = height + 1 = O(log n) for balanced trees

Visual comparison of node count vs. height:

Nodes (n)     | Height (h) | Stack frames needed
------------- | ---------- | ------------------
7             | 2          | 3
15            | 3          | 4
31            | 4          | 5
1,023         | 9          | 10
1,048,575     | 19         | 20
1,073,741,823 | 29         | 30

Notice how even with over a billion nodes, we only need 30 stack frames! This is the power of logarithmic growth - the height grows much slower than the number of nodes.

Why the call stack is our friend

The brilliance of DFS is that the call stack automatically provides:

Memory of where to return - When we finish exploring a branch, the stack knows exactly where we left off
Implicit backtracking - No manual tracking of "where to go back to"
Natural depth limiting - Stack overflow is actually a feature that prevents infinite recursion

Each recursive call adds just one frame to the stack:

explore(root)           // Stack: [root]
  explore(left)         // Stack: [root, left]
    explore(left.left)  // Stack: [root, left, left.left]
    return              // Stack: [root, left] - automatic backtrack!
    explore(left.right) // Stack: [root, left, left.right]
    return              // Stack: [root, left]
  return                // Stack: [root]
  explore(right)        // Stack: [root, right]

The guarantee: Every node is visited exactly once, and the maximum memory used is proportional to the tree's height, not its total size. For graphs, the visited set ensures we never process a node twice, completely eliminating infinite loops.