Algorithms - 11. Union-Find (Disjoint Set Union)

When to use Union-Find

Use Union-Find when you need to efficiently track connectivity in a dynamic graph or manage disjoint sets. While most commonly used for finding connected components, it can be adapted for:

Connected components in graphs (classic use case)
Cycle detection in undirected graphs
Kruskal's MST algorithm for finding minimum spanning trees
Network connectivity problems (servers, friends, accounts)
Percolation problems (physics simulations, maze generation)
Any problem where you need to efficiently merge groups and check if elements belong to the same group

Example problem

Social Network: Given a list of friendships like [[1,2], [2,3], [4,5], [6,7], [5,6]], efficiently answer:

Are person 1 and person 3 friends (directly or through mutual connections)?
How many separate friend groups exist?
Can we merge two friend groups when a new friendship forms?

After processing all friendships:

Group 1: {1, 2, 3} - all connected through friendships
Group 2: {4, 5, 6, 7} - all connected through friendships
Total: 2 separate friend groups

Answer: Person 1 and 3 are connected (same group), we have 2 total groups, and yes we can efficiently merge groups.

What Union-Find Actually Does

Union-Find is a data structure that manages groups of elements. It provides exactly two operations:

FIND: Determine which group an element belongs to (returns the group's representative/root)
UNION: Merge two groups into one single group

Think of it like managing friend circles on social media:

FIND: "Which friend circle does Alice belong to?"
UNION: "Bob and Carol just became friends, merge their friend circles"

How the Operations Work

FIND Operation - "Which group is this element in?"

When you call find(5), it:

Starts at element 5
Follows parent pointers up to the root (the root is the group identifier)
Always returns the root - this root serves as the unique ID for the entire group
Optimization: Compresses the path while traversing, making future lookups faster

UNION Operation - "Connect these two groups"

When you call union(3, 7), it:

Finds the root of element 3's group
Finds the root of element 7's group
If same root: they're already connected, does nothing
If different roots: makes one root point to the other, instantly merging entire groups

When These Operations Happen

These operations only happen when you explicitly call them:

const uf = new UnionFind(10);

// UNION: Explicitly connect elements
uf.union(1, 2); // Connect groups containing 1 and 2
uf.union(2, 3); // Connect groups containing 2 and 3

// FIND: Check which group an element belongs to
let groupID = uf.find(3); // ALWAYS returns the root (the group's unique ID)

// Common usage: Check if two elements are connected
uf.isConnected(1, 3); // Internally calls find(1) and find(3)

The beauty of Union-Find is that connecting just two roots instantly merges entire groups - you don't need to update every single element in either group!

Single most important insight

Every element points to exactly one parent, creating trees that represent sets - and by making trees flat (path compression) and merging by tree height (union by rank), we achieve nearly constant-time operations for both finding representatives and merging sets.

Mental note: The algorithm's genius lies in treating sets as trees where all elements eventually point to one "root" representative. At each operation, we only need to: (1) Follow parent pointers to find the root (with compression to speed up future lookups), or (2) Connect two roots to merge entire sets. This simple parent-pointer mechanism automatically maintains disjoint sets efficiently.

Decision Framework

Union-Find involves two key decisions:

During Find (path compression):

Follow parent pointers to the root
Compress the path by making nodes point directly to root (or grandparent)

During Union (union by rank):

Find roots of both elements
If different roots: attach smaller tree under larger tree
If same root: elements already connected, do nothing

Why "Merging by Tree Height" Matters

Union by rank means always attaching the shorter tree under the taller tree to keep trees as flat as possible. Here's why this is crucial:

Without Union by Rank (Bad):

Always attaching to first tree creates chains:
1 → 2 → 3 → 4 → 5 → 6 → 7 → 8
Height: 8 (finding element 8 takes 8 steps!)

With Union by Rank (Good):

Attaching shorter to taller creates balanced trees:
        1
      / | \
     2  3  6
    / \   / \
   4   5 7   8
Height: 3 (finding any element takes at most 3 steps!)

The "rank" is the tree's height. When merging two trees:

If one tree is taller: attach the shorter tree under the taller tree's root
If both same height: attach either way, but increase the resulting tree's rank by 1
Result: Maximum height grows logarithmically instead of linearly

This optimization ensures that even without path compression, operations remain O(log n), and with path compression, we achieve nearly constant time!

Code

class UnionFind {
  constructor(numberOfElements) {
    // Store parent pointers: parent[element] = element's parent
    // Initially, each element is its own parent (meaning it's a root)
    this.parent = new Map();

    // Store tree ranks: rank[element] = upper bound of tree height
    // Used to keep trees balanced during union operations
    // NOTE: After path compression, this becomes an estimate, not actual height!
    this.rank = new Map();

    // Initialize: create numberOfElements separate groups
    // Each element starts in its own group (points to itself)
    for (let element = 1; element <= numberOfElements; element++) {
      this.parent.set(element, element); // Self-loop means "I am the root"
      this.rank.set(element, 0); // Single element has rank 0
    }
  }

  // FIND: Returns the root/group ID of the element
  // Also compresses the path to speed up future operations
  find(element) {
    // Start at the element and follow parent pointers up
    let current = this.parent.get(element);

    // Keep climbing up the tree until we find the root
    // (Root is identified by: parent[root] === root)
    while (current !== this.parent.get(current)) {
      // DECISION FRAMEWORK: Path compression optimization
      // While climbing, make each node point to its grandparent
      // This flattens the tree structure
      const parentOfCurrent = this.parent.get(current);
      const grandparent = this.parent.get(parentOfCurrent);
      this.parent.set(current, grandparent);

      // Move up to the next level
      current = this.parent.get(current);
    }

    // We've reached the root - this is the group ID
    return current;
  }

  // UNION: Merge the groups containing element1 and element2
  // Returns true if merged, false if already in same group
  union(element1, element2) {
    // Step 1: Find which group each element belongs to
    const root1 = this.find(element1);
    const root2 = this.find(element2);

    // DECISION FRAMEWORK: Check if already connected
    if (root1 === root2) {
      // Both elements are already in the same group
      return false;
    }

    // Step 2: Merge the two groups by connecting their roots
    // DECISION FRAMEWORK: Union by rank - keep trees balanced
    const rank1 = this.rank.get(root1);
    const rank2 = this.rank.get(root2);

    if (rank1 > rank2) {
      // Tree 1 has higher rank - make it the parent to maintain balance
      this.parent.set(root2, root1);
      // Rank doesn't change (smaller rank tree attached to larger rank tree)
    } else if (rank1 < rank2) {
      // Tree 2 has higher rank - make it the parent to maintain balance
      this.parent.set(root1, root2);
      // Rank doesn't change (smaller rank tree attached to larger rank tree)
    } else {
      // Both trees same rank - pick either as parent
      this.parent.set(root1, root2);
      // Rank increases by 1 (two equal rank trees merged)
      this.rank.set(root2, rank2 + 1);
    }

    return true; // Successfully merged two groups
  }

  // Helper: Check if two elements are in the same group
  isConnected(element1, element2) {
    // Elements are connected if they have the same root
    return this.find(element1) === this.find(element2);
  }
}

// === EXAMPLE USAGE: Social Network Problem ===
// We have 7 people (numbered 1-7)
const socialNetwork = new UnionFind(7);

// List of friendships to process
const friendships = [
  [1, 2], // Person 1 and 2 become friends
  [2, 3], // Person 2 and 3 become friends (now 1,2,3 are all connected!)
  [4, 5], // Person 4 and 5 become friends
  [6, 7], // Person 6 and 7 become friends
  [5, 6], // Person 5 and 6 become friends (now 4,5,6,7 are all connected!)
];

// Process each friendship (merge friend groups)
console.log("Processing friendships:");
for (const [person1, person2] of friendships) {
  const wasNewConnection = socialNetwork.union(person1, person2);
  if (wasNewConnection) {
    console.log(`Connected person ${person1} and ${person2}'s groups`);
  } else {
    console.log(`Person ${person1} and ${person2} already in same group`);
  }
}

// Check if specific people are friends (directly or indirectly)
console.log("\nChecking connections:");
console.log(`Are 1 and 3 friends? ${socialNetwork.isConnected(1, 3)}`); // true
console.log(`Are 1 and 4 friends? ${socialNetwork.isConnected(1, 4)}`); // false
console.log(`Are 4 and 7 friends? ${socialNetwork.isConnected(4, 7)}`); // true

// Count how many separate friend groups exist
const uniqueGroups = new Set();
for (let person = 1; person <= 7; person++) {
  const groupID = socialNetwork.find(person);
  uniqueGroups.add(groupID);
}
console.log(`\nTotal friend groups: ${uniqueGroups.size}`); // 2 groups

Doubt buster

Common doubt: "Path compression changes the tree structure dramatically, but we NEVER update the ranks afterward. Won't this mess up our carefully maintained ranks? Isn't this a bug?"

This seems like a serious inconsistency - we use ranks to keep trees balanced, but then path compression flattens the tree without updating ranks. Let's explore why this apparent "bug" is actually a brilliant design decision.

Why this seems wrong

Consider what happens with path compression:

Before find(5):                After find(5) with path compression:
    1 (rank=3)                      1 (rank=3 - UNCHANGED!)
    |                              /|\\
    2 (rank=2)                    2 3 4 5
    |
    3 (rank=1)
    |
    4 (rank=0)
    |
    5 (rank=0)

Actual height: 5                Actual height: 2
Stored rank: 3                  Stored rank: STILL 3!

The tree went from height 5 to height 2, but the rank stayed at 3. This looks completely wrong!

The key insight: Ranks become meaningless for height

Here's the crucial realization: After path compression starts happening, ranks become completely disconnected from actual tree height.

The rank can be TOTALLY WRONG - not even an estimate:

A tree with rank 10 might have actual height 1 (after aggressive compression)
A tree with rank 3 might have actual height 3 (if never compressed)
The rank tells you NOTHING about current structure

Why we don't need to update ranks

1. Ranks are only used for one thing: choosing parents during union

// The ONLY place ranks matter:
if (rank1 > rank2) {
  parent[root2] = root1; // Higher rank becomes parent
}

Once we've decided which root becomes the parent, the internal structure of each tree doesn't affect future unions - only the root's rank matters.

2. Path compression only affects non-root nodes

Before compression:          After compression:
    1 (rank=2)                  1 (rank=2)
    |                          /|\
    2                         2 3 4
    |
    3
    |
    4

Node 1 is still the root!
Node 1's rank is still valid for union decisions!
Nodes 2,3,4 don't have meaningful ranks anyway (they're not roots)

3. Even completely wrong ranks still prevent the worst case

Here's the surprising part - even when ranks are COMPLETELY WRONG, they still help:

Tree A (rank=10, actual height=1 after massive compression):
    A
   /|\\...
  (50 nodes all direct children)

Tree B (rank=2, actual height=2):
    B
    |
    C
    |
    D

Union(A,B) → B attaches under A (rank 10 > rank 2)

Why this still helps:
- Tree A has rank 10 because it WAS tall at some point
- It got that rank by absorbing many trees
- Even though it's flat now, it likely has MORE NODES
- Attaching the smaller tree to the larger (by node count) is still good!

The key insight: Ranks accidentally correlate with tree size (number of nodes) even when they're completely wrong about height!

Can this be proven mathematically?

Actually, YES! There's a mathematical relationship between rank and minimum number of nodes:

Theorem: A tree with rank r contains at least 2^r nodes.

Proof by induction:

Base case: rank 0 = single node = 2^0 = 1 node ✓
When we merge two rank-r trees: we get rank r+1 with at least 2^r + 2^r = 2^(r+1) nodes ✓
Path compression NEVER changes ranks or removes nodes
Therefore: rank r → at least 2^r nodes ALWAYS holds

Example:

Tree with rank 10 → has at least 2^10 = 1,024 nodes
Tree with rank 2 → has at least 2^2 = 4 nodes
So attaching rank-2 tree under rank-10 tree makes sense!

Why attaching smaller-rank to larger-rank is optimal:

Consider what happens to the maximum possible find() path length:

Option A: Attach rank-2 tree under rank-10 tree

Nodes in rank-10 tree: their paths unchanged
Nodes in rank-2 tree: their paths increase by 1
Worst affected: at most 2^2 = 4 nodes get longer paths
Total "cost": 4 nodes × 1 extra step = 4 units

Option B: Attach rank-10 tree under rank-2 tree

Nodes in rank-2 tree: their paths unchanged
Nodes in rank-10 tree: their paths increase by 1
Worst affected: at least 2^10 = 1,024 nodes get longer paths
Total "cost": 1,024 nodes × 1 extra step = 1,024 units

Mathematical conclusion:

Option A affects ≤ 4 nodes negatively
Option B affects ≥ 1,024 nodes negatively
Option A is 256× better than Option B!

Most importantly, this prevents the worst-case scenario:

Without union by rank, we could create a chain of n elements with height n:

1 → 2 → 3 → 4 → ... → n (height = n, find() = O(n))

With union by rank, the maximum height is logarithmic:

Maximum height ≤ log₂(n)
Tree with n nodes → rank ≤ log₂(n) → height ≤ log₂(n)

Why this guarantee holds:

To get rank r, you must merge two rank r-1 trees
Each rank increase doubles the minimum nodes
Therefore: n nodes → maximum rank = log₂(n)
Even without path compression, find() is O(log n), not O(n)!

This is why union by rank works: it prevents linear chains AND minimizes total path lengths. Even when ranks are "wrong" about height, they're RIGHT about preventing worst-case degradation to O(n) operations!

Concrete example: Why updating ranks would be expensive and unnecessary

Imagine if we DID update ranks after every path compression:

find(deepElement) {
  // ... do path compression ...

  // Now we'd need to:
  // 1. Traverse the ENTIRE tree to find actual height
  // 2. Update rank for the root
  // 3. This turns O(α(n)) into O(n) - terrible!
}

Updating ranks would require examining the entire tree structure after each compression - this would destroy the near-constant time complexity we worked so hard to achieve!

Why this "broken" system actually works

The brilliant insight is that we don't need accurate heights at all. Here's why:

Ranks become a proxy for tree size - Mathematically proven: rank r guarantees ≥ 2^r nodes
Higher rank = exponentially more nodes - rank 10 has 256x more nodes than rank 2 (minimum)
Attaching smaller to larger (by node count) keeps trees balanced - And ranks achieve this!
Path compression fixes everything - Even if unions aren't perfect, compression flattens all paths

The beautiful paradox

By accepting that ranks become completely meaningless as height measurements:

We avoid expensive height recalculation (keeps O(α(n)) complexity)
Ranks still provide useful heuristic for union decisions (via correlation with size)
Path compression covers for any bad decisions
The two optimizations work together despite ranks being "wrong"

The bottom line: Ranks being completely disconnected from actual height isn't a bug - it's a feature! The algorithm works because it doesn't actually need accurate heights. The ranks serve as a "fossil record" of past merges that happens to correlate with tree size, and that's good enough when combined with path compression.