Union-Find Operations and Data Structure
Disjoint Set Union (DSU) / Union-Find
Welcome to the Disjoint Set Union (DSU) data structure, also known as Union-Find. This specialized data structure is incredibly efficient at managing a collection of disjoint (non-overlapping) sets. It's a key component in many advanced graph algorithms, most notably Kruskal's algorithm for finding Minimum Spanning Trees.
1. The Core Idea: Managing Disjoint Sets
Imagine you have a group of elements, and you want to partition them into different groups. The DSU data structure allows you to do two things very quickly:
- Find: Determine which group (or set) a particular element belongs to.
- Union: Merge two groups into a single group.
Internally, the DSU represents the sets as a forest (a collection of trees), where each tree corresponds to a set. The root of the tree is the representative of that set.
2. Core DSU Operations
2.1. The find Operation
The find(i) operation returns the representative (the root of the tree) for the set containing element i. To do this, it starts at the node for i and follows the parent pointers up the tree until it reaches the root (a node whose parent is itself).
2.2. The union Operation
The union(i, j) operation merges the two sets containing elements i and j. It first finds the representatives of both sets. If they are already the same, it means the elements are already in the same set, and nothing needs to be done. Otherwise, it makes the root of one tree a child of the root of the other tree.
3. Optimizations: The Key to Efficiency
A naive implementation of DSU can be inefficient. Two key optimizations, Path Compression and Union by Size/Rank, make its operations almost constant time on average.
-
Path Compression: During a
find(i)operation, after finding the root, we can make all nodes along the path fromito the root point directly to the root. This dramatically flattens the tree, speeding up futurefindoperations for those elements. -
Union by Size/Rank: When performing a
unionoperation, instead of arbitrarily merging the trees, we can be smarter. We can always attach the smaller tree to the root of the larger tree. This helps to keep the trees from becoming too tall. We can track either the "size" (number of nodes) or the "rank" (height) of the trees.
4. Implementation
Here is a modern JavaScript implementation that uses both Path Compression and Union by Size.
class DSU {
constructor(n) {
// parent[i] stores the parent of element i
this.parent = Array.from({ length: n }, (_, i) => i);
// size[i] stores the size of the set with representative i
this.size = Array(n).fill(1);
}
// Find the representative of the set containing i, with path compression
find(i) {
if (this.parent[i] === i) {
return i;
}
// Path compression: set the parent of i to the root
return this.parent[i] = this.find(this.parent[i]);
}
// Merge the sets containing i and j, with union by size
union(i, j) {
const rootI = this.find(i);
const rootJ = this.find(j);
if (rootI !== rootJ) {
// Union by size: attach the smaller tree to the root of the larger tree
if (this.size[rootI] < this.size[rootJ]) {
this.parent[rootI] = rootJ;
this.size[rootJ] += this.size[rootI];
} else {
this.parent[rootJ] = rootI;
this.size[rootI] += this.size[rootJ];
}
return true; // The sets were merged
}
return false; // The elements were already in the same set
}
}
5. Complexity Analysis
With both Path Compression and Union by Size/Rank, the amortized time complexity of the find and union operations is nearly constant, specifically O(α(N)), where α(N) is the inverse Ackermann function. This function grows so slowly that for all practical purposes, it can be considered a constant (less than 5).
Key Takeaway: The Disjoint Set Union is a highly optimized data structure for managing partitions of a set. Its near-constant time operations make it an essential tool for solving problems related to connectivity, such as in Kruskal's algorithm or for detecting cycles in an undirected graph.

