Photo by Clay Banks on Unsplash

Coding Pattern : Union-Find Algorithm

Nishant Tanwar 🕉️
9 min readFeb 2, 2024

--

In this article we will be deep diving into one of the most important graph algorithms called Union-Find.

Union-Find Algorithm is composed of two operations, Union and Find.

But what do they actually mean and why do we use them?

These are the bare bones definitions of the two terms. Now we are going to find out what do they signify in terms of the data and the usage of that data.

So from the above example it should have been clear now that union find algorithm is all about connected components. It gives us a way to first merge(Union) disjoint sets (non-overlapping sets), and also find out if an element belongs to particular Union using the operation Find.

But How do we achieve this programmatically?

We basically have two arrays:-

  1. Parents : To store the parent pointer of the sets themselves.
  2. Weights : To store the total weight of the set.

Each index of the parents array represents the disjoint set value and the contents of the array at that index represents its parent. In the starting when all the disjoint sets are independent, we initialize the parents array with their respective index itself.

Whenever we merge two sets, we manipulate the values of these two arrays. While merging lets say 0 and 1, we set the parents index value of one to another like below.

Now the parent pointer of ‘0’ set points to ‘1’ as its parent. This array now represents the below connected components.

Now again, for some reason if we have to associate set with index ‘2’ with the set ‘3’ we do so by modifying the array like below.

This was about Union operation.

Weights array is used to decide which of the two sets (the sets which we are making a union of) should become the parent. The set whose root node has the maximum weight out of the two becomes the parent (we will see this in the upcoming simulation example and code). Also the weight of all the children of the root node gets added to the weight of the root node(this will also be demonstrated below).

Coming to the Find operation, lets say now we need to connect the ‘3’ with the ‘4’ index set. But since set ‘3’ is part of set ‘2’, do we connect ‘4’ to ‘2’ or ‘3’ to ‘4’. So instead of connecting ‘3’ with ‘4’ we should connect ‘2’ to ‘4’ since ‘2’ is the root now. But how do we identify if ‘3’ is already part of some set, we do so using Find operation.

We execute the Find operation using Path compression algorithm. As mentioned above we basically recursively look for the parent of the current set until we encounter a set whose parent is same as the set itself. Only the root of the connected component will have itself as the parent. Hence we break our recursion at that point. In the above example we stop our recursion at set ‘2’ since it is the set who points to itself i.e. it has the value ‘2’ itself in its index position.

The code for Disjoint-Set class in C# :

public class DisjointSet
{
public int[] parents;
public int[] weights;

public DisjointSet(int n)
{
parents = new int[n];
weights = new int[n];

for(int i = 0 ; i < n ; i++)
{
parents[i] = i;
weights[i] = i;
}
}
public int Find(int a)
{
/*Path compression, recursively move towards the parent
pointer*/
while(a != parents[a])
{
a = parents[parents[a]];
}
return a;
}
public void Union(int a,int b)
{
/*Finding the roots of the two nodes
using the Find operation because we
always perform merge operation at the root*/
int rootA = Find(a);
int rootB = Find(b);

/*Which amoung the two root nodes needs to become
the parent is decided based on the weight. The
root which has a lesser weight becomes the child
of the other root. Also the weight of the lesser weight
root gets added to the heavier one, to reflect the change
in the structure*/
if(weights[rootA] > weights[rootB])
{
weights[rootA] += weights[rootB];
parents[rootB] = rootA;
}
else
{
weights[rootB] += weights[rootA];
parents[rootA] = rootB;
}
}
}

Simulation of the Union and Find operation :-

Sample Problems:

1. Number of Provinces

Below is the link for the “Number of Provinces” LeetCode Problem.

In this problem we need to find the number of provinces. Provinces are a group of directly or indirectly connected cities. Basically provinces are the connected components in graph terminology.

In the above diagram, “1”, “2” , “3” ,”4" are all cities. But since “1” , “2” and “3” are connected they form a province(i.e. a conncted component) and “4” forms a separate province. Hence in the above diagram there are in total 2 provinces. Hope this is clear.

In the first sample test case of the problem we are given a structure like below, where there are three cities represented by index “0”, “1” and “2”.Note that input array is 0 index based and our parents array is also 0 index based so this matches perfectly. Now the “isConnected” array represents the relationship as below. “0” is connected to itself (hence 1 in index 0,0), “1” is connected to itself(hence 1 in index 1,1) and “2” is connected to itself. Apart from itself, “0” is also connected to “1” (hence this relationship is also represented by 1 at index (0,1) and (1,0)).

To solve this questions we first iterate over the “isConnected” array and we call the Union function for the cities we detect a relationship for.

Once we have the properly populated and connected Disjoint Set data structure we once again iterate over all the cities and try to find out the roots for them. We store the roots in a HashSet data structure so that we only store the unique ones. At the end of this loop we are left with a unique collection of root nodes which are nothing but our provinces.

public class DisjointSet
{
public int[] parents;
public int[] weights;

public DisjointSet(int n)
{
parents = new int[n];
weights = new int[n];

for(int i = 0 ; i < n ; i++)
{
parents[i] = i;
weights[i] = i;
}
}
public int Find(int a)
{
/*Path compression, recursively move towards the parent
pointer*/
while(a != parents[a])
{
a = parents[parents[a]];
}
return a;
}
public void Union(int a,int b)
{
/*Finding the roots of the two nodes
using the Find operation because we
always perform merge operation at the root*/
int rootA = Find(a);
int rootB = Find(b);

if(weights[rootA] > weights[rootB])
{
weights[rootA] += weights[rootB];
parents[rootB] = rootA;
}
else
{
weights[rootB] += weights[rootA];
parents[rootA] = rootB;
}
}
}
public class Solution {
public int FindCircleNum(int[][] isConnected)
{
int n = isConnected.Length;

DisjointSet set = new(n);

int numberOfProvinces = 0;

/*Create the union for the connected cities*/
for(int i = 0 ; i < n; i++)
{
for(int j = 0; j < n; j++)
{
if(isConnected[i][j] == 1)
{
set.Union(i,j);
}
}
}

HashSet<int> provinces = new();
for(int i = 0 ; i < n; i++)
{
/*Every unique root is a province for us since
the connected cities will all share a common root city*/
int root = set.Find(i);
if(!provinces.Contains(root))
{
provinces.Add(root);
}
}

return provinces.Count;

}
}

2. Number of Unique Categories

Below is the link for “Number of Unique Categories” Leetcode problem.

Just like in the previous problem, here too we are to find the connected components of sorts. Here those connected components are called “Unique Categories”.

We have n elements here numbered from 0 to n-1. We are to find unique categories out of them. Just like connected cities here a handler function “haveSameCategory” gives us the information about the connection between two elements.

/**
* Definition for a category handler.
* class CategoryHandler {
* public CategoryHandler(int[] categories);
* public bool HaveSameCategory(int a, int b);
* }
*/
public class DisjointSet
{
public int[] parents;
public int[] weights;

public DisjointSet(int n)
{
parents = new int[n];
weights = new int[n];

for(int i = 0 ; i < n ; i++)
{
parents[i] = i;
weights[i] = i;
}
}
public int Find(int a)
{
while(a != parents[a])
{
a = parents[parents[a]];
}
return a;
}
public void Union(int a,int b)
{
int rootA = Find(a);
int rootB = Find(b);

if(weights[rootA] > weights[rootB])
{
weights[rootA] += weights[rootB];
parents[rootB] = rootA;
}
else
{
weights[rootB] += weights[rootA];
parents[rootA] = rootB;
}
}
}

public class Solution {
public int NumberOfCategories(int n, CategoryHandler categoryHandler)
{
DisjointSet set = new DisjointSet(n);

int groups = 0;
bool belongsToAnExistingGrp = false;

for(int i = 0 ; i < n ; i++)
{
for(int j = 0 ; j < i ; j++)
{
if(categoryHandler.HaveSameCategory(i,j))
{
set.Union(i,j);
belongsToAnExistingGrp = true;
break;
}
}
if(!belongsToAnExistingGrp)
{
groups++;
}
belongsToAnExistingGrp = false;
}

return groups;
}
}

In the above code, unlike the previous example where we wrote the code in 2 passes, we are doing both the union and group calculation in a single pass. We are doing this by keeping track of a unique category, by setting the “belongToAnExistingGrp” flag as true whenever we detect a non unique category. So for the cases where we have this flag as false we have encountered a unique category and we can increment our category count.

3. The Earliest Moment when everyone became friends

I had already published an article before for this problem, please click below to navigate to that.

--

--