PHP Topological Sort Function

While profiling some code to improve its performance, I discovered that a topological sort function was to blame. Failing to find a suitable PHP topological sort function on the web, I rewrote it myself to be more general and more efficient.

Why Topological Sorting?

A traditional sort algorithm, such as quicksort or bubble sort, takes a set of items and returns them in a new order based on a certain criteria (for example they may be returned in alphabetical order). For this to be possible, the algorithm needs to be able to pick out any two items from the list and be able to figure out which comes first. Sometimes, however, we have a set of items for which this doesn’t hold true. There may be some pairs of items for which we can’t work out the correct order just by looking at those two items, and some we can. Such a set is know as a partial order. While it may seem that elements where the order is unknown could just be put in any random order, this may not always be the case. While we can’t tell by looking at them that A comes before C, we may be able to look and see that A comes before B and B comes before C, implying that A does come before C. But that requires a more global view than taken by traditional sort function.

What is Topological Sorting?

This is where topological sort algorithms come in. They provide a means of sorting a partial order, such that the order constraints are all honoured. While they are a range of ways of performing a topological sort, the simplest is to iteratively remove nodes with no inbound edges from the partial order (viewed as a directed acyclic graph with the set elements as nodes and the order constraints as edges). The nodes placed in order of their removal then form the topologically sorted list of set elements.

Topological Sorting in PHP

So after that, here is my topological sorting function. It’s reasonably efficient, and has a smaller code-footprint than any alternatives I could find on the web.

// PHP topological sort function
// Author: Dan (http://www.calcatraz.com)
// Licensing: None - use it as you see fit
// Updates: http://www.calcatraz.com/blog/php-topological-sort-function-384
// 
// Args: 
//		$nodeids - an array of node ids, 
//					e.g. array('paris', 'milan', 'vienna', ...);
// 		$edges - an array of directed edges, 
//					e.g. array(array('paris','milan'), 
//							   array('milan', 'vienna'), 
//							   ...)
// Returns: 
// 		topologically sorted array of node ids, or NULL if graph is 
//		unsortable (i.e. contains cycles)

function topological_sort($nodeids, $edges) {

	// initialize variables
	$L = $S = $nodes = array(); 

	// remove duplicate nodes
	$nodeids = array_unique($nodeids); 	

	// remove duplicate edges
	$hashes = array();
	foreach($edges as $k=>$e) {
		$hash = md5(serialize($e));
		if (in_array($hash, $hashes)) { unset($edges[$k]); }
		else { $hashes[] = $hash; }; 
	}

	// Build a lookup table of each node's edges
	foreach($nodeids as $id) {
		$nodes[$id] = array('in'=>array(), 'out'=>array());
		foreach($edges as $e) {
			if ($id==$e[0]) { $nodes[$id]['out'][]=$e[1]; }
			if ($id==$e[1]) { $nodes[$id]['in'][]=$e[0]; }
		}
	}

	// While we have nodes left, we pick a node with no inbound edges, 
	// remove it and its edges from the graph, and add it to the end 
	// of the sorted list.
	foreach ($nodes as $id=>$n) { if (empty($n['in'])) $S[]=$id; }
	while (!empty($S)) {
		$L[] = $id = array_shift($S);
		foreach($nodes[$id]['out'] as $m) {
			$nodes[$m]['in'] = array_diff($nodes[$m]['in'], array($id));
			if (empty($nodes[$m]['in'])) { $S[] = $m; }
		}
		$nodes[$id]['out'] = array();
	}

	// Check if we have any edges left unprocessed
	foreach($nodes as $n) {
		if (!empty($n['in']) or !empty($n['out'])) {
			return null; // not sortable as graph is cyclic
		}
	}
	return $L;
}

PHP Topological Sorting Example

To use the function just call it with an array of node ids and an array of edges (where each edge is an array of two elements, the start node followed by the end node). For example:

$nodes = array('1','2','3','4','5');
$edges = array(array('1','2'),
               array('3','1'),
               array('3','4'));
print_r(topological_sort($nodes, $edges));

This outputs the sorted partial order, like so:

Array
(
    [0] => 3
    [1] => 5
    [2] => 1
    [3] => 4
    [4] => 2
)

Comments

  1. Alex says:

    I am getting duplicated entries in the result.

    Is this normal or an issue?

    print_r($nodes);
    print_r($edges);
    print_r(topological_sort($nodes, $edges));

    leads to

    Array
    (
    [0] => nominal
    [1] => subtotal
    [2] => shipping
    [3] => grand_total
    [4] => msrp
    [5] => freeshipping
    [6] => discount
    [7] => tax_subtotal
    [8] => tax_shipping
    [9] => tax
    [10] => weee
    [11] => customerbalance
    [12] => giftcardaccount
    [13] => giftwrapping
    [14] => tax_giftwrapping
    [15] => bonuspoints
    )
    Array
    (
    [0] => Array
    (
    [0] => nominal
    [1] => subtotal
    )

    [1] => Array
    (
    [0] => subtotal
    [1] => grand_total
    )

    [2] => Array
    (
    [0] => nominal
    [1] => subtotal
    )

    [3] => Array
    (
    [0] => shipping
    [1] => grand_total
    )

    [4] => Array
    (
    [0] => subtotal
    [1] => shipping
    )

    [5] => Array
    (
    [0] => freeshipping
    [1] => shipping
    )

    [6] => Array
    (
    [0] => tax_subtotal
    [1] => shipping
    )

    [7] => Array
    (
    [0] => subtotal
    [1] => grand_total
    )

    [8] => Array
    (
    [0] => freeshipping
    [1] => tax_subtotal
    )

    [9] => Array
    (
    [0] => freeshipping
    [1] => shipping
    )

    [10] => Array
    (
    [0] => subtotal
    [1] => freeshipping
    )

    [11] => Array
    (
    [0] => discount
    [1] => grand_total
    )

    [12] => Array
    (
    [0] => subtotal
    [1] => discount
    )

    [13] => Array
    (
    [0] => shipping
    [1] => discount
    )

    [14] => Array
    (
    [0] => tax_subtotal
    [1] => tax
    )

    [15] => Array
    (
    [0] => tax_subtotal
    [1] => discount
    )

    [16] => Array
    (
    [0] => freeshipping
    [1] => tax_subtotal
    )

    [17] => Array
    (
    [0] => tax_shipping
    [1] => tax
    )

    [18] => Array
    (
    [0] => tax_shipping
    [1] => discount
    )

    [19] => Array
    (
    [0] => shipping
    [1] => tax_shipping
    )

    [20] => Array
    (
    [0] => tax
    [1] => grand_total
    )

    [21] => Array
    (
    [0] => subtotal
    [1] => tax
    )

    [22] => Array
    (
    [0] => shipping
    [1] => tax
    )

    [23] => Array
    (
    [0] => discount
    [1] => tax
    )

    [24] => Array
    (
    [0] => weee
    [1] => tax
    )

    [25] => Array
    (
    [0] => weee
    [1] => discount
    )

    [26] => Array
    (
    [0] => subtotal
    [1] => weee
    )

    [27] => Array
    (
    [0] => tax_subtotal
    [1] => weee
    )

    [28] => Array
    (
    [0] => weee
    [1] => customerbalance
    )

    [29] => Array
    (
    [0] => discount
    [1] => customerbalance
    )

    [30] => Array
    (
    [0] => tax
    [1] => customerbalance
    )

    [31] => Array
    (
    [0] => tax_subtotal
    [1] => customerbalance
    )

    [32] => Array
    (
    [0] => grand_total
    [1] => customerbalance
    )

    [33] => Array
    (
    [0] => giftcardaccount
    [1] => customerbalance
    )

    [34] => Array
    (
    [0] => giftcardaccount
    [1] => customerbalance
    )

    [35] => Array
    (
    [0] => weee
    [1] => giftcardaccount
    )

    [36] => Array
    (
    [0] => discount
    [1] => giftcardaccount
    )

    [37] => Array
    (
    [0] => tax
    [1] => giftcardaccount
    )

    [38] => Array
    (
    [0] => tax_subtotal
    [1] => giftcardaccount
    )

    [39] => Array
    (
    [0] => grand_total
    [1] => giftcardaccount
    )

    [40] => Array
    (
    [0] => subtotal
    [1] => giftwrapping
    )

    [41] => Array
    (
    [0] => tax_giftwrapping
    [1] => grand_total
    )

    [42] => Array
    (
    [0] => tax
    [1] => tax_giftwrapping
    )

    [43] => Array
    (
    [0] => bonuspoints
    [1] => tax
    )

    [44] => Array
    (
    [0] => subtotal
    [1] => bonuspoints
    )

    )
    Array
    (
    [0] => nominal
    [1] => msrp
    [2] => subtotal
    [3] => subtotal
    [4] => freeshipping
    [5] => giftwrapping
    [6] => bonuspoints
    [7] => tax_subtotal
    [8] => tax_subtotal
    [9] => shipping
    [10] => weee
    [11] => tax_shipping
    [12] => discount
    [13] => tax
    [14] => tax_giftwrapping
    [15] => grand_total
    [16] => giftcardaccount
    [17] => customerbalance
    [18] => customerbalance
    )

  2. Alex says:

    Okay it is because there are duplicated edges…

    What is the license of your code?

    Can you release it as public domain or so ?

    Alexander

  3. Alex says:

    Okay i just saw the license info “// Licensing: None – use it as you see fit”

    – thanks!

  4. admin says:

    Hey Alexander,

    Yeah, I managed to update the post earlier but then ran out of time to reply to your comments directly.

    As you saw, I’m more than happy for the function to be used elsewhere. Thanks for taking the time to check.

    All the best,
    Dan

  5. admin says:

    Alexander,

    Yup, the duplicated edges were resulting in node duplication in the output. I’ve updated the function to remove duplicate edges and nodes from the input arrays, which stops duplicates from appearing in the output. I’ve also added some much needed comments.

    Cheers,
    Dan

  6. Dan says:

    Here’s a link to the Stack Overflow discussion of this post:

    http://stackoverflow.com/questions/11953021/topological-sorting-in-php

Share your thoughts