recursion has a bad reputation amongst programmers; it’s convoluted and complicated and difficult to debug, a real footgun. it’s something you do at school (if you went to school for that sort of thing) and then never touch again if you can avoid it. which is a drag, because there’s a lot of use cases for recursion. data structures of arbitrary depth are everywhere: file systems, dom trees, that 32kb json packet your integration partner just shovelled into your api.
in this post we’re going to over two features of php that help make recursion easier: the RecursiveIterator
interface, which provides us with methods and features that make writing recursive functions easier, and the dreadfully-named RecursiveIteratorIterator
class which we can use to flatten down arbitrarily-deep data structures.
we’ll be building a recursive function using RecursiveArrayIterator
, starting with a simple loop and working up to the full function. then we’ll look at how leverage RecursiveIteratorIterator
to smash that nested array into a single level so we can extract data, either with a simple loop or a more-complex-but-powerful call to iterator_apply
.
our sample array
all of the examples here are going to use the following sample array that represents an underwhelming collection of record albums:
$albums = [ 'first_name' => 'grant', 'last_name' => 'horwood', 'albums' => [ [ 'title' => 'pottymouth', 'artist' => 'bratmobile', 'year' => 1993, ], [ 'title' => 'monks music', 'artist' => 'monk, thelonious', 'year' => 1957, ], ], ];
this array contains three top-level elements, one of which (albums
) is an array of arrays. this is good stuff for recursion.
the RecursiveIterator
interface
here are the two main things we need to know about the RecursiveIterator
interface:
- it’s an interface:
RecursiveIterator
doesn’t actually do anything itself, it’s just an interface that defines behaviour. if we want something usable, we have to look at classes that implementRecursiveIterator
; classes likeRecursiveArrayIterator
(for handling arrays) andRecursiveDirectoryIterator
(for handling filesystems). - it doesn’t actually do any recursion: classes that implement
RecursiveIterator
don’t do recursion for us. what they do is provide tools to help us build our own recursive functions more easily.
that may seem disappointing, but it shouldn’t be: those tools that RecursiveIterator
provide are useful. let’s look at the three of them.
hasChildren()
: a data structure of arbitrary depth, like an array, can have values that are, themselves, arrays. thehasChildren
method tests if the value we’re looking is an array itself. we use this to test when we need our recursive function to actually recurse. it returns a boolean.getChildren()
: ifhasChildren
returnstrue
then we’re going to need to need to get an iterator of current value so we have something to recurse with. that’s whatgetChildren
does.- a cursor: as we loop over our
RecursiveIterator
, there’s an internal cursor that keeps track of where we are in the array. this allows us to call methods likehasChildren
for the current element. this is very handy.
if that didn’t make immediate sense, don’t worry; we’re going to go over building a recursive function with RecursiveIterator
step by step, starting with a not-very-useful loop.
a simple (and not-very-useful) loop with RecursiveArrayIterator
because RecursiveArrayIterator
is an iterator, we can use it in a foreach
loop. let’s do that.
$rai = new RecursiveArrayIterator($albums); foreach($rai as $k => $v) { @print "$k => $v".PHP_EOL; }
this is pretty straighforward: we create a new instance of RecursiveArrayIterator
with our array and then do a foreach
, printing out keys and values. the results are underwhelming.
first_name => grant last_name => horwood albums => Array
this is straight iterator behaviour; our loop went over the top level of the array and that’s it. to start building a recursive function that does what we want, we’re going to need to use RecursiveIterator
‘s hasChildren
and getChildren
methods.
using hasChildren
as the name implies, the purpose of hasChildren
is to determine if the current value is an array that can build an iterator from. let’s add hasChildren
to our disappointing loop.
$rai = new RecursiveArrayIterator($albums); foreach($rai as $k => $v) { print $k; if($rai->hasChildren()) { print " has children"; } print PHP_EOL; }
here, we test if the value of the current element is an array, ie. it has a number of ‘child’ elements, and print out ‘has children’ if it is. the output is:
first_name last_name albums has children
ℹ️ important note about cursors: we see here that we call the method hasChildren
on $rai
, the RecursiveArrayIterator
object itself. we can do this because RecursiveArrayIterator
has an internal cursor pointing to the current element we’re looping over. as we loop, this cursor advances to the next element so that when we call methods on the iterator object it’s always for the element we’re working on.
once we have children, we can getChildren
now that we know albums
has children, we can get a new RecursiveArrayIterator
object for that array with getChildren
and iterate over that. for instance:
$rai = new RecursiveArrayIterator($albums); foreach($rai as $k => $v) { print $k." => "; if($rai->hasChildren()) { $rai2 = $rai->getChildren(); foreach($rai2 as $k2 => $v2) { @print PHP_EOL." $k2 => $v2"; } } else { print $v; } print PHP_EOL; }
the biggest change we have made here is that, when hasChildren
returns true
, we get a new RecursiveArrayIterator
object for those children with getChildren
and then loop over that. the output is:
first_name => grant last_name => horwood albums => 0 => Array 1 => Array
now we have all the tools to build a recursive function that will handle our entire array.
at last: building a recursive function that will handle our entire array
upgrading our loop to a recursive function now only requires us to do two things:
- put our loop in a function
- have our function call itself when
hasChildren
returntrue
that’s literally it. let’s look:
$rai = new RecursiveArrayIterator($albums); function recurse(RecursiveArrayIterator $rai) { foreach($rai as $k => $v) { if($rai->hasChildren()) { recurse($rai->getChildren()); } else { print "$k => $v".PHP_EOL; } } } // call our recursive function recurse($rai);
our function accepts one argument, a RecursiveArrayIterator
object. when we call that function, it loops over the top level of that array, printing out key/value pairs. when it encounters a value that hasChildren
, it creates a new iterator with getChildren
and then calls itself with that new iterator as an argument, looping over the next level down of the array. this keeps going until the first loop runs out of elements.
the output looks like this
first_name => grant last_name => horwood title => pottymouth artist => bratmobile year => 1993 title => monks music artist => monk, thelonious year => 1957
this is great stuff. except that we’ve lost the structure of our array. it’s been ‘flattened’ down to one level.
of course, this can still be useful. lots of languages have built-in ‘flattening’ commands to do precisely this (i write a lot of scala and use the flatten
method all the time).
but maybe we want to keep that structure. for that we need to add an accumulator.
adding an accumulator to our function
as the name implies, an accumulator is a variable where we accumulate the results of our recursive function while it executes. when our recursion is done, we return it. let’s add one to our existing function:
$rai = new RecursiveArrayIterator($albums); function recurse(RecursiveArrayIterator $rai, Array $accumlator) { foreach($rai as $k => $v) { if($rai->hasChildren()) { // add the return of our next call to recurse to our accumulator $accumlator[$k] = recurse($rai->getChildren(), []); } else { // add the key/value pair to the accumulator $accumlator[$k] = $v; } } return $accumlator; } // call our recursive function with an empty accumulator $accumlator = recurse($rai, []); // see the results print_r($accumlator);
here, our accumulator is an array. we pass in an empty one when we call recurse
, and in our function body instead of printing out our key/value pairs, we add them to the accumulator. when our function terminates, the accumulator is returned.
looking at the return value, we notice that the structure has been maintained. in fact, it is the exact same array as the one we passed to our function. success! kind of.
building a function that returns the same array it was passed is not exactly groundbreaking functionality. what we probably want to do with our recursive function is either transform or filter out some of those values.
we can do that, of course, by adding some code to the line where we assign a value to the accumulator. for instance, if we want to uppercase all of our values, we could add strtoupper
like so:
$accumlator[$k] = strtoupper($v);
or, if we wanted remove the year
element from every album, we could filter it.
if($k != 'year') { $accumlator[$k] = $v; }
now we’re doing something useful!
building a more generic solution
so, now we can traverse our entire array and apply the custom filters or transformations of our choosing. this is good, but it would be much better if we could write up a generic recursion function that took filters and transformations as arguments. let’s do that.
$rai = new RecursiveArrayIterator($albums); /** * Traverse the array applying $filter to each key and $transformer to each value * * @param RecursiveArrayIterator $rai * @param Array $accumlator * @param callable $filter * @param callable $transformer * @return array */ function recurse(RecursiveArrayIterator $rai, Array $accumlator, callable $filter = null, callable $transformer = null): Array { // a default filter that removes nothing $filter = $filter ?? fn($n) => true; // a default transformer that changes nothing $transformer = $transformer ?? fn($n) => $n; foreach($rai as $k => $v) { if($rai->hasChildren()) { $accumlator[$k] = recurse($rai->getChildren(), [], $filter, $transformer); } else { // test the filter and apply the transformer if($filter($k)) { $accumlator[$k] = $transformer($v); } } } return $accumlator; } // filter to apply to each key. remove elements with key 'year' $filter = fn($n) => $n != 'year'; // transformer to apply to each value. apply uppercase to all values $transformer = fn($n) => strtoupper($n); // call our recursive function with an empty accumulator $accumlator = recurse($rai, [], $filter, $transformer); // see the results print_r($accumlator);
we’ve added two new arguments to our function here:
$filter
: this is a callable that accepts the key of our array element as an argument. it performs a test on the key and returns a boolean. in the above example, we test if the key isyear
and returnfalse
if it is. this callable is applied to each array element and only the elements that returntrue
are added to our accumulator.$transformer
: a callable that accepts the value of the element as an argument and applies some functionality to that value. in the example, we uppercase the value withstrtoupper
. the transformed value will be added to the accumulator
when we run this example code, we get this:
Array ( [first_name] => GRANT [last_name] => HORWOOD [albums] => Array ( [0] => Array ( [title] => POTTYMOUTH [artist] => BRATMOBILE ) [1] => Array ( [title] => MONKS MUSIC [artist] => MONK, THELONIOUS ) ) )
all the elements keyed with year
have been removed and strtoupper
has been applied to all the values. exactly what we asked for.
if we pass null
s for either $filter
or $transformer
, the default callables are used: no filtering, no transformation.
this technique has real-world uses. for the day job a long time ago, i wrote a package to redact passwords and such from arbitrary json so we wouldn’t be storing those hacker goodies in our log files; recursion with filters and transformers was the solution.
using RecursiveIteratorIterator
to flatten arrays
remember back when we did the first recursion and our multi-dimensional array came back flattened to one level?
we can achieve that functionality in a far-less cumbersome way by using the RecursiveIteratorIterator
class. for example:
$rai = new RecursiveArrayIterator($albums); // create a RecursiveIteratorIterator object $rii = new RecursiveIteratorIterator($rai); // loop over the RecursiveIteratorIterator foreach($rii as $k => $v) { print "$k => $v".PHP_EOL; }
this loop replaces our entire recursive function and spits out the same results, a flattened array:
first_name => grant last_name => horwood title => pottymouth artist => bratmobile year => 1993 title => monks music artist => monk, thelonious year => 1957
essentially, the RecursiveIteratorIterator
is just an iterator that loops over our RecursiveArrayIterator
and all the RecursiveArrayIterator
s it contains. it’s an iterator that iterates over an iterator and all the iterators that iterator contains, recursively. that sentence would have sounded ridiculous at the beginning of this post, but now it makes perfect sense.
using iterator_apply
on all this stuff
flattening an array can have a lot of benefits. for instance, if we wanted to just get all of the artists in our album collection, we could write a recursive function with RecursiveArrayIterator
and extract that information, or we could just flatten the whole thing with RecursiveIteratorIterator
and grab every value keyed artist
. the flattening approach is way faster and easier.
there are a couple of ways to extract data from a flattened array, but the first one i’ll mention is way that will not work: array_filter
.
at first glance, array_filter
seems perfect for the job. we want to filter our array to get only artist
, after all. however, array_filter
preserves keys. this is a problem since, if we look at our output above, we have two elements with the same key: artist
. with array_filter
, those keys would all get clobbered and we would wind up with only one value: the last one. not good.
the most direct approach is to simply filter in our foreach
loop.
$rai = new RecursiveArrayIterator($albums); $rii = new RecursiveIteratorIterator($rai); $artists = []; foreach($rii as $k => $v) { if($k == "artist") { $artists[] = $v; } } print_r($artists);
here, we loop over our RecursiveIteratorIterator
and if the key is artist
we add it to our $artists
array. the result is an array of artist names. perfect.
or we could use iterator_apply
.
iterator_apply
applies a function to each element in an iterator. in this sense is kind of like array_map
. let’s look at how we would get our artists using it:
$rai = new RecursiveArrayIterator($albums); $rii = new RecursiveIteratorIterator($rai); // our filtering function to extract only 'artists' $filter = fn($n) => $n == 'artist'; $artists = []; iterator_apply($rii, function($n, $filter, &$artists) { // test the key with the filter if($filter($n->key())) { $artists[] = $n->current(); } return true; }, [$rii, $filter, &$artists]); print_r($artists);
looking at iterator_apply
, we see it takes three arguments. the first argument is our iterator; very straightforward.
the second a third arguments define the function we want to apply to each iterator element. argument two is the function itself, a callable. the third argument is an array representing all the arguments that we are passing to that function. if the function we defined in argument two takes five arguments, the array here needs to have five elements. the first argument of our function and, correspondingly, the first element of our array, should always be the iterator itself.
in the example above, the function we’re passing to iterator_apply takes three arguments: the iterator, a callable we’re using to filter the elements in the iterator, and an accumulator called $artists
. we’re passing $artists
by reference so we can modify it in the function body and have the results available to us after execution is done.
in the body of the function, we get the key of the current element of the iterator by calling the key()
method and pass it to our $filter
, our callable that determines if that key is artists
. if $filter
returns true, we get the value of the current element of the iterator by calling the current()
and add it to our accumulator $artists
.
in all honesty, this approach is neither easy nor obvious and, as a result, most people just use foreach
loops. but iterator_apply
is quite powerful, especially if we have a complex or pre-existing function that we would like to apply. it’s worthwhile knowing it exists, even if we have to re-read the manual every time we use it.