php: doing recursion with recursive iterator(iterator)s

recursion has a bad reputation amongst programmers; it’s convoluted and complicated and difficult to debug, a real footgun. it’s something you do at school (if you went to school for that sort of thing) and then never touch again if you can avoid it. which is a drag, because there’s a lot of use cases for recursion. data structures of arbitrary depth are everywhere: file systems, dom trees, that 32kb json packet your integration partner just shovelled into your api.

in this post we’re going to over two features of php that help make recursion easier: the RecursiveIterator interface, which provides us with methods and features that make writing recursive functions easier, and the dreadfully-named RecursiveIteratorIterator class which we can use to flatten down arbitrarily-deep data structures.

php to developers: “say ‘iterator’ five times fast”

we’ll be building a recursive function using RecursiveArrayIterator, starting with a simple loop and working up to the full function. then we’ll look at how leverage RecursiveIteratorIterator to smash that nested array into a single level so we can extract data, either with a simple loop or a more-complex-but-powerful call to iterator_apply.

our sample array

all of the examples here are going to use the following sample array that represents an underwhelming collection of record albums:

$albums = [ 
    'first_name' => 'grant',
    'last_name' => 'horwood',
    'albums' => [
        [
            'title' => 'pottymouth',
            'artist' => 'bratmobile',
            'year' => 1993,
        ],  
        [   
            'title' => 'monks music',
            'artist' => 'monk, thelonious',
            'year' => 1957,
        ],  
    ],  
];

this array contains three top-level elements, one of which (albums) is an array of arrays. this is good stuff for recursion.

the RecursiveIterator interface

here are the two main things we need to know about the RecursiveIterator interface:

  1. it’s an interface: RecursiveIterator doesn’t actually do anything itself, it’s just an interface that defines behaviour. if we want something usable, we have to look at classes that implement RecursiveIterator; classes like RecursiveArrayIterator (for handling arrays) and RecursiveDirectoryIterator (for handling filesystems).
  2. it doesn’t actually do any recursion: classes that implement RecursiveIterator don’t do recursion for us. what they do is provide tools to help us build our own recursive functions more easily.

that may seem disappointing, but it shouldn’t be: those tools that RecursiveIterator provide are useful. let’s look at the three of them.

  • hasChildren(): a data structure of arbitrary depth, like an array, can have values that are, themselves, arrays. the hasChildren method tests if the value we’re looking is an array itself. we use this to test when we need our recursive function to actually recurse. it returns a boolean.
  • getChildren(): if hasChildren returns true then we’re going to need to need to get an iterator of current value so we have something to recurse with. that’s what getChildren does.
  • a cursor: as we loop over our RecursiveIterator, there’s an internal cursor that keeps track of where we are in the array. this allows us to call methods like hasChildren for the current element. this is very handy.

if that didn’t make immediate sense, don’t worry; we’re going to go over building a recursive function with RecursiveIterator step by step, starting with a not-very-useful loop.

a simple (and not-very-useful) loop with RecursiveArrayIterator

because RecursiveArrayIterator is an iterator, we can use it in a foreach loop. let’s do that.

$rai = new RecursiveArrayIterator($albums);

foreach($rai as $k => $v) {
    @print "$k => $v".PHP_EOL;
}

this is pretty straighforward: we create a new instance of RecursiveArrayIterator with our array and then do a foreach, printing out keys and values. the results are underwhelming.

first_name => grant
last_name => horwood
albums => Array

this is straight iterator behaviour; our loop went over the top level of the array and that’s it. to start building a recursive function that does what we want, we’re going to need to use RecursiveIterator‘s hasChildren and getChildren methods.

using hasChildren

as the name implies, the purpose of hasChildren is to determine if the current value is an array that can build an iterator from. let’s add hasChildren to our disappointing loop.

$rai = new RecursiveArrayIterator($albums);

foreach($rai as $k => $v) {
    print $k;
    if($rai->hasChildren()) {
        print " has children";
    }
    print PHP_EOL;
}

here, we test if the value of the current element is an array, ie. it has a number of ‘child’ elements, and print out ‘has children’ if it is. the output is:

first_name
last_name
albums has children

ℹ️ important note about cursors: we see here that we call the method hasChildren on $rai, the RecursiveArrayIterator object itself. we can do this because RecursiveArrayIterator has an internal cursor pointing to the current element we’re looping over. as we loop, this cursor advances to the next element so that when we call methods on the iterator object it’s always for the element we’re working on.

once we have children, we can getChildren

now that we know albums has children, we can get a new RecursiveArrayIterator object for that array with getChildren and iterate over that. for instance:

$rai = new RecursiveArrayIterator($albums);

foreach($rai as $k => $v) {
    print $k." => ";
    if($rai->hasChildren()) {
        $rai2 = $rai->getChildren();
        foreach($rai2 as $k2 => $v2) {
            @print PHP_EOL."  $k2 => $v2";
        }
    }
    else {
        print $v;
    }
    print PHP_EOL;
}

the biggest change we have made here is that, when hasChildren returns true, we get a new RecursiveArrayIterator object for those children with getChildren and then loop over that. the output is:

first_name => grant
last_name => horwood
albums => 
  0 => Array
  1 => Array

now we have all the tools to build a recursive function that will handle our entire array.

at last: building a recursive function that will handle our entire array

upgrading our loop to a recursive function now only requires us to do two things:

  1. put our loop in a function
  2. have our function call itself when hasChildren return true

that’s literally it. let’s look:

$rai = new RecursiveArrayIterator($albums);

function recurse(RecursiveArrayIterator $rai) {
    foreach($rai as $k => $v) {
        if($rai->hasChildren()) {
            recurse($rai->getChildren());
        }
        else {
            print "$k => $v".PHP_EOL;
        }
    }
}

// call our recursive function
recurse($rai);

our function accepts one argument, a RecursiveArrayIterator object. when we call that function, it loops over the top level of that array, printing out key/value pairs. when it encounters a value that hasChildren, it creates a new iterator with getChildren and then calls itself with that new iterator as an argument, looping over the next level down of the array. this keeps going until the first loop runs out of elements.

the output looks like this

first_name => grant
last_name => horwood
title => pottymouth
artist => bratmobile
year => 1993
title => monks music
artist => monk, thelonious
year => 1957

this is great stuff. except that we’ve lost the structure of our array. it’s been ‘flattened’ down to one level.

of course, this can still be useful. lots of languages have built-in ‘flattening’ commands to do precisely this (i write a lot of scala and use the flatten method all the time).

but maybe we want to keep that structure. for that we need to add an accumulator.

a successful recursion

adding an accumulator to our function

as the name implies, an accumulator is a variable where we accumulate the results of our recursive function while it executes. when our recursion is done, we return it. let’s add one to our existing function:

$rai = new RecursiveArrayIterator($albums);

function recurse(RecursiveArrayIterator $rai, Array $accumlator) {
    foreach($rai as $k => $v) {
        if($rai->hasChildren()) {
            // add the return of our next call to recurse to our accumulator
            $accumlator[$k] = recurse($rai->getChildren(), []);
        }
        else {
            // add the key/value pair to the accumulator
            $accumlator[$k] = $v;
        }
    }

    return $accumlator;
}

// call our recursive function with an empty accumulator
$accumlator = recurse($rai, []);

// see the results
print_r($accumlator);

here, our accumulator is an array. we pass in an empty one when we call recurse, and in our function body instead of printing out our key/value pairs, we add them to the accumulator. when our function terminates, the accumulator is returned.

looking at the return value, we notice that the structure has been maintained. in fact, it is the exact same array as the one we passed to our function. success! kind of.

building a function that returns the same array it was passed is not exactly groundbreaking functionality. what we probably want to do with our recursive function is either transform or filter out some of those values.

we can do that, of course, by adding some code to the line where we assign a value to the accumulator. for instance, if we want to uppercase all of our values, we could add strtoupper like so:

$accumlator[$k] = strtoupper($v);

or, if we wanted remove the year element from every album, we could filter it.

if($k != 'year') {
    $accumlator[$k] = $v;
}

now we’re doing something useful!

building a more generic solution

so, now we can traverse our entire array and apply the custom filters or transformations of our choosing. this is good, but it would be much better if we could write up a generic recursion function that took filters and transformations as arguments. let’s do that.

$rai = new RecursiveArrayIterator($albums);

/**
 * Traverse the array applying $filter to each key and $transformer to each value
 *
 * @param  RecursiveArrayIterator $rai
 * @param  Array $accumlator
 * @param  callable $filter
 * @param  callable $transformer
 * @return array
 */
function recurse(RecursiveArrayIterator $rai, Array $accumlator, callable $filter = null, callable $transformer = null): Array {

    // a default filter that removes nothing
    $filter = $filter ?? fn($n) => true;

    // a default transformer that changes nothing
    $transformer = $transformer ?? fn($n) => $n;

    foreach($rai as $k => $v) {
        if($rai->hasChildren()) {
            $accumlator[$k] = recurse($rai->getChildren(), [], $filter, $transformer);
        }
        else {
            // test the filter and apply the transformer
            if($filter($k)) {
                $accumlator[$k] = $transformer($v);
            }
        }
    }

    return $accumlator;
}

// filter to apply to each key. remove elements with key 'year'
$filter = fn($n) => $n != 'year';

// transformer to apply to each value. apply uppercase to all values
$transformer = fn($n) => strtoupper($n);

// call our recursive function with an empty accumulator
$accumlator = recurse($rai, [], $filter, $transformer);

// see the results
print_r($accumlator);

we’ve added two new arguments to our function here:

  • $filter: this is a callable that accepts the key of our array element as an argument. it performs a test on the key and returns a boolean. in the above example, we test if the key is year and return false if it is. this callable is applied to each array element and only the elements that return true are added to our accumulator.
  • $transformer: a callable that accepts the value of the element as an argument and applies some functionality to that value. in the example, we uppercase the value with strtoupper. the transformed value will be added to the accumulator

when we run this example code, we get this:

Array
(
    [first_name] => GRANT
    [last_name] => HORWOOD
    [albums] => Array
        (
            [0] => Array
                (
                    [title] => POTTYMOUTH
                    [artist] => BRATMOBILE
                )
            [1] => Array
                (
                    [title] => MONKS MUSIC
                    [artist] => MONK, THELONIOUS
                )
        )
)

all the elements keyed with year have been removed and strtoupper has been applied to all the values. exactly what we asked for.

if we pass nulls for either $filter or $transformer, the default callables are used: no filtering, no transformation.

this technique has real-world uses. for the day job a long time ago, i wrote a package to redact passwords and such from arbitrary json so we wouldn’t be storing those hacker goodies in our log files; recursion with filters and transformers was the solution.

using RecursiveIteratorIterator to flatten arrays

remember back when we did the first recursion and our multi-dimensional array came back flattened to one level?

we can achieve that functionality in a far-less cumbersome way by using the RecursiveIteratorIterator class. for example:

$rai = new RecursiveArrayIterator($albums);

// create a RecursiveIteratorIterator object
$rii = new RecursiveIteratorIterator($rai);

// loop over the RecursiveIteratorIterator
foreach($rii as $k => $v) {
    print "$k => $v".PHP_EOL;
}

this loop replaces our entire recursive function and spits out the same results, a flattened array:

first_name => grant
last_name => horwood
title => pottymouth
artist => bratmobile
year => 1993
title => monks music
artist => monk, thelonious
year => 1957

essentially, the RecursiveIteratorIterator is just an iterator that loops over our RecursiveArrayIteratorand all the RecursiveArrayIterators it contains. it’s an iterator that iterates over an iterator and all the iterators that iterator contains, recursively. that sentence would have sounded ridiculous at the beginning of this post, but now it makes perfect sense.

using iterator_apply on all this stuff

flattening an array can have a lot of benefits. for instance, if we wanted to just get all of the artists in our album collection, we could write a recursive function with RecursiveArrayIterator and extract that information, or we could just flatten the whole thing with RecursiveIteratorIterator and grab every value keyed artist. the flattening approach is way faster and easier.

there are a couple of ways to extract data from a flattened array, but the first one i’ll mention is way that will not work: array_filter.

at first glance, array_filter seems perfect for the job. we want to filter our array to get only artist, after all. however, array_filter preserves keys. this is a problem since, if we look at our output above, we have two elements with the same key: artist. with array_filter, those keys would all get clobbered and we would wind up with only one value: the last one. not good.

the most direct approach is to simply filter in our foreach loop.

$rai = new RecursiveArrayIterator($albums);

$rii = new RecursiveIteratorIterator($rai);

$artists = [];
foreach($rii as $k => $v) {
    if($k == "artist") {
        $artists[] = $v;
    }
}

print_r($artists);

here, we loop over our RecursiveIteratorIterator and if the key is artist we add it to our $artists array. the result is an array of artist names. perfect.

or we could use iterator_apply.

iterator_apply applies a function to each element in an iterator. in this sense is kind of like array_map. let’s look at how we would get our artists using it:

$rai = new RecursiveArrayIterator($albums);

$rii = new RecursiveIteratorIterator($rai);

// our filtering function to extract only 'artists'
$filter = fn($n) => $n == 'artist';

$artists = [];
iterator_apply($rii,
    function($n, $filter, &$artists) {

        // test the key with the filter
        if($filter($n->key())) {
            $artists[] = $n->current();
        }

        return true;
    },
    [$rii, $filter, &$artists]);

print_r($artists);

looking at iterator_apply, we see it takes three arguments. the first argument is our iterator; very straightforward.

the second a third arguments define the function we want to apply to each iterator element. argument two is the function itself, a callable. the third argument is an array representing all the arguments that we are passing to that function. if the function we defined in argument two takes five arguments, the array here needs to have five elements. the first argument of our function and, correspondingly, the first element of our array, should always be the iterator itself.

in the example above, the function we’re passing to iterator_apply takes three arguments: the iterator, a callable we’re using to filter the elements in the iterator, and an accumulator called $artists. we’re passing $artists by reference so we can modify it in the function body and have the results available to us after execution is done.

in the body of the function, we get the key of the current element of the iterator by calling the key() method and pass it to our $filter, our callable that determines if that key is artists. if $filter returns true, we get the value of the current element of the iterator by calling the current() and add it to our accumulator $artists.

in all honesty, this approach is neither easy nor obvious and, as a result, most people just use foreach loops. but iterator_apply is quite powerful, especially if we have a complex or pre-existing function that we would like to apply. it’s worthwhile knowing it exists, even if we have to re-read the manual every time we use it.

Posted by: grant horwood

co-founder of fruitbat studios. cli-first linux snob, metric evangelist, unrepentant longhair. all the music i like is objectively horrible. he/him.

Leave a Reply