php: concurrency with processes. pt. 2: interprocess communication with shmop

php isn’t the sort of language where developers usually think about things like memory. we just sort of sling around variables and functions and let the internals figure out all that ‘ram stuff’ for us. let’s change that.

in the first part of this series, we built a php script that was able to run a number of tasks concurrently by forking child processes. it worked pretty well, but there was a glaring, unaddressed problem: there was no way for those child processes to send data back to the parent process.

in this installment, we’re going to solve that issue by using shmop, php’s “shared memory operations”.

shmop!

a very short and extremely optional explanation of shared memory

when a new process starts, the operating system assigns it a chunk of memory to use. processes can’t read or write to memory that isn’t their own because, well, that would be a security nightmare. perfectly reasonable.

this creates an issue for us when dealing with the processes we created with pcntl_fork in part one of this series, however, because it means there’s no easy way for the child processes to communicate with each other or their parent. child processes will get a copy of their parent’s memory when they’re created so all the variables assigned before the fork are accessible, but any changes to these variables will be limited to the child process. different memory and all that. if we want the child to be able to write to a variable that the parent process can read, we have a problem.
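here’s a minimal sketch of that ‘different memory’ problem, assuming the pcntl extension from part one is available. the variable name $counter is made up for illustration.

```php
<?php
$counter = 1;

$pid = pcntl_fork();
if ($pid === -1) {
    exit("fork failed" . PHP_EOL);
}

if ($pid === 0) {
    // child process: this changes only the child's copy of $counter
    $counter = 99;
    exit(0);
}

// parent process: wait for the child, then look at our own copy
pcntl_waitpid($pid, $status);
print "parent still sees counter = $counter" . PHP_EOL;
// prints: parent still sees counter = 1
```

the child’s assignment never reaches the parent, which is exactly the gap shared memory is going to fill.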

there are a number of solutions for this, all grouped under the general category of ‘inter process communications’ or ipc. the one we’re going to use for our php script is ‘shared memory’.

as the name implies, shared memory is a block of memory that an arbitrary number of processes can access. shared memory blocks are identified by a (hopefully) unique key. any process that knows what the key is can access that memory block. this makes it possible for child processes to report back to their parent process; the child will write data to a shared memory block and, after it quits, the parent will read the shared data. it’s a borderline elegant solution.

of course, there are a few footguns we will need to avoid when doing this: we will need to ensure that the key that identifies a shared memory block is unique, and we will need to enforce that shared memory communication only goes one way to avoid multiple processes all trying to write to the same block at the same time and causing a mess. we’ll cover all this in the implementation.

a basic flyover of shmop

php has a rich and robust api for dealing with shared memory. the manual states “Shmop is an easy to use set of functions that allows PHP to read, write, create and delete Unix shared memory segments”, and… it’s not wrong.

let’s look at the core steps for using shared memory:

create a unique key: all shared memory is identified by a key. any process that knows what the key of a shared memory block is can access it. traditionally, this key is created by generating data from the filesystem (ie. a value built from the inode of an existing file) because the filesystem is something all processes have in common. we will use ftok for this.

assign a memory block using the key: a process can use the key of a shared memory block to ‘open’ it using shmop_open. if the shared memory block does not exist, this creates it. the return value from the open function is a pointer that can be used for reading and writing. if you’ve ever used fopen and fwrite before, this process should be familiar.

write data to the memory block: writing to shared memory has a very similar interface to fwrite. the pointer is used, and the string to write to memory is passed as an argument. the function to do this is called shmop_write.

read data from the memory block: reading data from shared memory is done with shmop_read, again using the pointer from shmop_open. the return value is a string.

delete the memory block using the key: deleting shared memory after it’s no longer needed is important. this is done with shmop_delete.
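the five steps above can be sketched in a single process, no forking required. this is a rough outline rather than the full implementation: using __FILE__ as the ftok file, ‘x’ as the project id, the 0600 permissions and the 64-byte size are all assumptions picked for the demo.

```php
<?php
// 1. create a key. assumption: this script itself is the ftok file.
$key = ftok(__FILE__, 'x');
if ($key === -1) {
    exit("ftok failed" . PHP_EOL);
}

// 2. create a new 64-byte block. 'n' fails if the key is already in use.
$shm = shmop_open($key, 'n', 0600, 64);
if ($shm === false) {
    exit("could not create shared memory block" . PHP_EOL);
}

// 3. write a string at offset 0
$written = shmop_write($shm, 'hello', 0);

// 4. read back only the bytes we wrote
$data = shmop_read($shm, 0, $written);

// 5. delete the block so the key can be reused
shmop_delete($shm);

print "wrote $written bytes, read '$data'" . PHP_EOL;
// prints: wrote 5 bytes, read 'hello'
```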

the actual implementation

let’s start with an example. the code below works and, if you are sufficiently un-curious or a tl;dr-type, you can just copy-paste-modify this, but for everyone else, we’ll go over all the shmop steps and explain what they do and how they work.

<?php

// the file used by ftok. can be any file.
$shmop_file = "/usr/bin/php8.3";

for($i = 0; $i < 4; $i++) {
    // create the fork
    $pid = pcntl_fork();

    // an error has occurred
    if($pid === -1) {
        echo "error".PHP_EOL;
    }

    // child process
    else if(!$pid) {

        // create a random 'word' for this child to write to shared memory 
        $random_word = join(array_map(fn($n) => range('a', 'z')[rand(0, 25)], range(1,5)));

        // write to shmop
        $shm_key = ftok($shmop_file, $i);
        $shm_id = shmop_open($shm_key, 'n', 0755, 1024);
        shmop_write($shm_id, $random_word, 0);

        print "child $i wrote '$random_word' to shmop".PHP_EOL;

        // terminate the child process
        exit(0);
    }
}

// wait for all child processes to finish
while(($pid = pcntl_waitpid(0, $status)) != -1) {
    echo "pid $pid finished".PHP_EOL;
}

// read all our shared memories
for($i = 0; $i < 4; $i++) {

    // recreate the shm key
    $shm_key = ftok($shmop_file, $i);

    // read from the shared memory
    $shm_id = shmop_open($shm_key, 'a', 0755, 1024);
    $shmop_contents = shmop_read($shm_id, 0, 1024);

    print "reading '$shmop_contents' from child $i".PHP_EOL;

    // delete the shared memory so the shm key can be reused in future runs of this script
    shmop_delete($shm_id);
}

creating a shared memory key with ftok

as we covered above, all shared memory blocks are identified by a unique integer key, and before we can get down to the task of assigning memory we have to create that key.

in all honesty, we can use any integer we want to, so long as it is unique, however the generally accepted, canonical way to do this is by using ftok to create an integer using an existing file in the filesystem as a reference point.

the rationale for doing this is pretty straightforward. processes don’t know anything about each other, which makes it difficult for them to share a mutually agreed-upon value. one of the few things all processes on a system do have in common, though, is the filesystem. hence, ftok.

in addition to the path to an existing file, ftok also takes a project_id argument. this is, according to the docs, a ‘one character string’, what people in every other programming language would call a ‘char’. the purpose of the project id is to prevent collisions when creating shared memory. if two projects by two separate vendors both decided to use /etc/passwd as their argument to ftok, chaos would ensue.

let’s look at a fairly straightforward example:

$shm_key = ftok('/usr/bin/php8.3', 'j');
print "shm_key = $shm_key";

here we are passing the full path to a file that we know exists on the system and providing a one character project_id, ‘j’. if we run this, the print statement will output something like:

shm_key = 855706266

that’s a good integer to use for creating our shared memory!

if you run this code on your system, you will almost certainly get a different return value, even though you have used the same arguments. this is because, under the hood, ftok uses the inode of the file, and that is different from system to system.

if, for some reason, we pass to ftok a file that doesn’t exist, we get a warning.

PHP Warning:  ftok(): ftok() failed - No such file or directory in <our script> on line <the line>
shm_key = -1

note that this is only a warning and ftok will charge ahead and give us a value of -1, which will result in problems down the road. be careful.
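one way to be careful is to wrap ftok in a small guard that refuses to hand back -1. the helper name make_shm_key is hypothetical, and throwing a RuntimeException is just one possible design choice.

```php
<?php
// hypothetical helper: wraps ftok so a bad path can't silently become key -1
function make_shm_key(string $path, string $project_id): int
{
    if (!file_exists($path)) {
        throw new RuntimeException("ftok file does not exist: $path");
    }

    $key = ftok($path, $project_id);
    if ($key === -1) {
        throw new RuntimeException("ftok failed for: $path");
    }

    return $key;
}

// assumption: this script is used as the ftok file, so it certainly exists
$key = make_shm_key(__FILE__, 'j');
print "shm_key = $key" . PHP_EOL;
```

failing loudly here is cheaper than discovering the -1 later, when shmop_open misbehaves.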

now, let’s revisit our call to ftok on line 22:

$shm_key = ftok($shmop_file, $i);

here we have passed ftok the path of the file we have set in $shmop_file, in this case /usr/bin/php8.3, a file we know exists on the system.

for our project_id we are using $i, the counter of the for loop that forks our children. we are doing this so that each of our child processes has its own shared memory block to store its results. remember that if more than one process tries to write to shared memory, Bad Things happen. using the counter here helps us avoid that.
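one caveat worth sketching: since the project id is a one character string, passing $i directly should only hold up while the counter is a single digit (at 10 the string becomes ‘10’, two characters). if you need more children, something like the hypothetical helper below maps each index to a single letter. the 0 to 25 range is an arbitrary limit for this sketch.

```php
<?php
// hypothetical helper: turn a child index into a one-character project id
function project_id_for(int $index): string
{
    if ($index < 0 || $index > 25) {
        throw new InvalidArgumentException("index must be between 0 and 25");
    }
    return chr(ord('a') + $index);   // 0 => 'a', 1 => 'b', ...
}

// assumption: this script is the ftok file
for ($i = 0; $i < 4; $i++) {
    $project_id = project_id_for($i);
    $shm_key = ftok(__FILE__, $project_id);
    print "child $i: project id '$project_id', key $shm_key" . PHP_EOL;
}
```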

opening a memory block with shmop_open

if you’ve ever done file access with tools like php’s fopen and fwrite, then using shmop will be very familiar.

let’s start with opening a shared memory block with shmop_open:

$shm_id = shmop_open($shm_key, 'n', 0755, 1024);

this function takes four arguments:

  • key: the unique key we created using ftok.
  • mode: the type of access we want. when opening a file with fopen we use modes like r for ‘read’ or w for write. shmop_open’s mode is similar to that, but there are differences. we’ll go over all those below.
  • permissions: the read/write/execute permissions of the shared memory block in octal notation. the interface for dealing with shared memory is strongly analogous to file access, and that includes permissions. if you’re not confident with octal notation, there are file permissions calculators you can use. we’re using 0755 in this example, but you may want to tighten that up.
  • size: the size of the memory block in bytes. in the example, we are assigning 1024 bytes (one kilobyte), which is clearly overkill for a five-letter word. however, note that if we write more data than the memory block can hold, the value will be truncated. if we try to read more bytes from a memory block than its size, a fatal error occurs. if we are unsure of the exact size of the data we will be writing to memory, it is better to overestimate how big we need our memory block to be.

an important note here is that calling shmop_open will create a new memory block if one at that key doesn’t already exist. this is similar to how fopen behaves, but with shmop_open this behaviour is dependent on the ‘mode’ argument we pass.

as shown in the example, shmop_open returns a pointer that can be used for access: reading or writing, depending on the mode used to open the memory block.

007 are bad permissions for a spy

a little bit more about that ‘mode’ argument

the mode argument that we pass to shmop_open determines how we can access our shared memory block. there are four options, all covered in the official documentation, but for the sake of simplicity, we’ll only look at the two we need for our purposes.

  • n: the ‘n’ stands for ‘new’ and is used when we want to create a new shared memory block. there does exist a mode c for ‘create’, but we are choosing to use n here because this mode will fail if we try to open a memory block that already exists. that’s a safety feature! in fact, the docs state that using n to create a new shared memory block is ‘useful’ for ‘security purposes’. the pointer returned from shmop_open using mode n is writeable.
  • a: this is for ‘access’; ie. reading. do not confuse this with fopen’s a mode, which is for ‘append’. memory blocks opened with mode a are read-only.

if we look at the example, we can see that when we open the shared memory block in the child process to write our data, we use the n mode. this creates the new memory block in a safe way and returns a pointer that we can write to.
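to see the n mode’s safety feature in action, here’s a small sketch that tries to create the same block twice. the project id ‘n’, the 0600 permissions and the 16-byte size are assumptions for the demo; the @ just silences the warning so we can look at the return value.

```php
<?php
// assumption: this script is the ftok file
$key = ftok(__FILE__, 'n');

// first open: the block doesn't exist yet, so 'n' creates it
$first = shmop_open($key, 'n', 0600, 16);

// second open with the same key: 'n' refuses because the block already exists
$second = @shmop_open($key, 'n', 0600, 16);

print "first open " . ($first === false ? "failed" : "worked") . PHP_EOL;
print "second open " . ($second === false ? "failed" : "worked") . PHP_EOL;

// clean up so the key can be reused on the next run
if ($first !== false) {
    shmop_delete($first);
}
```

on a clean system the first open works and the second fails, which is the behaviour we want when each child is supposed to own its block.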

using shmop_write to… write.

once our child process has created a new shared memory block and received a pointer to it, it can write whatever it wants there using shmop_write.

in our example, doing this looks like:

shmop_write($shm_id, $random_word, 0);

the shmop_write function takes three arguments:

  • the pointer: the pointer returned from shmop_open. note that shmop_open must be called with a mode that allows writing (n in our example), otherwise attempts to use shmop_write will fail.
  • the value to write: the string to write to the shared memory block.
  • the offset: the number of bytes in memory to offset the start point of the write by. using the offset can allow us to append to a value already in the shared memory block, but doing this means keeping track of bytes written and can become unmanageable pretty quickly. in our example, we use the offset 0; we start writing at the beginning of the memory block.

shmop_write returns, as an integer, the number of bytes written.
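that return value is worth checking. since an oversized write gets truncated rather than rejected, comparing the byte count against the length of our string is a cheap way to catch it. a sketch, with a deliberately tiny 8-byte block and a made-up payload:

```php
<?php
// assumption: this script is the ftok file
$key = ftok(__FILE__, 'w');
$shm = shmop_open($key, 'n', 0600, 8);   // deliberately too small

$payload = 'this will not fit';
$written = shmop_write($shm, $payload, 0);

if ($written !== strlen($payload)) {
    print "only $written of " . strlen($payload) . " bytes were written" . PHP_EOL;
}

// clean up
shmop_delete($shm);
```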

a short note about shmop_close

if you’ve done file access using fopen, you’re probably (hopefully!) in the habit of calling fclose when you’re done writing.

we do not do that with shmop.

there is a shmop_close function, but it has been deprecated since php 8.0 and does nothing (other than throw a deprecation warning, that is). the standard practice with shmop is to just leave the pointer ‘open’ after we’re done writing. we’ll delete it later.

reading from shared memory

once all the child processes have written their data to their respective shared memory blocks and exited, all that remains is for the parent process to read that data. the strategy for this is:

  • recreate the key of the shared memory block
  • use the key to open the shared memory in ‘access only’ mode
  • read the data into a variable

let’s look again at the example we have for reading shared memory.

// read all our shared memories
for($i = 0; $i < 4; $i++) {

    // recreate the shm key
    $shm_key = ftok($shmop_file, $i);

    // read from the shared memory
    $shm_id = shmop_open($shm_key, 'a', 0755, 1024);
    $shmop_contents = shmop_read($shm_id, 0, 1024);

    print "reading '$shmop_contents' from child $i".PHP_EOL;

    // delete the shared memory so the shm key can be reused in future runs of this script
    shmop_delete($shm_id);
}

recreating the shmop key

when we made the key to create our shared memory blocks, we used ftok with two arguments: the path to an existing file in the filesystem, and a ‘project id’. for the project id, we used the counter of the loop that forked our children.

we can use the exact same strategy to recreate the keys for reading. as long as we input the same two arguments into ftok, we get the same value back.

opening the shared memory

we open the shared memory block for reading almost exactly the same way as we did above for writing. the only difference is the mode.

for reading, we use the a mode. this stands for ‘access’, and gives us a read-only pointer to our shared memory block.

reading from the shared memory block

once we have a pointer to our shared memory block, we can read from it using shmop_read.

shmop_read takes three arguments:

  • the pointer we got from shmop_open.
  • the offset, in bytes. since we are reading the entirety of the memory block, starting at the beginning, this is 0 in our example (and will probably be for most real-life uses, as well)
  • the number of bytes to read. in most cases, the smart thing here is to just read the entire size of the block, in our example 1024 bytes.

the return type is a string. before php 8.0, a failed read gave us a boolean false; as of php 8.0, an offset or size that is out of range throws a ValueError instead.
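one wrinkle with reading the whole block: we get back every byte of it, not just the part we wrote. on linux a fresh block starts out filled with NUL bytes, so a five-letter word comes back with a long tail of padding. a minimal sketch of trimming that off, assuming our data never legitimately ends in a NUL byte:

```php
<?php
// assumption: this script is the ftok file
$key = ftok(__FILE__, 'r');
$shm = shmop_open($key, 'n', 0600, 1024);
shmop_write($shm, 'hello', 0);

// read the entire block, padding and all
$raw = shmop_read($shm, 0, 1024);

// strip the trailing NUL bytes left over from the unused part of the block
$clean = rtrim($raw, "\0");

print strlen($raw) . " bytes read, " . strlen($clean) . " after trimming" . PHP_EOL;

// clean up
shmop_delete($shm);
```

if the data can contain trailing NULs (binary data, for instance), this won’t do; storing the length alongside the data would be the safer route.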

deleting shared memory blocks

once we are done reading our shared memory, we can delete it.

this is an important step. unlike variables in our script, the memory we assigned with shmop will persist after our program has exited, hogging resources. we do not want to litter our system with blocks of unused, reserved memory, piling up higher and higher with each successive run of our script!

freeing up shared memory blocks is done with shmop_delete. this function takes one argument: the pointer we created with shmop_open, and returns a boolean true on success.

note that shmop_delete destroys the memory block and frees up the space for other applications to use. we should only call it when we’re completely done with using the memory.

handling errors

the example we’ve been going over doesn’t really do any error handling. this is a decision borne out of a desire for brevity, not delusional optimism. in real applications we should certainly do some error testing!

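we used a path to a file as an argument for ftok; we should test that it exists. shmop_open returns false if it can’t create or attach to a memory block; check for that before using the pointer. shmop_write will throw an error if our memory block is opened read-only or the offset is out of range, and data that runs past the end of the block gets truncated, so compare the return value against the length of what we meant to write. shmop_read returned false on failure before php 8.0 and throws a ValueError for an out-of-range offset or size since then; handle whichever applies to your version.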
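as a rough sketch of what that might look like on the reading side, here is a hypothetical helper (read_child_result is not part of shmop) that returns null whenever something goes wrong. it passes 0 for permissions and size when opening an existing block and asks shmop_size how much to read; treating every failure as null is a simplification.

```php
<?php
// hypothetical helper: read one block, delete it, return null on any failure
function read_child_result(string $file, string $project_id): ?string
{
    if (!file_exists($file)) {
        return null;                        // nothing for ftok to work with
    }

    $key = ftok($file, $project_id);
    if ($key === -1) {
        return null;
    }

    // '@' silences the warning; we check the return value instead
    $shm = @shmop_open($key, 'a', 0, 0);
    if ($shm === false) {
        return null;                        // no block at this key
    }

    try {
        $data = shmop_read($shm, 0, shmop_size($shm));
    } catch (ValueError $e) {
        $data = false;
    } finally {
        shmop_delete($shm);                 // clean up whether or not the read worked
    }

    return is_string($data) ? rtrim($data, "\0") : null;
}

// usage: a throwaway block stands in for what a child would have written
$shm = shmop_open(ftok(__FILE__, 'e'), 'n', 0600, 32);
shmop_write($shm, 'result', 0);

$found = read_child_result(__FILE__, 'e');     // the block's contents
$missing = read_child_result(__FILE__, 'z');   // no block was created for 'z'

var_dump($found, $missing);
```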

i am asking you to do some error handling

fixing ‘already exists’ errors with shmop_open

if we open a shared memory block and then the script terminates before we call shmop_delete, the memory block still exists. if we then try to open that memory block again with shmop_open using the n mode, we will get the error:

PHP Warning:  shmop_open(): Unable to attach or create shared memory segment "File exists" in /path/to/my/script on line <line number>

if our script is well-designed, this shouldn’t happen. but, while developing and testing we may create these orphaned memory blocks. let’s go over how to delete them.

the first step is to get the key of the memory block as a hex number. we do this by calling ftok as normal, and then converting the returned integer from base ten to base-16 like so:

$shm_key = ftok($shmop_file, $i);
$shm_key_hex = "0x".base_convert($shm_key, 10, 16);

we do this because linux comes with a number of ‘interprocess communication’ tools that we can use to manage shared memory blocks, and they all use hexadecimal numbers for their keys.

the first command line tool we’ll use is ipcs. we’re going to use this to confirm that the shared memory block we want to delete does, in fact, exist.

the ipcs command, when run without arguments, will output all interprocess communication channels, including all shared memory blocks. we’ll narrow down that output by using grep with the hexadecimal key we created above. for instance, if our shared memory block’s key in hexadecimal is 0x33010024, we could do this:

ipcs | grep "0x33010024"

if we get a line of output, the memory block exists. if nothing is returned, it does not.

once we’ve confirmed that a shared memory block exists, we can remove it with ipcrm:

ipcrm --shmem-key 0x33010024

knowing how to inspect and clean up (without resorting to a restart) shared memory allows us to develop and experiment without turning our ram into a ghost town of abandoned blocks.
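if a script leaves several blocks behind, a tiny helper script can print the cleanup commands for us. this is a sketch under assumptions: $shmop_file should be swapped for whatever file the real script passes to ftok, and the loop assumes four children with project ids 0 to 3, as in the example. it pads the key to eight hex digits with sprintf so it is easy to match against ipcs output.

```php
<?php
// assumption: replace this with the file your script actually passes to ftok
$shmop_file = __FILE__;

$commands = [];
for ($i = 0; $i < 4; $i++) {
    $shm_key = ftok($shmop_file, (string) $i);

    // zero-padded, eight-digit hex form of the key
    $shm_key_hex = sprintf('0x%08x', $shm_key);

    $commands[] = "ipcrm --shmem-key $shm_key_hex";
}

print join(PHP_EOL, $commands) . PHP_EOL;
```

the script only prints the commands; running them stays a deliberate, manual step.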

wrapping up

achieving concurrency in php using fork and shared memory does take some effort and knowledge (and the official manual is scant help). but it does work and, if you’ve made it through this article and the first installment on pcntl_fork, you should have a good base from which to start.

Posted by grant horwood

co-founder of fruitbat studios. cli-first linux snob, metric evangelist, unrepentant longhair. all the music i like is objectively horrible. he/him.

         
