php: manipulating file pointers with fseek and ftell

php developers generally don’t put a lot of thought into reading files from disk. we usually just call file or file_get_contents or similar, shovel the whole thing into ram, and then fight with the resulting array or string. if we want to do simple things with small files, it works okay. but php can do a lot more.

in this article, we’re going to look at php’s file pointer and how to manipulate it using commands like fseek and ftell, and by the end we’ll be able to build a pure php version of linux’s tail -f command.


the sample file

any discussion of file reading needs a sample file to do that reading on. for all the examples in this article we’ll be using this short, ten-line file of bands i own records by. it looks like this:

bratmobile
coltrane, john
dumb
eno, brian
fall, the
gryce, gigi
hella
idles
die kreuzen
johnson, linton kwesi

a short overview of the file pointer

in php, when we want to open a file for reading we typically use fopen:

$fp = fopen("/path/to/file", "r");

the return value from fopen is a file pointer.

file pointers keep track of where we are in the file and advance as we read data from the file. when we first create our pointer with fopen(), it is set to the beginning of the file. if we read one character from the file with fgetc(), the pointer advances one character. if we read a line with fgets(), the pointer moves to the beginning of the next line.

we can see this behaviour in action by using two loops to read parts of our file:

$fp = fopen("/path/to/bands","r");

// call fgets() three times
for($i = 0; $i < 3; $i++) {
    print fgets($fp);
}

// call fgets() three times, again
for($i = 0; $i < 3; $i++) {
    print fgets($fp);
}

the second for loop here does not read the first three lines again. instead, it picks up at line four and continues on from there. our output looks like this:

bratmobile
coltrane, john
dumb
eno, brian
fall, the
gryce, gigi

this is because the file pointer advances one line every time fgets() gets called, regardless of which loop it is in.

rewinding the file pointer

reading from a file only moves the pointer in one direction: forward. there’s no way to use fgets or fgetc to move our pointer backwards.

if we want to start reading our file from the beginning again, we have to explicitly set our pointer to the start of the file with rewind. let’s look:

$fp = fopen("/path/to/bands","r");

// call fgets() three times
for($i = 0; $i < 3; $i++) {
    print fgets($fp);
}

// set the file pointer to the beginning of the file
rewind($fp);

// call fgets() three times, again
for($i = 0; $i < 3; $i++) {
    print fgets($fp);
}

here, we read the first three lines from our file, then set our pointer back to the file start and read the first three lines again. our output is:

bratmobile
coltrane, john
dumb
bratmobile
coltrane, john
dumb

setting the file pointer with fseek

if we want more control over our file pointer, we can use fseek.

fseek allows us to set our pointer to an exact byte in the file. to do this, we need to tell fseek two things:

  1. where we are starting from, aka ‘whence’
  2. how many bytes to move the pointer, aka ‘offset’

let’s look at an example:

$fp = fopen("/path/to/bands", "r");

// move our pointer four bytes
fseek($fp, 4, SEEK_SET);

here, we are moving our file pointer four bytes from the beginning of our file.

the ‘whence’ argument here is the third one. we have three possible values we can pass:

  • SEEK_SET the start of the file
  • SEEK_END the end of the file
  • SEEK_CUR the current position of the file pointer

with fseek we can move our pointer anywhere we want to. for instance:

// advance pointer 10 bytes from current position
fseek($fp, 10, SEEK_CUR);

// move pointer to 15 bytes from the end of the file
fseek($fp, -15, SEEK_END);

// set pointer to 5 bytes from the start of the file
fseek($fp, 5, SEEK_SET);

let’s look at that in action. here, we move the file pointer four bytes in from the start of the file, skipping the first four characters of ‘bratmobile’, and output the remainder of the line. then we move the pointer twelve bytes from the end of the file and output from there. our sample file has no line end after its last line, so those twelve bytes are exactly ‘linton kwesi’.

$fp = fopen("/path/to/bands","r");

// move pointer four bytes from start of file and output line
fseek($fp, 4, SEEK_SET);
print fgets($fp);

// move pointer twelve bytes from end of file and output line
fseek($fp, -12, SEEK_END);
print fgets($fp);

the results are:

mobile
linton kwesi
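
one detail the examples above skip: fseek tells us whether the move worked. it returns 0 on success and -1 on failure, for instance when we ask for a position before the start of the file. here’s a minimal sketch of checking that. the helper name seekOrFail and the throwaway temp file are made up for this illustration.

```php
<?php
/**
 * Hypothetical helper: seek, and report whether it worked.
 * fseek() returns 0 on success and -1 on failure.
 */
function seekOrFail(mixed $fp, int $offset, int $whence = SEEK_SET): bool {
    return fseek($fp, $offset, $whence) === 0;
}

// build a throwaway sample file so the sketch is self-contained
$path = tempnam(sys_get_temp_dir(), 'bands');
file_put_contents($path, "bratmobile\ncoltrane, john\ndumb");

$fp = fopen($path, 'r');

var_dump(seekOrFail($fp, 4));                // bool(true)
var_dump(seekOrFail($fp, -1000, SEEK_END));  // bool(false): before start of file

fclose($fp);
unlink($path);
```

checking the return value like this is cheap, and it matters as soon as we start computing offsets in a loop instead of typing them in by hand.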

doing something useful with fseek

manipulating our file pointer like this is a great trick and it will make us very popular at parties, but it’s not particularly useful. let’s change that.

we’re going to build a function that moves our file pointer to n lines from the end of the file so that we can do something like output the last five or ten (or whatever) lines. if you’ve ever used the linux tail command, you get the idea.

our approach is going to be:

  • loop backwards from the end of the file one byte at a time, using fseek
  • test each byte to see if it’s the line end character PHP_EOL
  • count our line ends
  • when we get to n lines, return the file pointer

let’s look at the implementation:

/**
 * Moves the file pointer $fp to the start of the line that is $lineCount
 * lines from the end of the file, so that outputting to eof prints the
 * last $lineCount lines.
 *
 * @param mixed $fp
 * @param int $lineCount How many lines back from the end of file
 * @return mixed The file pointer resource
 */
function windToTailStart(mixed $fp, int $lineCount): mixed {
    $position = -1;
    $lineCounter = 0;
    do {
        fseek($fp, $position--, SEEK_END);
        if(fgetc($fp) == PHP_EOL) {
            $lineCounter++;
        }
    }
    while($lineCounter < $lineCount);
    return $fp;
}

the heart of the action here is the do while loop. this loop keeps running until the number of PHP_EOL characters we have looped over matches $lineCount, the number of lines we want our file pointer to be from the end.

inside the loop, we move our pointer one byte back from the end of the file with the line:

fseek($fp, $position--, SEEK_END);

as we went over above, the SEEK_END argument tells fseek to start from the end of the file, and the $position argument is the number of bytes to add to the starting point. since we’re moving backwards here, that number is negative.

we test to see if the byte we’re inspecting is a line end character by using fgetc to get one char. if that byte is PHP_EOL, the line end, we increment $lineCounter. when $lineCounter equals the number of lines we want, we stop the loop and return the file pointer. because fgetc advanced the pointer past the line end it just read, the pointer is now at the beginning of the line that is $lineCount from the end of the file.

once we have our windToTailStart function written, we can implement tail fairly easily:

// open the file
$fp = fopen("/path/to/bands","r");

// move file pointer to 3 lines from the end
$fp = windToTailStart($fp, 3); 

// output everything from $fp to the end of the file
while(!feof($fp)) {
    print fgets($fp);
}

when we run this, we get the last three lines of our ‘bands’ file. it looks like this:

idles
die kreuzen
johnson, linton kwesi

inspecting our pointer with ftell

setting our file pointer is great, but we probably also want to get it: to find out exactly where our pointer is. we can do that with ftell.

ftell takes one argument, our pointer, and returns the number of bytes the pointer is away from file start. let’s look:

$fp = fopen("/path/to/bands","r");

print ftell($fp); // 0

// move pointer ahead 10 bytes
fseek($fp, 10, SEEK_SET);

print ftell($fp); // 10

here we opened our file and immediately called ftell on our pointer. the result, not surprisingly, is 0. the beginning of the file.

if we wind the pointer ahead 10 bytes with fseek and call ftell again, the number we get is 10. exactly what we expect.

let’s look at a slightly more practical example:

$fp = fopen("/path/to/bands","r");

while(!feof($fp)) {
    print fgetc($fp);
    if(ftell($fp) == 12) {
        break;
    }   
}

here, we’ve written a short block of code that outputs the first 12 bytes of our file and then exits. the implementation is basically looping over the file, outputting each character and then running ftell. when ftell returns 12, we know we’re done and call break, exiting the loop. the output looks like this:

bratmobile
c

that’s ten bytes for ‘bratmobile’, one for PHP_EOL, and ‘c’, the first letter in john coltrane’s name.
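
because ftell gives us an exact byte position, we can save positions and come back to them later with fseek. as an illustration, here’s a rough sketch that records where each line starts and then jumps straight to a given line. the function names buildLineIndex and readLineAt are hypothetical, and the sketch writes its own temp copy of the first few lines of our sample file.

```php
<?php
/**
 * Sketch: record the byte offset at which every line starts.
 * @return int[] offsets, indexed from 0
 */
function buildLineIndex(mixed $fp): array {
    rewind($fp);
    $offsets = [];
    while (true) {
        $start = ftell($fp);          // where the next line begins
        if (fgets($fp) === false) {   // nothing left to read
            break;
        }
        $offsets[] = $start;
    }
    return $offsets;
}

/** Sketch: jump to a saved offset and read one line, without the line end. */
function readLineAt(mixed $fp, array $offsets, int $line): string {
    fseek($fp, $offsets[$line], SEEK_SET);
    return rtrim(fgets($fp), "\r\n");
}

// throwaway sample file so the sketch is self-contained
$path = tempnam(sys_get_temp_dir(), 'bands');
file_put_contents($path, "bratmobile\ncoltrane, john\ndumb\neno, brian");

$fp = fopen($path, 'r');
$index = buildLineIndex($fp);

print implode(',', $index) . PHP_EOL;        // 0,11,26,31
print readLineAt($fp, $index, 2) . PHP_EOL;  // dumb

fclose($fp);
unlink($path);
```

the second line starts at byte 11, which matches what we just saw: ten bytes for ‘bratmobile’ plus one for the line end.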

a short note about file bounds

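back in our windToTailStart function, we moved our file pointer up n lines from the file end. this works great if the file we’re winding over has at least n line ends in it, but if it does not then we end up trying to push our file pointer past the start of the file. the results when we do this are not great: fseek fails (it returns -1) when asked to seek before the first byte, so the loop can keep running without ever finding enough line ends.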

we can protect against that by using ftell to check if we are at the start of the file. let’s look:

function windToTailStart(mixed $fp, int $lineCount): mixed {
    $position = -1;
    $lineCounter = 0;
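    do {
        fseek($fp, $position--, SEEK_END);

        // guard against running past start of file. this check has to come
        // after fseek: a freshly opened pointer also sits at 0, so checking
        // first would return before the loop had done any work
        if(ftell($fp) == 0) {
            return $fp;
        }
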
        if(fgetc($fp) == PHP_EOL) {
            $lineCounter++;
        }
    }
    while($lineCounter < $lineCount);
    return $fp;
}

here, we’ve added a small if statement to return our file pointer if we’ve reached 0, the beginning of the file. problem solved and disaster averted.
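
two more wrinkles are worth a look. first, our sample file has no line end after its last line. many files do, and for those windToTailStart counts that final line end as the first one, so we get one line fewer than we asked for. second, PHP_EOL is two bytes ("\r\n") on windows, so a single byte from fgetc can never equal it there. here’s a hedged sketch of a variant that compares against "\n", skips one trailing line end and stops at the start of the file. the name windToTailStartLf is invented for this example, and it assumes unix-style line ends.

```php
<?php
/**
 * Sketch: like windToTailStart, but ignores one trailing "\n" and stops
 * cleanly at the start of the file. Assumes "\n" line ends.
 */
function windToTailStartLf(mixed $fp, int $lineCount): mixed {
    // find the file size by seeking to the end and asking ftell
    fseek($fp, 0, SEEK_END);
    $size = ftell($fp);

    // asking for zero lines leaves the pointer at the end of the file
    if ($lineCount < 1) {
        return $fp;
    }

    $lineCounter = 0;
    for ($pos = $size - 1; $pos >= 0; $pos--) {
        fseek($fp, $pos, SEEK_SET);
        $char = fgetc($fp);

        // a line end as the very last byte closes the last line;
        // it does not start a new one, so we don't count it
        if ($char === "\n" && $pos !== $size - 1) {
            $lineCounter++;
            if ($lineCounter === $lineCount) {
                // fgetc already moved us to the byte after the line end
                return $fp;
            }
        }
    }

    // fewer than $lineCount lines: the whole file is the tail
    rewind($fp);
    return $fp;
}

// throwaway sample file that ends with a line end
$path = tempnam(sys_get_temp_dir(), 'bands');
file_put_contents($path, "hella\nidles\ndie kreuzen\njohnson, linton kwesi\n");

$fp = windToTailStartLf(fopen($path, 'r'), 2);
print stream_get_contents($fp);
// die kreuzen
// johnson, linton kwesi

fclose($fp);
unlink($path);
```

counting down from the file size with SEEK_SET means the loop simply ends at byte 0, so there is no separate bounds check to get wrong.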

putting it all together: writing tail -f in php

now that we have a grip on fseek and ftell, let’s leverage them to do a moderately powerful file-reading task: implementing linux’s tail -f functionality in php.

for those not familiar with tail -f, it allows us to ‘watch’ a file. we run tail -f /path/to/file and the command waits for new data to be added to that file and then immediately outputs it. it’s great for watching things like logs in real time.

to write this, we’re going to use ftell to find the end of our file and ‘bookmark’ it. then, in a loop, we’re going to continuously check the end of the file with fseek. if our new end-of-file is different than our ‘bookmark’, we’ll know new data has been added to the file. we can then output the new lines.

let’s look at the implementation:

function tailf(string $file) {

    // open file at the end
    $fp = fopen($file, 'r');
    fseek($fp, 0, SEEK_END);

    // infinite loop to keep checking file
    while(true) {
        // bookmark current end of file
        $tell = ftell($fp);

        // wait for 0.1 seconds
        usleep(100000);

        // move pointer to end of file
        fseek($fp, 0, SEEK_END);

        // if the end of file is not the same as the bookmark there is new data
        if($tell != ftell($fp)) {
            // move pointer to bookmark
            fseek($fp, $tell, SEEK_SET);

            // output everything from bookmark to end of file
            while(!feof($fp)) {
                print fgets($fp);
            }
        }
    }
}

as we see here, we can monitor if new data has been added to our file by getting the pointer for the file end with ftell, waiting for 0.1 seconds then seeing if our new file end is different than the previous one. if we have new data, we move the file pointer back to the old end of file and output from there.
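
the infinite loop makes tailf awkward to test, and it quietly assumes the file only ever grows. as a sketch of one way around both, here’s the ‘check for new data’ step pulled out into its own function, which also resets the bookmark if the file shrinks (for example when a log gets truncated). readNewData is a name invented for this example; it is not taken from the gist or the logger package.

```php
<?php
/**
 * Sketch: return whatever was appended since $bookmark and update $bookmark.
 * If the file is now shorter than the bookmark, start over from the top.
 */
function readNewData(mixed $fp, int &$bookmark): string {
    // find the current end of file
    fseek($fp, 0, SEEK_END);
    $end = ftell($fp);

    if ($end < $bookmark) {
        $bookmark = 0;   // file was truncated: re-read from the start
    }
    if ($end === $bookmark) {
        return '';       // nothing new
    }

    // read only the bytes between the bookmark and the end we measured
    fseek($fp, $bookmark, SEEK_SET);
    $data = stream_get_contents($fp, $end - $bookmark);

    // the pointer now sits just past what we read: that is the new bookmark
    $bookmark = ftell($fp);
    return $data === false ? '' : $data;
}

// usage sketch, commented out because it runs until interrupted:
// $fp = fopen('/path/to/file', 'r');
// fseek($fp, 0, SEEK_END);
// $bookmark = ftell($fp);
// while (true) {
//     print readNewData($fp, $bookmark);
//     usleep(100000);
// }
```

passing the bookmark by reference keeps all the state in the caller, so each call is one poll and the waiting stays in the loop around it.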

for those who are interested, there is a gist of the full implementation of tail -f in php.

a more comprehensive example of tail -f functionality is available in my internal-use logger package.

wrapping up

convenience functions like file_get_contents are great; they’re powerful and easy-to-use. but, ultimately, they exist for convenience. the underpinning of every file operation in php is the file pointer, and once we understand how to track and manipulate that we have far greater control over our file handling.

Posted by: grant horwood

co-founder of fruitbat studios. cli-first linux snob, metric evangelist, unrepentant longhair. all the music i like is objectively horrible. he/him.
