PHP 5.3 5th Anniversary: The History of PHP Archives (PHAR Files)
June 30th 2014 marks the 5th anniversary of PHP 5.3.0, which included the PHAR extension for the first time.
The initial commit for PHP_Archive is dated “Jan 25 2005” — more than nine years ago.
But the story of PHAR files started about 6 months before that, meaning that we just passed the 10th anniversary of an experiment that went on to impact the entire PHP community.
To set the stage, PHP 4.3.7 was the current stable release, and PHP 5.0.0 was still a month away — and PHP 5.0.0RC3 (the final RC) had just been released. We were still several months away from the first ever release of Ubuntu Linux (4.10 Warty Warthog).
As I remember it, it was about 3 in the morning when I had the crazy idea of single-archive PHP applications. Like Java JAR files but for PHP. A PHPJAR… no, a PHAR. And I bet you can use the then brand new streams feature to do it!
This was in the days when deployment meant using FTP to connect to a server and transferring files.
I remember being frustrated at how long phpMyAdmin took to transfer because it had so many tiny files (572 of them, totalling 9MB, at least according to git), and over my 56K modem that took even longer than one large file would have. Wouldn’t it be great if I could upload a Zip file and unpack it on the server?
But this was on a shared host, and SSH wasn’t something you could expect back then.
The first phar was created by hand, based on the premise that I knew tar files were basically just a bunch of non-printable binary data separating the file data, which is stored in its original format. This meant that if we put PHP code into a tar, it would still be PHP code.
So, I created a tar file with a single PHP file in it, and opened it up in vim to see what it looked like.
$ tar -cvf info.tar info.php
And it looked something like this:
info.php<binary data><?php phpinfo(); ?>
<binary data>
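If you want to check that layout without opening vim, here is a minimal sketch. It assumes a tar that writes a plain 512-byte header per file (GNU tar does by default) and a head/tail that accept -c. The scratch directory is only for illustration.

```shell
#!/bin/sh
# Sketch: build a one-file tar and look at its raw layout.
set -eu
workdir=$(mktemp -d)    # illustrative scratch directory
cd "$workdir"

printf '<?php phpinfo(); ?>\n' > info.php
tar -cf info.tar info.php

# The archive opens with a 512-byte header whose first field is the file name.
head -c 8 info.tar; echo                    # info.php

# The file's bytes follow the header unmodified, so the PHP source is intact.
tail -c +513 info.tar | head -c 19; echo    # <?php phpinfo(); ?>
```

Everything between those two pieces is header fields (mode, owner, size, checksum and so on), which is the "binary data" PHP echoes back as gibberish.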
Amazingly, when I ran this, it basically worked. PHP spat out the filename, some random gibberish, and then… the result of phpinfo().
info.php<gibberish>phpinfo()
PHP Version => …
Now, obviously, this isn’t perfect, but it’s a pretty good start! How do we stop the extraneous output? The start of the archive is the file name… so what if I renamed the file to <?php? This would mean that the archive started with the open PHP tag.
Now our tar file looks like this:
<?php<binary data><?php phpinfo(); ?>
And in my excitement I ran it… with the same result as before, except instead of info.php at the start of the output, I had <?php. This meant the name wasn’t being parsed as an open tag, and the only reason it worked was that I forgot to remove the extraneous <?php in the file!
It was pretty obvious that it wasn’t being parsed because PHP requires whitespace after the php in the opening tag, so I renamed the file to “<?php ” (with a trailing space), removed the extraneous <?php from the contents, and tried again.
Now the tar file looks like this:
<?php <binary> phpinfo(); ?>
I run it again and… errors.
PHP Warning: Unexpected character in input: ' in info.tar on line 1
Warning: Unexpected character in input: ' in info.tar on line 1
…
PHP Parse error: syntax error, unexpected '000767' (T_LNUMBER) in info.tar on line 1
Parse error: syntax error, unexpected '000767' (T_LNUMBER) in info.tar on line 1
Hrm… if only I could get PHP to ignore the binary junk. What if I use a comment?
So, once again I just need to rename the file, this time to <?php //, and add a blank line at the top of info.php so that the code starts after the commented-out line.
Now, it’s much harder to rename a file to include a / in it than you might think. In fact, it’s impossible.
So I did the next best thing… I edited the tar file by hand… and Success!
The first “valid” PHAR was born.
Now, at this point, we no longer have a valid tar file, which was something I really wanted to retain, and it turned out to be an easy and obvious fix: use the # comment syntax. We renamed our file to <?php #, yielding a tar file that looks like this:
<?php #<binary>
phpinfo(); ?>
Now we had a completely valid tar file, that was also a completely valid PHP script without any unintended side-effects.
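The final trick can be reproduced with a short sketch, under the same assumptions as before: a tar that writes a plain 512-byte header, and a head that accepts -c. The scratch directory is illustrative. The php call at the end is optional and only runs if a php binary is on the PATH; how a current PHP build treats the header bytes is not something this sketch guarantees.

```shell
#!/bin/sh
# Sketch: a tar whose only member is named "<?php #".
set -eu
workdir=$(mktemp -d)    # illustrative scratch directory
cd "$workdir"

# The member name doubles as the PHP open tag plus a '#' comment marker.
# The body starts with a blank line so the comment ends before the real code.
printf '\nphpinfo(); ?>\n' > '<?php #'
tar -cf info.tar '<?php #'

# The archive now begins with the open tag and the comment marker...
head -c 7 info.tar; echo    # <?php #

# ...and tar still reads it as an ordinary archive with one member.
tar -tf info.tar            # <?php #

# Optional: run the archive as a script if the PHP CLI is installed.
if command -v php >/dev/null 2>&1; then
    php info.tar | head -n 5 || true
fi
```

Because # is not a path separator, the name can be given to the file directly, so no hand-editing of the archive is needed.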
From this point, I moved on to generating the file with the Archive_Tar package, and from there progressed to bundling the Archive_Tar code inside the phar, so it was self-contained.
This essentially led to the PHP_Archive class that would (using Archive_Tar) handle the streams part, and eventually would lead to sub-classes like PHP_Archive_Creator for creating phars etc.
Once I got this far, I took it through the PEAR proposal system, PEPr (pronounced “pepper”). The proposal stated:
PHP_Archive
PHP_Archive provides similar functionality to Javas .jar files.
PHP_Archive is a specially formatted valid TAR file.
Inclusion of files is done by using require_once 'phar://file.php'; instead of just require_once 'file.php';
What PHP_Archive Does Do
- PHP_Archive allows you to distribute a single file with your entire application in it.
- PHP_Archive doesn’t have lengthy unpacking times
- PHP_Archive’s are valid TAR archives
- PHP_Archive’s are valid PHP scripts (assuming the packed code is valid)
- PHP_Archive’s can be created with the *nix ‘tar’ command, just include a special file, and use phar_default.php as the start file for your application.
- PHP_Archive works on PHP 4.3.0+ and PHP 5
- PHP_Archive_Creator also allows you specify a different file than phar_default.php as the file to run.
- PHP_Archive_Creator will have the ability to compress files
What PHP_Archive Does NOT Do
- PHP_Archive does not protect your script, all files are stored in plain-text inside a valid TAR file. Anybody can unpack them.
You can see the proposal in its entirety here, including the voting, which happened 6 months later.
So, that’s the long story of how PHAR was created: mostly by just hacking around with tar on the command line. You can still create tar-based PHP Archives, and they’re not too terribly different from those first hand-made (artisanal!) tar files.
Thankfully, not too long after this, two folks much smarter than me, Greg Beaver and Marcus Börger, took the idea, added Zip support, created the vastly superior phar format, and turned it all into the C phar extension, which has been bundled with PHP, and enabled by default, since PHP 5.3.