PHP in the Command Line
There's a single line you can add to your web host's control panel that will automatically archive your content.
LISTEN CLOSELY AND YOU'LL HEAR THE OCEAN
Ever run commands in DOS? You've used a shell. A "shell" in the computer world is a place where you enter commands and run files by name rather than clicking around different windows.
Most web hosts let you operate a shell remotely. This means that you can type commands in a window on your computer that are actually run on your web host, thousands of miles away.
I'd like you to log in to your shell now. If you can't do it by opening a DOS prompt and typing "telnet your.domain.here", your web host probably uses "SSH" -- a secure shell. You'll have to ask your host how to log in to the shell; they might tell you to download a program called "PuTTY" and give you instructions on how to use it.
If you can't log in to your shell, or aren't allowed to, you'll just have to sit back and watch what I do.
Now that you're logged in, type: echo hi
The next line will print: hi
Try this: date +%Y
This prints the current year. That's 2004 for me.
So what if we combined the two? Try: echo date +%Y
Well, that doesn't work, because the computer thinks you're trying to echo the TEXT "date +%Y" instead of running the actual COMMAND. What we have to do here is surround that text with what are called "back quotes". Unix will evaluate everything enclosed in back quotes (by evaluate, I mean it'll treat that text as if it were entered as a command).
Your back quotes key should be located on the upper-left corner of your keyboard, under the Esc button.
PIPE DOWN, OVER THERE...
Type this in: echo `date +%Y`
That gives us "2004". You could even do something like this: echo `dir`
Which puts the directory listing all on one line.
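Here is a minimal sketch of the same idea, assuming a typical Unix shell. Most modern shells also accept a $( ) spelling that does the same job as back quotes. The variable names below are made up for illustration.

```shell
#!/bin/sh
# Command substitution: the shell runs the inner command first,
# then pastes its output into the outer command line.

year_backticks=`date +%Y`   # back-quote form, as used in this article
year_dollar=$(date +%Y)     # $( ) form, easier to read and to nest

echo "Back quotes give:   $year_backticks"
echo "Dollar-parens give: $year_dollar"

# Without substitution, echo just repeats the text it was handed.
echo date +%Y
```

Both spellings should print the same year; the last line prints the literal text instead.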
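But now, let's put our newfound knowledge to good use. Unix has another neat feature, which I'll call piping, that means "take everything you would normally output to the screen here, and shove it into whatever file I tell you to." (Strictly speaking, sending output to a file with > is called redirection; a true pipe uses | to feed one command's output into another command.) So say I had something like this: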
echo "hey" > test.txt
Now type "dir" and you'll see a new file, test.txt, that wasn't there before. View it off the web, or FTP it to your computer, do whatever you have to, to read the file. It should contain the word "hey".
Likewise, dir > test.txt would store the directory listing into "test.txt".
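To see how > behaves next to its close relatives, here is a small sketch that works in a scratch directory so nothing real gets overwritten. The >> and | operators aren't used elsewhere in this article; they're shown only for comparison.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real gets overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo "hey" > test.txt        # ">" creates the file, or replaces its contents
echo "there" >> test.txt     # ">>" adds to the end instead of replacing
cat test.txt                 # shows both lines: hey, then there

echo "fresh start" > test.txt                # ">" again wipes the earlier lines
line_count=$(wc -l < test.txt | tr -d ' ')   # "|" hands wc's output to tr
echo "test.txt now has $line_count line(s)"
```

The thing to remember is that > always starts the file over, so running the same command twice leaves only the latest output.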
HERE TODAY, GONE TOMORROW
But say we wanted that text file to be named according to the current date. You already have the pieces to figure all that out, if you think about it. Type: date --help to get a listing of all the possible ways to represent the date. The ones you want to represent the year, month and day are %Y, %m, and %d (capitalization *is* important here).
This is what you want: echo `date +%Y%m%d.html`
Running this today, January 8th, 2004, results in: 20040108.html
I've just echoed this year, followed by this month and this day, with an ".html" at the end. This will be our output file.
Now, to pipe it: echo "hey" > `date +%Y%m%d.html`
If this sort of thing were to run every day, it would save "hey" to a file called 20040108.html today, and tomorrow to a file called 20040109.html, then 20040110.html, and so on.
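As a rough sketch of how the date codes combine, the following builds two spellings of today's date and saves "hey" under the compact one. The scratch directory and variable names are assumptions for the example, not part of the original commands.

```shell
#!/bin/sh
# Two date formats built from the same three codes.
compact=$(date +%Y%m%d)        # 20040108 style, sorts correctly by name
dashed=$(date +%Y-%m-%d)       # 2004-01-08 style, easier on the eyes
echo "compact: $compact"
echo "dashed:  $dashed"

# Save "hey" under today's name in a scratch directory.
workdir=$(mktemp -d)
echo "hey" > "$workdir/$compact.html"
ls "$workdir"
```

Putting the year first means an alphabetical directory listing is also in date order.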
The easy part now is figuring out what you want archived. I use wget, which takes an option to name the output file, so we don't need to use piping. Here's an example of how to use wget to save the page "http://www.google.com" to a file representing today's date:
wget http://www.google.com --output-document=`date +%Y%m%d.html`
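If you'd rather keep this in a small script, something like the sketch below might do. The function names archive_name and archive_page, and the file name archive.sh, are my own inventions for illustration; the only wget options used are --quiet and --output-document.

```shell
#!/bin/sh
# archive_name DIR: print the path today's snapshot would be saved to.
archive_name() {
    printf '%s/%s.html\n' "$1" "$(date +%Y%m%d)"
}

# archive_page URL DIR: fetch URL into DIR under today's date.
archive_page() {
    wget --quiet --output-document="$(archive_name "$2")" "$1"
}

# Only fetch when a URL and a directory are given on the command line,
# e.g.: sh archive.sh http://www.google.com /home/user/archive
if [ "$#" -ge 2 ]; then
    archive_page "$1" "$2"
else
    echo "would save to: $(archive_name .)"
fi
```

Run without arguments, it only prints the file name it would use, which is a safe way to check the date format before fetching anything.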
PUT IT TOGETHER
And now, to set up your crontab. I won't explain how crontabs work, just that they're the equivalent of the Windows Task Scheduler, which automatically runs a particular command at a given date and time. The following will save http://www.google.com to a different filename every day.
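0 0 * * * wget http://www.google.com --output-document=`date +\%Y\%m\%d.html` > /dev/null 2>&1
Note the backslashes: a standard crontab treats a bare % as a line break, so each one in the date format has to be escaped. The 2>&1 sends wget's progress messages, which it writes to the error stream, to /dev/null as well.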
Keep in mind that if you want to save it in a particular directory, just put the path in, e.g. change what's in the "output document" parameter to: `date +/home/user/wwwroot/your.host/\%Y\%m\%d.html` (inside a crontab, each % needs a backslash in front of it).
I've piped the output to /dev/null because wget saves the file for us, and there's no reason to do anything else with the output.
Tip: Pipe your cron jobs to /dev/null if you aren't doing anything with the output, because some hosts e-mail you the results and no one needs an extra piece of useless e-mail every day.
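A standard crontab reads a bare % in a command as a line break, so the percent signs in a date format need a backslash in front of them. Here is a small sketch that does the escaping for you and prints a line to paste into your crontab; cron_escape is a made-up helper name, not a standard tool.

```shell
#!/bin/sh
# cron_escape COMMAND: print COMMAND with every % turned into \%,
# since cron reads a bare % as a line break.
cron_escape() {
    printf '%s\n' "$1" | sed 's/%/\\%/g'
}

# The command as you would type it at the shell prompt.
command='wget http://www.google.com --output-document=`date +%Y%m%d.html` > /dev/null 2>&1'

# Prints a crontab line that runs the command daily at midnight.
printf '0 0 * * * %s\n' "$(cron_escape "$command")"
```

Test the plain command at the prompt first, then escape it only for the copy that goes into the crontab.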
Just change http://www.google.com to the page of your choice. However, it's important to know that the "archive" you're taking will only be a snapshot of that page on a particular day.
What I mean by that is, if you're archiving a blog page every day, this archiver won't capture everything that appeared on the page that day; it'll just save whatever was there at the moment it ran. So it's not useful for everything, but it's good if you have access to a page that changes once a day and whose results you'd like to store.
Add that line above into your crontab file. These days nearly every host has a control panel, so there should be a place in there to add cron jobs. If you'd like the archiver to run at a time other than midnight, or if it should run weekly, monthly, or whatever, try this tool I've made for you:
I've designed it the same way Task Scheduler is set up: you can enter a certain time, run only on weekdays, or run only on certain days of the week. Anything you want.
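If you'd rather write the schedule by hand, the five fields at the start of a crontab line are minute, hour, day of month, month, and day of week. The sketch below just prints a few sample lines to copy from; /home/user/archive.sh is a hypothetical script holding your wget command.

```shell
#!/bin/sh
# example_schedules: print sample crontab lines for common schedules.
# Field order: minute, hour, day of month, month, day of week.
example_schedules() {
    cat <<'EOF'
# every day at midnight
0 0 * * * /home/user/archive.sh
# weekdays (Monday to Friday) at 6:30 in the morning
30 6 * * 1-5 /home/user/archive.sh
# weekly, on Sunday at midnight
0 0 * * 0 /home/user/archive.sh
# monthly, on the 1st at midnight
0 0 1 * * /home/user/archive.sh
EOF
}

example_schedules
```

A * in a field means "every", so the first sample reads as minute 0 of hour 0 on every day of every month.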
This tip doesn't take care of everything... for example, wget only saves the page itself, so images will only show up in your archived copy if the page references them by full URLs. In the next installment of this article series I'll be showing you how you can use PHP to make up for some of the things wget can't do (like grabbing images).
Here's my solution: http://www.jumpx.com/tutorials/commandline/get.zip
It's not the most perfect script in the world, but it should do what you want most of the time. If you'd like to delve into how it works, I've commented all the functions and a few of the important parts of the code.
ARGUMENTS (NOT THE SHOUTING KIND)
But wait, you want to use it in a crontab, which is run from the command line. You can't just do something like:
Because it'll try looking for a *file* named all that, complete with the question mark. So what if you have ten different URLs to grab off ten different crontabs, but you only want one script?
How would you do all that? It's a long brutal ordeal so prepare yourself. Ready?
php get.php url=http://www.google.com
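Yeah, that's all there is to it. PHP's pretty cool like that: it takes the arguments after the file name and stores them in the same array you'd check anyway. (This is how the CGI build of PHP behaves; the CLI build hands you the arguments in $argv instead.)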
One thing you might notice is that every time you run PHP from the command line, it gives you something like this:
your output here...
Those first couple of lines are the HTTP headers. But we're not using HTTP (not loading it from a browser), so in the command line it's better to call php with the "-q" option, like this:
php -q get.php url=http://www.google.com
The "q" stands for quiet, and will refrain from giving you the HTTP headers. If you're just piping the script to /dev/null (to nothing) in a crontab, it doesn't really make a difference, but you should try to make this a habit when running PHP from the command line.
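For the "ten URLs, one script" case, a sketch like the one below could print one command per URL so you can paste each into its own crontab entry. It only prints the commands and never runs PHP; build_command and the second URL are placeholders of my own.

```shell
#!/bin/sh
# build_command URL: print the command line that would fetch URL with get.php.
build_command() {
    printf 'php -q get.php url=%s > /dev/null\n' "$1"
}

# Hypothetical list of pages to archive, one per line.
urls='http://www.google.com
http://www.example.com'

# Print one command per URL; paste each after a schedule in your crontab.
for url in $urls; do
    build_command "$url"
done
```

A URL that contains characters like ? or & would need quotes around it, since the shell gives those a meaning of their own.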
That's enough for you to at least get started. If you still feel like poking about with the things PHP can do in the command line, you can try prompting a user for keyboard input, like this:
Remember, that only works when PHP is run from the shell.
If you have PHP installed in Windows on a local machine of yours, you can also see what happens when you try to read from (and write to) file handles like "COM1:" and "LPT1:" ... yep, you guessed it, the serial port and printer port. If PHP isn't installed on the computer you're using now, then don't bother. But it is possible to use PHP to print and interact with your peripherals as well.