PHP in the Command Line

(Page 3 out of 4)

And now, to set up your crontab. I won't explain how crontabs work, except to say that cron is the equivalent of the Windows Task Scheduler: it automatically runs a particular command at a given date and time. The following line saves http://www.google.com to a different filename every day.

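0 0 * * * wget http://www.google.com --output-document=`date +\%Y\%m\%d.html` > /dev/null 2>&1

Note the backslashes: in most cron implementations an unescaped % inside a crontab line is turned into a newline, so each % in the date format has to be written as \%.

If you want to put the file in a particular directory, just include the path in the "output document" parameter, e.g.: `date +/home/user/wwwroot/your.host/\%Y\%m\%d.html`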
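Because a crontab treats % specially, one alternative is to move the command into a small shell script and have cron call the script instead. Here's a minimal sketch of that idea. The script name archive.sh and the function names archive_path and archive_page are made up for illustration; only wget, its --output-document and -q options, and date come from the real tools.

```shell
#!/bin/sh
# Hypothetical helper: build a dated output path such as /some/dir/20240131.html.
# The directory is whatever you pass in; the article's example uses
# /home/user/wwwroot/your.host.
archive_path() {
    dir=$1
    printf '%s/%s.html\n' "$dir" "$(date +%Y%m%d)"
}

# Fetch a URL into today's dated file. -q silences wget's progress output.
archive_page() {
    url=$1
    dir=$2
    wget -q "$url" --output-document="$(archive_path "$dir")"
}

# Only fetch when called with a URL and a directory, e.g. from cron:
#   0 0 * * * /home/user/archive.sh http://www.google.com /home/user/wwwroot/your.host
if [ "$#" -eq 2 ]; then
    archive_page "$1" "$2"
fi
```

Since the date format now lives in a script file rather than in the crontab itself, the % signs don't need backslashes here.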

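I've redirected the output to /dev/null because wget saves the file for us, and there's no reason to do anything else with the output. Since wget writes its progress messages to standard error rather than standard output, the redirect needs 2>&1 after > /dev/null to silence those too.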

Tip: Redirect the output of your cron jobs to /dev/null if you aren't doing anything with it, because some hosts e-mail you the results and no one needs an extra piece of useless e-mail every day.
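To see why the redirect matters, here's a small sketch you can run in any shell. The noisy_job function is a made-up stand-in for a cron command that prints to both standard output and standard error; it isn't part of wget or cron.

```shell
#!/bin/sh
# Hypothetical stand-in for a noisy cron job: one line to stdout, one to stderr.
noisy_job() {
    echo "normal output"
    echo "progress or error output" >&2
}

# Silence stdout only: the stderr line still gets through (and would be e-mailed).
stdout_only=$(noisy_job 2>&1 >/dev/null)

# Silence both streams: nothing is left over.
both=$( { noisy_job >/dev/null 2>&1; } 2>&1 )

echo "stdout silenced, got: '$stdout_only'"
echo "both silenced, got: '$both'"
```

The first capture still contains the stderr line, while the second is empty, which is the behaviour you want for a job whose output you never read.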

Just change http://www.google.com to the page of your choice. However, it's important to know that the "archive" you're taking is only a snapshot of that page as it looked when the job ran.

What I mean by that is, if you're archiving a blog page every day, this archiver won't capture everything that was posted on a particular day; it just saves whatever was on the page at that moment. So it's not useful for everything, but it's good if you have access to a page that changes regularly, say once a day, whose results you'd like to store.

Add the crontab line above to your crontab file. These days most hosts have a control panel, so there should be a place in there to add cron jobs. If you'd like the archiver to run at a time other than midnight, or weekly, monthly, or on some other schedule, try this tool I've made for you:

http://www.robertplank.com/cron

I've designed it the same way Task Scheduler is set up: you can enter a certain time, run only on weekdays, or run only on certain days of the week. Anything you want.
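If you'd rather write the schedule by hand, the five fields before the command are minute, hour, day of month, month, and day of week. The sketch below writes a few example lines to a scratch file so you can look them over. /home/user/archive.sh is a placeholder command, not a script from this article.

```shell
#!/bin/sh
# Write some example schedules to a temporary file for review.
cron_file=$(mktemp)
cat > "$cron_file" <<'EOF'
# minute hour day-of-month month day-of-week command
# Every day at 06:30
30 6 * * * /home/user/archive.sh > /dev/null 2>&1
# Weekdays (Monday to Friday) at midnight
0 0 * * 1-5 /home/user/archive.sh > /dev/null 2>&1
# Every Sunday at midnight
0 0 * * 0 /home/user/archive.sh > /dev/null 2>&1
# First day of every month at midnight
0 0 1 * * /home/user/archive.sh > /dev/null 2>&1
EOF

# Show the result. "crontab FILE" would install it, but that replaces your
# whole existing crontab, so it is left commented out here.
cat "$cron_file"
# crontab "$cron_file"
```

You would normally keep just the one line that matches the schedule you want.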

This tip doesn't take care of everything. For example, the command above saves only the page's HTML, so images will show up in your saved copy only if the page references them by full URLs. In the next installment of this article series I'll be showing you how you can use PHP to make up for some of the things this wget command doesn't do (like grabbing images).

Here's my solution: http://www.jumpx.com/tutorials/commandline/get.zip

It's not the most polished script in the world, but it should do what you want most of the time. If you'd like to delve into how it works, I've commented all the functions and a few of the important parts of the code.


About the author
Dennis Pallett is the main contributor to PHPit. He owns several websites, including ASPit and Chill2Music. He is currently still studying.