Copying a Drupal site from a live server to a test one is normally a straightforward process: you dump the database, and create a tar archive of the site, extract the archive, load the database dump, make a few changes to the settings.php file, and voila: you have a test site that mirrors the live site. You can then use this for development, training or any other purpose.
We recently need to do this to a a live site, but it had way too much images: with 5.6 GB's worth of images, making an exact copy on a smaller server was not an option. So, we devised the following method to copy images only from the last 7 days, to make the test site more manageable, yet reflecting what the live site has very closely.
This assumes that the images are either attachments using the core upload module, or the image module. Doing this with imagecache, and CCK requires a bit more SQL.
We login to the live site, and change to the directory where Drupal is:
We create a database dump as usual:
mysqldump -uuser -p dbname > yoursite.sql
We create a tar archive, excluding the files directory:
tar --exclude="files" -czvf yoursite.tar.gz .
Then, we find out what theUNIX epoch date now
We get a number like: 1209843315
We then calculate what 7 days worth of seconds is by multiplying 7*24*60*60. We get 604800.
We now run the SQL to get the path names of the files/images for the last 7 days, and we create a cpio archive out of them.
echo "SELECT f.filepath FROM node n INNER JOIN files f USING (nid) WHERE created > 1209843315 - 604800;"| mysql -uuser -p dbname | grep -v "^filepath$" | cpio -o -O files.cpio
In this case, we got about 100 MBs worth of files. Far more manageable than 5.6GB, and easily transferred over the network.
Now, you need to copy the files.cpio and yoursite.tar.gz to the test server.
Once you do that, you change the Drupal directory
Then you extract the tar archive:
tar xzvf yoursite.tar.gz
And load the database
mysql -uuser -p testdb < yoursite.sql
Then extract the subset of images that we created earlier.
cpio -iv -I files.cpio
We now have a functional site with a subset of the images and files ...