Re-compress your gzipped files to bzip2 using a Bash script (HOWTO)

By Dave Bechtel

If you were incredibly lucky (like me), perhaps you received an external USB hard drive for Christmas. Or perhaps you have one lying around already, with plenty of free space. And perhaps you also read the recent Slashdot article about compression software and have lots of fairly sizable gzipped files lying about.

After reading the comments in that article, I was dismayed to learn that my compression tool of choice (gzip) has no error-correction capabilities. While I deem it to be the best all-around tool for quick backups with a decent compression ratio, gzip will choke if it gets a data error on restore - and there's something to be said for data integrity.
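One way to find out whether an existing archive is already damaged, before relying on it, is `gzip -t`, which tests the compressed data without writing anything out. Here is a minimal sketch; the function name `check_gz` is made up for illustration and is not part of rezip.

```shell
#!/bin/bash
# check_gz: hypothetical helper, not part of rezip.
# Runs "gzip -t" on each file given and reports OK or CORRUPT.
# Returns non-zero if any file fails the test.
check_gz() {
    local f rc=0
    for f in "$@"; do
        if gzip -t -- "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "CORRUPT $f"
            rc=1
        fi
    done
    return "$rc"
}
```

Something like `check_gz ~/backups/*.gz` (path purely an example) gives a one-line verdict per file.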

So, having this nice shiny new USB external drive and some time on my hands, I wrote a Bash utility script to re-compress gzip files to bzip2, using the external drive. It takes an order of magnitude longer to compress, but at least I'll save some space and have a hope of recovering the compressed data if things go wrong... Right??

My particular external drive is a 120-gig that came factory-formatted as a single FAT32 partition. Now, any Linux guru worth their salt knows that this thing practically begs to be customized, since FAT32 has a 2GB (Linux) or 4GB (Windows) file-size limit - depending on who's writing to it.
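If you want to know ahead of time which files would run into that limit, `find` can list them. This is only a sketch: the function name `oversize_for_fat32` is hypothetical, and the default limit of 4294967295 bytes (2^32 - 1) is the largest file FAT32 can hold. Pass 2147483647 as the second argument to match the 2GB figure instead.

```shell
#!/bin/bash
# oversize_for_fat32: hypothetical helper, not from the rezip script.
# Lists regular files under DIR that are larger than LIMIT bytes.
#   $1 = directory to scan (default: current directory)
#   $2 = size limit in bytes (default: 4294967295, i.e. 2^32 - 1)
oversize_for_fat32() {
    local dir=${1:-.} limit=${2:-4294967295}
    # "-size +Nc" matches files strictly larger than N bytes.
    find "$dir" -type f -size +"${limit}c" -print
}
```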

So, I fired up my Knoppix HD install and repartitioned it. Nothing fancy, just good old fdisk.

Here's how it looks now:

$ fdisk -l /dev/sdb

Disk /dev/sdb: 255 heads, 63 sectors, 14593 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1   *         1         1      8032   83  Linux
/dev/sdb2             2     14593 117210240    f  Win95 Ext'd (LBA)
/dev/sdb5             2        18    136552   82  Linux swap
/dev/sdb6            19      4999  40009882    c  Win95 FAT32 (LBA)
/dev/sdb7          5000      5622   5004247   83  Linux
/dev/sdb8          5623     14593  72059557   83  Linux

(I did make a note of the fact that the factory-default was one big type "c", in case I needed to go back to that.)

Notice the 40GB FAT32 partition. In my other life (sssshhh!) I run Windows 2000 Professional - and was forcibly reminded that everything after Windows ME has a 32GB partition size limit for formatting FAT32. Note that the limitation is on formatting, not accessing. This is by design, and Microsoft has publicly admitted it.

After going through several free Windows tools for formatting and repartitioning (and running into a brick wall), I eventually gave up on Windows 2000 formatting the thing. The vendor has a utility on their website to restore the drive to factory-default partitioning, but that doesn't really help my intended use of the drive. I could have formatted it in Windows 98, but that's no fun - and it would need a separate driver for the OS to recognize the drive.

So, rather than give up a perfectly usable 8GB, good old Linux to the rescue again:

$ mkdosfs -F 32 -v -n wdfat40 /dev/sdb6

and reboot.

Presto! Windows 2000 recognizes the drive just fine now, and it passes all the chkdsk tests. And for all you dual-booters out there, a wonderful utility exists called Ext2IFS ( http://www.fs-driver.org/ ). This allows NT-based systems like Windows 2000 to access ext2/ext3 partitions just like a regular drive - read/write, so no need for NTFS!

The Linux partitions were formatted like so:

mke2fs -j -c -m1 -v /dev/sdbX

Here are the /etc/fstab entries I created for the drive, BTW:

/dev/sdb6  /mnt/wdfat40  auto defaults,noauto,noatime,user,suid,noexec,uid=dave 0 0
/dev/sdb7  /mnt/wdlinux  ext3 defaults,noauto,noatime,rw 0 0
/dev/sdb8  /mnt/wdvast  ext3 defaults,noauto,noatime,rw 0 0

Note the "uid=dave" in that first line. That's so my non-root user account will have write access to the drive by default.

Now onto the good part - the "rezip" Bash script.

At first, I started out by writing a fairly basic script with a simple function call and manually-entered filenames. Then I sat down and took another look at it - and practically rewrote it from scratch, with some features that occurred to me after several test runs.
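For readers who want the gist before opening the source, the core step of any gzip-to-bzip2 conversion can be sketched as follows. This is NOT the actual rezip code, just a minimal illustration under some assumptions: the function name `regz2bz`, the optional work-directory argument, and the `.part` temporary suffix are all invented here. The idea is to stream the data from one compressor to the other, test the result with `bzip2 -t`, and delete the original only if everything succeeded.

```shell
#!/bin/bash
# regz2bz: hypothetical sketch of the core idea, not the real rezip code.
# Usage: regz2bz FILE.gz [WORKDIR]
#   WORKDIR is where the temporary output is written (default: FILE's directory).
regz2bz() {
    local src=$1
    local workdir="${2:-$(dirname -- "$1")}"
    local name tmp dest

    case $src in
        *.gz) ;;
        *) echo "skipping (no .gz suffix): $src" >&2; return 2 ;;
    esac

    name=$(basename -- "$src" .gz)
    tmp=$workdir/$name.bz2.part
    dest=$(dirname -- "$src")/$name.bz2

    if [ -e "$dest" ]; then
        echo "skipping (already exists): $dest" >&2
        return 2
    fi

    # Stream gzip -> bzip2 so the uncompressed data never lands on disk.
    # pipefail makes a decompression error fail the whole pipeline.
    if ( set -o pipefail; gzip -dc -- "$src" | bzip2 -9 > "$tmp" ) \
        && bzip2 -t -- "$tmp"; then
        # Only now is it safe to replace the original.
        mv -- "$tmp" "$dest" && rm -- "$src"
    else
        echo "FAILED: $src (original left untouched)" >&2
        rm -f -- "$tmp"
        return 1
    fi
}
```

Called as `regz2bz backup.tar.gz /mnt/wdvast`, the temporary file would be written to the external drive and the finished `backup.tar.bz2` moved next to the original.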

rezip Currently Features:

-- KNOWN BUG(s):

During the course of writing the script, I had hard-coded most of the defaults, such as the size of files to skip, the log file name, etc. These were eventually changed to be variables before the script was published for LG - so that you, the end-user, can have More Control (TM) over its actions. ;-)
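A common Bash idiom for this kind of user-adjustable default is the `: "${VAR:=value}"` form, which sets a variable only if the caller has not already set it. The sketch below shows the pattern. The variable names, the default values, and the assumption that it is the small files that get skipped are all hypothetical; check the rezip source for the real ones.

```shell
#!/bin/bash
# Hypothetical variable names and default values, for illustration only;
# the real ones are in the rezip source.
: "${REZIP_SKIP_BELOW_KB:=100}"                  # assumed: skip files smaller than this
: "${REZIP_LOGFILE:=${TMPDIR:-/tmp}/rezip.log}"  # assumed log file location

# skip_small_file FILE
# Succeeds (returns 0) if FILE is below the size threshold and should be skipped.
skip_small_file() {
    local bytes
    bytes=$(wc -c < "$1") || return 2
    (( bytes < REZIP_SKIP_BELOW_KB * 1024 ))
}
```

With this pattern, a setting can be overridden for a single run from the command line, for example by prefixing the command with `REZIP_SKIP_BELOW_KB=500`, without editing the script.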

I encourage everyone to READ THE SOURCE CODE before running rezip. You may find it handy to view it in an editor that colorizes or highlights executable syntax, such as 'mcedit' or 'jstar'.

Comments, feature requests, bug reports, etc., are welcome.

(Don't forget to 'chmod +x rezip' and put it somewhere in your $PATH - /usr/local/bin is suggested.)


Bio: Born in 1972, Dave Bechtel grew up programming in BASIC with Apple ][e's, TI-99/4A, IBM PC (640K!) and a Tandy 1000SX, none of which actually had hard drives -- 360K floppy only. And we LIKED IT! ;-)

Eventually left BASIC behind, and moved on to programming in REXX and Bash.

Got interested in Linux around 1997. Started with Red Hat and went on to SuSE, tried several other distros and a *BSD or two, and has now settled on Knoppix/Debian/Ubuntu, in roughly that order. Currently living in Lake Zurich, IL.

Likes: Computers, motorcycles, Linux, reading and watching sci-fi (currently Star Trek TOS, Stargate, and Battlestar Galactica)


Copyright © 2006, Dave Bechtel. Released under the Open Publication license unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 123 of Linux Gazette, February 2006
