Backing Up with RsyncX

In an earlier post I talked generally about my backup procedure for large amounts of data. In the post I discussed using RsyncX to back up staff Work drives over a network, as well as my own personal Work drive data, to a spare hard drive. Today I'd like to get a bit more specific.

Installing RsyncX

I do not use, nor do I recommend the version of rsync that ships with Mac OS X 10.4. I've found it, in my own personal tests, to be extremely unreliable, and unreliability is the last thing you want in a backup program. Instead I use — and have been using without issue for years now — RsyncX. RsyncX is a GUI wrapper for a custom-built version of the rsync command that's made to properly deal with HFS+ resource forks. So the first thing you need to do is get RsyncX, which you can do here. To install RsyncX, simply run the installer. This will place the resource-fork-aware version of rsync in /usr/local/bin/. If all you want to do is run rsync from the RsyncX GUI, then you're done, but if you want to run it non-interactively from the command-line — which ultimately we do — you should put the newly installed rsync command in the standard location, which is /usr/bin/.¹ Before you do this, it's always a good idea to make a backup of the OS X version. So:

sudo cp /usr/bin/rsync /usr/bin/rsync-ORIG

sudo cp /usr/local/bin/rsync /usr/bin/rsync

Ah! Much better! Okay. We're ready to roll with local backups.²
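Before moving on, it's worth a quick sanity check that the swap actually took. Something like the following should do it (rsync's standard --version flag; after the copy, both paths should report the same RsyncX build):

/usr/bin/rsync --version

/usr/local/bin/rsync --version

If the two version strings differ, the copy didn't take, and the GUI and command-line versions will be out of sync.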

Local Backups

Creating local backups with rsync is pretty straightforward. The RsyncX version of the command acts almost exactly like the standard *NIX version, except that it has an option to preserve HFS+ resource forks. This option must be provided if you're interested in preserving said resource forks. Let's take a look at a simple rsync command:

/usr/bin/rsync -a -vv /Volumes/Work/ /Volumes/Backup --eahfs

This command will back up the contents of the Work volume to another volume called Backup. The -a flag stands for "archive" and will copy everything that's changed while leaving alone files that may have been deleted from the source. It's usually what you want. The -vv flag specifies "verbosity" and will print what rsync is doing to standard output. The level of verbosity is variable, so "-v" will give you only basic information, and "-vvvv" will give you everything it can. I like "-vv." That's just the right amount of info for me. The next two entries are the source and target directories, Work and Backup. The --eahfs flag tells rsync that you want to preserve resource forks, and it only exists in the RsyncX version. Finally, pay close attention to the trailing slash in your source and target paths. The source path contains a trailing slash — meaning we want the command to act on the drive's contents, not the drive itself — whereas the target path contains no trailing slash. Without the trailing slash on the source, a folder called "Work" will be created inside the Backup drive. This trailing slash behavior is standard in *NIX, but it's important to be aware of when writing rsync commands.
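If the trailing slash business seems abstract, here's a side-by-side illustration of the two forms (same volumes as above; only the first matches the command we just ran):

# Syncs the contents of Work into Backup, i.e. /Volumes/Backup/<files>
/usr/bin/rsync -a -vv /Volumes/Work/ /Volumes/Backup --eahfs

# Syncs the Work folder itself into Backup, i.e. /Volumes/Backup/Work/<files>
/usr/bin/rsync -a -vv /Volumes/Work /Volumes/Backup --eahfs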

That's pretty much it for simple local backups. There are numerous other options to choose from, and you can find out about them by reading the rsync man page.

Network Backups

One of the great things about rsync is its ability to perform operations over a network. This is a big reason I use it at work to back up staff machines. The rsync command can perform network backups over a variety of protocols, most notably SSH. It also can reduce the network traffic these backups require by only copying the changes to files, rather than whole changed files, as well as using compression for network data transfers.

The version of rsync used by the host machine and the client machine must match exactly. So before we proceed, copy rsync to its default location on your client machine. You may want to back up the Mac OS X version on your client as well. If you have root on both machines you can do this remotely on the command line:

ssh -t root@mac01.systemsboy.com 'cp /usr/bin/rsync /usr/bin/rsync-ORIG'

scp /usr/bin/rsync root@mac01.systemsboy.com:/usr/bin/
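Since a version mismatch is the most common reason a network sync will fail outright, it doesn't hurt to confirm that both copies now report the same thing (hostname from the example above):

/usr/bin/rsync --version

ssh root@mac01.systemsboy.com '/usr/bin/rsync --version'

If those two version strings don't match, fix that before going any further.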

Backing up over the network isn't much different from, or any harder than, backing up locally. There are just a few more flags you need to supply, but the basic idea is the same. Here's an example:

/usr/bin/rsync -az -vv -e SSH mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs

This is pretty similar to our local command. The -a flag is still there, and we've added the -z flag as well, which compresses data in transit (to ease network traffic). We now also have an -e flag, which tells rsync to run the transfer through a remote shell, with SSH specified as the protocol for that connection. Next we have the source, as usual, but this time our source is a computer on our network, which we specify just like we would with any SSH connection — hostname:/Path/To/Volume. Finally, we have the --eahfs flag for preserving resource forks. The easiest thing to do here is to run this as root (either directly or with sudo), which will allow you to sync data owned by users other than yourself.

Unattended Network Backups

Running backups over the network can also be completely automated and can run transparently in the background, even on systems where no user is logged in to the Mac OS X GUI. Doing this over SSH, of course, requires an SSH connection that does not interactively prompt for a password. This can be accomplished by establishing authorized key pairs between host and client. The best resource I've found for learning how to do this is Mike Bombich's page on the subject. He does a better job explaining it than I ever could, so I'll just direct you there for setting up SSH authentication keys. Incidentally, that article is written with rsync in mind, so there are lots of good rsync resources there as well. Go read it now, if you haven't already. Then come back here and I'll tell you what I do.

I'd like to note, at this point, that enabling SSH authentication keys, root accounts and unattended SSH access is a minor security risk. Bombich discusses this on his page to some extent, and I want to reiterate it here. Suffice to say, I would only use this procedure on a trusted, firewalled (or at least NATed) network. Please bear this in mind if you proceed with the following steps. If you're uncomfortable with any of this, or don't fully understand the implications, skip it and stick with local backups, or just run rsync over the network by hand and provide passwords as needed. But this is what I do on our network. It works, and it's not terribly insecure.

Okay, once you have authentication keys set up, you should be able to log into your client machine from your server, as root, without being prompted for a password. If you can't, reread the Bombich article and try again until you get it working. Otherwise, unattended backups will fail. Got it? Great!
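A quick way to test this is to run a throwaway command on the client from the server (hostname, again, from the example above) and make sure no password prompt ever appears:

ssh root@mac01.systemsboy.com 'hostname'

If that prints the client's hostname without asking you for anything, you're in business.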

I enable the root account on both the host and client systems, which can be done with the NetInfo Manager application in /Applications/Utilities/. I do this because I'm backing up data that is not owned by my admin account, and using root gives me the unfettered access I need. Depending on your situation, this may or may not be necessary. For the following steps, though, it will simplify things immensely if you are root:

su - root

Now, as root, we can run our rsync command, minus the verbosity, since we'll be doing this unattended, and if the keys are set up properly, we should never be prompted for a password:

/usr/bin/rsync -az -e SSH mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs

This command can be run either directly from cron on a periodic basis, or it can be placed in a cron-run script. For instance, I have a script that pipes verbose output to a log of all rsync activity for each staff machine I back up. This is handy to check for errors and whatnot, every so often, or if there's ever a problem. Also, my rsync commands are getting a bit unwieldy (as they tend to do) for direct inclusion in a crontab, so having the scripts keeps my crontab clean and readable. Here's a variant, for instance, that directs the output of rsync to a text file, and that uses an exclude flag to prevent certain folders from being backed up:

/usr/bin/rsync -az -vv -e SSH --exclude "Archive" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs > ~/Log/mac01-backup-log.txt

This exclusion flag will prevent backup of anything called "Archive" on the top level of mac01's Work drive. Exclusion in rsync is relative to the source directory being synced. For instance, if I wanted to exclude a folder called "Do Not Backup" inside the "Archive" folder on mac01's Work drive, my rsync command would look like this:

/usr/bin/rsync -az -vv -e SSH --exclude "Archive/Do Not Backup" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --eahfs > ~/Log/mac01-backup-log.txt

Mirroring

The above uses of rsync, as I mentioned before, will not delete files from the target that have been deleted from the source. They will only propagate changes to existing files, and will leave deleted files alone. They are semi-non-destructive in this way, and this is often useful and desirable. Eventually, though, rsync backups will begin to consume a great deal of space, and after a while you may begin to run out. My solution to this is to periodically mirror my sources and targets, which can be easily accomplished with the --delete option. This option will delete any file from the target not found on the source. It does this after all other syncing is complete, so it's fairly safe to use, but it will require enough drive space to do a full sync before it does its thing. Here's our network command from above, only this time using the --delete flag:

/usr/bin/rsync -az -vv -e SSH --exclude "Archive/Do Not Backup" mac01.systemsboy.com:/Volumes/Work/ /Volumes/Backups/mac01 --delete --eahfs > ~/Log/mac01-backup-log.txt

Typically, I run the straight rsync command every other day or so (though I could probably get away with running it daily). I create the mirror at the end of each month to clear space. I back up about a half dozen machines this way, all from two simple shell scripts (daily and weekly) called by cron.
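For the curious, those scripts don't need to be anything fancy. Here's a rough sketch of what the daily one and its crontab entries might look like (the hostnames, paths and schedule here are placeholders, not my actual setup):

#!/bin/sh
# backup-daily.sh: straight rsync pass for each staff machine, one log per host
LOGDIR="$HOME/Log"
for HOST in mac01.systemsboy.com mac02.systemsboy.com; do
  NAME=${HOST%%.*}
  /usr/bin/rsync -az -vv -e SSH "$HOST":/Volumes/Work/ /Volumes/Backups/"$NAME" --eahfs > "$LOGDIR/$NAME-backup-log.txt" 2>&1
done

And in root's crontab, something like:

# daily pass at 3 a.m., monthly mirror (the --delete variant) on the 1st
0 3 * * * /Users/systemsboy/bin/backup-daily.sh
0 4 1 * * /Users/systemsboy/bin/backup-monthly.sh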

Conclusion

I realize that this is not a perfect backup solution. But it's pretty good for our needs, given what we can afford. And so far it hasn't failed me in four years. That's not a bad track record. Ideally, we'd have more drives and we'd stagger backups in such a way that we always had at least a few days of backups available for retrieval. We'd also probably have some sort of backup to a more archival medium, like tape, for more permanent or semi-permanent backups. We'd also probably keep a copy of all this in some offsite, fireproof lock box. I know, I know. But we don't. And we won't. And thank god, 'cause what a pain in the ass that must be. It'd be a full time job all its own, and not a very fun one. What this solution does offer is a cheap, decent, short-term backup procedure for emergency recovery from catastrophic data loss. Hard drive fails? No trouble. We've got you covered.

Hopefully, though, this all becomes a thing of the past when Leopard's Time Machine debuts. Won't that be the shit?

1. According to the RsyncX documentation, you should not need to do this, because the RsyncX installer changes the command path to its custom location. But if you'll be running the command over the network or as root, you'll either have to change that command path for the root account and on every client, or network backups will fail. It's much easier to simply put the modified version in the default location on each machine.

2. Updates to Mac OS X will almost always overwrite this custom version of rsync. So it's important to remember to replace it whenever you update the system software.

Using SSH to Send Variables in Scripts

In July I posted an article about sending commands remotely via ssh. This has been immensely useful, but one thing I really wanted to use it for did not work. Sending an ssh command that contained a variable, via a script for instance, would always fail for me, because, of course, the remote machine didn't know what the variable was.

Let me give an example. I have a script that creates user accounts. At the beginning of the script it asks me to supply a username, among other things, and assigns this to a variable in the script called $username. Kinda like this:

echo "Please enter the username for the new user:"read username

Later in the script that variable gets called to set the new user's username, and a whole bunch of other parameters. Still later in the script, I need to send a command to a remote machine via ssh, and the command I'm sending contains the $username variable:

ssh root@home.account.server 'edquota -p systemsboy $username'

This command would set the quota of the new user $username on the remote machine to that of the user systemsboy. But every time I've tried to include this command in the script, it fails, which, if you think about it, makes a whole lot of sense. See, 'cause the remote machine doesn't know squat about my script, and when that command gets to the remote machine, the remote machine has no idea who in the hell $username is. The remote machine reads $username literally, and the command fails.

The solution to this is probably obvious to hard-core scripters, but it took me a bit of thinkin' to figure it out. The solution is to create a new variable that is comprised of the ssh command calling the $username variable, and then call the new variable (the entire command) in the script. Which looks a little something like this:

quota=`ssh -t root@home.account.server "edquota -p systemsboy $username"`
echo "$quota"

So we've created a variable, called $quota, which holds the output of the entire ssh command, and then we've simply called that variable in the script. Because the command is built locally, the $username variable gets filled in before anything is sent to the remote machine, and the command now succeeds there. One thing that's important to note here: generally the command being sent over ssh is enclosed in single-quotes. In this instance, however, it must be enclosed in double-quotes, because double-quotes allow the local shell to expand $username before the command is sent; single-quotes would pass the literal string $username to the remote machine, which is exactly the failure we started with. I also used the -t option in this example (which tells ssh that the session is interactive, and to wait until it's told to return to the local machine) but I don't actually think it's necessary in this case. Still, it shouldn't hurt to have it there, just in case something goes funky.
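If you want to see the quoting difference in isolation, a trivial test makes it pretty clear (same example server as above):

username=jdoe

# Double-quotes: the local shell expands $username before ssh sends the command
ssh root@home.account.server "echo $username"

# Single-quotes: the literal string $username is sent, and the remote shell expands it instead
ssh root@home.account.server 'echo $username'

The first prints "jdoe"; the second prints an empty line, or whatever $username happens to be set to on the remote machine.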

But so far nothing has gone funky. This seems to work great.

Networked Home Accounts and The New RAID

We recently installed in our machine room a brand-spankin' new RAID for hosting network home accounts. We bought this RAID as a replacement for our aging, and horrendously unreliable Panasas RAID. The Panasas was a disaster for almost the entire three-year span of its lease. It used a proprietary operating system based on some flavor of *NIX (which I can't recall right at this moment), but that had all sorts of variations from a typical *NIX install that made using it as a home account server far more difficult than it ever should have been. To be fair, it was never really intended for such a use, but was rather created as a file server cluster for Linux workstations that can be easily managed directly from a web browser, as opposed to the command-line. It was really built for speed, not stability, and it was really completely the wrong product for us. (And for the record, I had nothing to do with its purchase, in case you're wondering.)

What the Panasas was, however, was instructive. For three years we lived under the shadow of its constant crashing, the near-weekly tcp dumps and help requests to the company, and angry users fed up with a system that occasionally caused them to lose data, and frequently caused their machines to lock up for the duration of a Panasas reboot, which could be up to twenty minutes. It was not fun, but I learned a lot from it, and it enabled me to make some very serious decisions.

My recent promotion to Senior Systems Administrator came just prior to the end of our Panasas lease term. This put me in the position of both purchasing a new home account server, and of deciding the fate of networked home accounts in the lab.

If I'd learned anything from the experience with the Panasas it was this: A home account server must be, above all else, stable. Every computer that relies on centralized storage for home account serving is completely and utterly dependent on that server. If that server goes down, your lab, in essence, goes down. When this starts happening a lot, people begin to lose faith in a lot of things. First and foremost, they lose faith in the server and stop using it, making your big, expensive network RAID a big, expensive waste of money. Secondly, they lose faith in the system you've set up, which makes sense because it doesn't work reliably, and they stop using it, favoring instead whatever contingency plan you've set up for the times when the server goes down. In our case, we set up a local user account for people to log into when the home account server was down. Things got so bad for a while that people began to log in using this local account more than they would their home accounts, thus negating all our efforts at centralizing home account data storage. Lastly, people begin to lose faith in your abilities as a systems administrator and lab manager. Your reputation suffers, and that makes it harder to get things done — even improvements. So, stability. Centralization of a key resource is risky, in that if that resource fails, everything else fails with it. Stability of crucial, centralized storage was key if any kind of network home account scenario was going to work.

The other thing I began to assess was the whole idea of networked home accounts themselves. I don't know how many labs use networked home accounts. I suspect there are quite a few, but there are also probably a lot of labs that don't. I know I've read about a lot of places that prefer local accounts that are not customized and that revert to some default state at every log in/out. Though I personally really like the convenience of customized network home accounts that follow you from computer to computer throughout a facility, it certainly provides a fair amount of hassle and risk. When it works it's great, but when it doesn't work, it's really bad. So I really began to question the whole idea. Is this something we really needed or wanted to continue to provide?

My ultimate decision was intimately linked to the stability of the home account server. From everything I've seen, networked home accounts can and do work extremely well when the centralized storage on which they reside is stable and reliable. And there is value to this. I talked to people in the lab. By and large, from what I could glean from my very rudimentary and unscientific conversations with users, people really like having network home accounts when they work properly. When given the choice between a generic local account or their personalized network account, even after all the headaches, they still ultimately prefer the networked account. So it behooves us to really try to make it work and work well. And, again, everything I saw told me that what this really required, more than anything else, was a good, solid, robust and reliable home account server.

So, that's what we tried our best to get. The new unit is built and configured by a company called Western Scientific, which was recommended to me by a friend. It's called the Fusion SA. It's a 24-bay storage server running Fedora Core 5 Linux. We've populated 16 of the bays with 500GB drives and configured them at RAID level 5 (one drive's worth of capacity goes to parity, so 15 x 500GB works out, when all is said and done, to about 7TB of networked storage), with room to grow in the additional bays should we ever want to do so. The unit also features a quad-port GigE PCI-X card which we can trunk for speedy network access. It's big and it's fast. But what's most important is its stability.

Our new RAID came a little later than we'd hoped, so we weren't able to test it before going live with it. Ideally, we would have gotten the unit mid-summer and tested it in the lab while maintaining our previous system as a fall-back. What happened instead was that we got the unit in about the second week of the semester, and outside circumstances eventually necessitated switching to the new RAID sans testing. It was a little scary. Here we were in the third week of school switching over to a brand new but largely untested home account server. It was at this point in time that I decided, if this thing didn't work — if it wasn't stable and reliable — networked home accounts would become a thing of the past.

So with a little bit of fancy footwork we made the ol' switcheroo, and it went so smoothly our users barely noticed anything had happened. Installing the unit was really a simple matter of getting it in the rack, and then configuring the network settings and the RAID. This was exceptionally quick and easy, thanks in large measure to the fact that Western Scientific configured the OS for us at the factory, and also to the fact that they tested the unit for defects prior to shipping it to us. In fact, our unit was late because they had discovered a flaw in the original unit they had planned to ship. Perfect! If that's the case, I'm glad it was late. This is exactly what we want from a company that provides us with our crucial home account storage. If the server itself was as reliable as the company was diligent, we most likely had a winner on our hands. So, how has it been?

It's been several weeks now, and the new home account server has been up, without fail or issue, the entire time. It has been extremely stable (so much so that I almost forget about it, until, of course, I walk past our server room and stop to dreamily look upon its bright blue drive activity lights dutifully flickering away without pause). And if it stays that way, user confidence should return to the lab and to the whole idea of networked home accounts in fairly short order. In fact, it seems like it already has to a great extent. I couldn't be happier. And the users?... Well, they don't even notice the difference. That's the cruel irony of this business: When things break, you never hear the end of it, but when things work properly, you don't hear a peep. You can almost gauge the success or failure of a system by how much you hear about it from users. It's the ultimate in "no news is good news." The quieter the better.

And 'round these parts of late, it's been pin-drop quiet.

Adobe Legal Hosed My System

So recently there's been a bug in, according to Adobe anyway, Mac OS X that causes certain files in certain Adobe CS2 applications to wreak untold havoc on HFS+ filesystems. In a fairly recent Adobe Support Knowledgebase article on this issue Adobe says, and I quote:

Background information

Mac OS X causes illegal file names to be reported when it reads some of the font data used in the Vietnamese End User License Agreements, which are installed in the Legal or Legal.localized folders. This problem causes severe file system and hard disk corruption if the files are not deleted or if the file system is not repaired.

Apple fixed this problem in Mac OS X 10.4.7.

The fix for this, Adobe goes on to say, is to get rid of any folder in any Adobe CS2 application folder called "Legal" (in the Finder) or "Legal.localized" (in the Terminal), and then run Disk Utility to repair the disk. They also suggest that upgrading to Tiger 10.4.7 or later is a good last step as it halts the corruption process.
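If you'd rather hunt down those folders from the Terminal than click through every CS2 application folder, something like this should work. The /Applications path is an assumption about where your CS2 apps live, and I'd run the first command on its own and eyeball the list before running the second:

sudo find /Applications -type d \( -name "Legal" -o -name "Legal.localized" \) -prune -print

sudo find /Applications -type d \( -name "Legal" -o -name "Legal.localized" \) -prune -exec rm -rf {} \;

Then run Disk Utility's Repair Disk (or fsck from single-user mode), as Adobe suggests.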

I'd actually had the problem last summer on my lab systems, which had exhibited evidence of it during a clone operation. Any clone of a system would fail, and the asr command would actually report the name of the trouble file. Indeed, running Disk Utility's "Repair Disk" (or in our case, fsck from single-user mode) would fix the problem and our clones would subsequently succeed. Those systems were running 10.4.7.

My office system, on the other hand, never went through the cloning process, so I never detected the problem. But I seem to have been bitten by this bug and in a bad way. Please note the last sentence in the above quote:

This problem causes severe file system and hard disk corruption if the files are not deleted or if the file system is not repaired.

Yesterday I was running Disk Utility on a problem drive and decided to run it on my system partition as well, for good measure. This was the output:

Disk Utility: Reports Unfixable "Illegal name"

See that "Illegal name" error in bright red? That's a telltale sign that you've got the "Adobe Legal bug" (as I like to call it). It's also, I can tell you from cold, hard, agonizing experience, a telltale sign that you are indeed fucked. Hard. I think this is what Adobe is referring to as "severe file system and hard disk corruption." I tried everything to make that "Illegal name" error go away. Actually, attempts to fix the problem took more time than it took for me to rebuild my system, which is what I ultimately had to do. Now I hate rebuilding systems. Almost as much as poking myself in the eye with white hot forks. So I spent the better part of the day attempting to fix the problem.

The first thing I did was to boot into a good system partition. I just happened to have a base Tiger OS installed on a firewire drive, so I booted into that. I then went through the aforementioned Disk Utility and fsck routines. (I also tried Disk Utility from the Tiger DVD, just to be thorough.) No luck. I always got the "Illegal name" error. I also tried fsck with some options to see if I could actually track down the file with the illegal name and delete it. Running:

sudo fsck_hfs -frd /dev/disk0s10

Produced the following output:

** /dev/rdisk0s10
** Checking HFS Plus volume.
** Checking Extents Overflow file.
** Checking Catalog file.
** Rebuilding Catalog B-tree.
hfs_UNswap_BTNode: invalid node height (1)
** Rechecking volume.
** Checking HFS Plus volume.
** Checking Extents Overflow file.
** Checking Catalog file.
Illegal name
illegal name is 0x00 54 00 69 00 65 03 02 03 01 00 6E 00 67 00 20 00 56 00 69 00 65 03 02 03 23 00 74 00 2E 00 68 00 74 00 6D 00 6C
replacement name is 0x00 54 00 69 00 65 03 02 03 01 00 6E 00 67 00 20 00 56 00 69 00 65 03 23 03 02 00 74 00 2E 00 68 00 74 00 6D 00 6C
** Checking multi-linked files.
** Checking Catalog hierarchy.
** Checking Extended Attributes file.
** Checking volume bitmap.
** Checking volume information.
Verify Status: VIStat = 0x0000, ABTStat = 0x0000 EBTStat = 0x0000 CBTStat = 0x0000 CatStat = 0x8000
** Repairing volume.
replacement name already exists
duplicate name is 0x00 54 00 69 00 65 03 02 03 01 00 6E 00 67 00 20 00 56 00 69 00 65 03 23 03 02 00 74 00 2E 00 68 00 74 00 6D 00 6C
FixIllegalNames - repair failed for type 0x23B 571
** The volume SysApps could not be repaired.
volume type is pure HFS+
primary MDB is at block 0 0x00
alternate MDB is at block 0 0x00
primary VHB is at block 2 0x02
alternate VHB is at block 66846718 0x3fbfffe
sector size = 512 0x200
VolumeObject flags = 0x07
total sectors for volume = 66846720 0x3fc0000
total sectors for embedded volume = 0 0x00


This seemed to suggest that, yes, there is an illegally named file somewhere, but that it can't be replaced because the replacement name already exists. Ew! I'm not sure what that means, but it does not sound good.

Undaunted (alright, maybe a little daunted), I decided to try cloning the system to see if I could get the name of the illegal file like I did last summer, using the asr command. I also thought it was possible that any filesystem damage, depending on the nature of that damage, might be repaired by cloning the system to a clean, undamaged filesystem. So I created a 40GB disk image, which actually took quite some time, probably because I was booted from a slow firewire drive. But it finally completed, and once it did I cloned my sick system partition to it. This also took a great deal of time over firewire. Like hours, actually. Like I actually had a good excuse to go to lunch for once. But it did finish successfully, and it never reported an illegal name in the process of cloning. So I ran Disk Utility on it, hoping that maybe the new filesystem did the trick. No such luck. Same error.

By this time I'd spent — wasted, actually — an entire day on this problem. A problem apparently caused by "font data" robbed me of an entire day of productive work and put me in a nasty mood to boot. My options spent, I did the thing I hate to do so very much: I rebuilt.

There is at least one silver lining to all this, and that is the magic of disk partitions. You see, it's for reasons just such as these that I partition my user data from my system and application data. Every Mac system I build has a partition called SysApps — that houses the system components and applications — and one called Work — that houses all the data I generate, from home account preferences to Final Cut files. In the above scenario it was the SysApps partition that was corrupted, but the Work partition checked out just fine. This two-partition method offers numerous advantages in such a scenario. For one, after booting into the firewire drive it was a simple matter of telling NetInfo Manager to use my home account on the Work partition and I was right back to work checking email and the like, which is pretty crucial for me to be able to do. All my setups and preferences worked normally, my browser bookmarks were all there, my calendars were all intact, and I could at least work on actual work instead of being stopped dead in my tracks by this corruption problem. Secondly, in this case reformatting the SysApps partition was necessary. Had all my data been on that partition I would have had to back it all up somehow. And what if that corruption lay somewhere among my user data? I'm not sure what I would have done then. I may just have been screwed. But because my data was walled off, it was a non-issue. Thirdly, my Work data takes up the majority of the data on my system, and it's quite large — about 170GB. In a single-partition system I'd have had to blow all that away and restore it somehow. Backing up and restoring 170GB of data takes a long time and would have significantly increased the amount of time I spent getting back on my feet. With my two-partition system, about 30GB was all I had to worry about. And all that cloning I did in the hopes of finding a fix? With my 30GB SysApps partition it was painful and time-consuming, but it was doable (though whether it was worthwhile is debatable). If I'd had to do that with 200GB of system, application and user data combined it would have been downright impossible.

Restoring the SysApps partition was a pain, to be sure. But there was nothing there that didn't already exist somewhere among the myriad software installation discs in my office, so it wasn't so bad. And there was a lot of stuff I could restore right from the backup I'd made with asr — things like drag-n-drop applications, crontab and host files, and the like. Troubleshooting the problem took about a day, but rebuilding the system took a few short hours, in part because most of my preferences reside in my perfectly preserved home account. It was mostly just a matter of cloning a working system (from my base Tiger install on my firewire drive), reinstalling a few broken applications (Final Cut, Adobe stuff, etc.), and running Software Update as needed. Along the way I checked the health of my SysApps drive, and no "Illegal name" errors were reported. Phew!

So that's the saga of how Adobe Legal hosed my hard drive. I'm not sure if the blame really lies with Apple or Adobe. What I do know is that I'm sticking with my two-partition system, and I'm permanently deleting all those "Legal" folders associated with Adobe products. I suggest you follow this latter step yourselves, 'cause, if left untreated, it's true what they say: It really can cause severe filesystem and hard disk corruption. I'm living proof.

Journaled Filesystem Magic

I think I just saw the journaled filesystem in action. Like so much in life (or maybe not enough), it was one part freaky and one part amazing. Here's what happened.

First, some quick background info. My computer is kinda fucked. I know this, and I really need to get it repaired. It's under warranty; I'm just being lazy about it. But basically, the motherboard is hosed, which has resulted in the failure of at least one firewire port and erratic behavior from certain USB ports. But worse is the fact that it's resulted in general instability on my system. This instability is at least fairly predictable, and is well documented in my system logs, which frequently — particularly after wake from sleep — list I/O errors on the ATA bus. That is, disk access to and from the hard drive is suddenly blocked. When this happens the only recourse is a hard reboot. And today, in true form, it happened.

Now for the weird part. Just prior to the crash, I had been processing some PNGs in Photoshop for my last article. I saved out the originals, saved out the modified PSD versions, then moved all these to a new folder called "Originals." Then I saved the modified PSDs back out to PNG for publishing on the blog. Shortly after doing all this, the crash happened. After I force rebooted, I logged into my account only to find that those last steps had been forgotten: The original PNGs were there, the PSDs were there, but the "Originals" folder and the new, modified PNGs were all gone. Freaky!

I can only assume that this was a result of journaling in action. My admittedly shallow understanding of how journaling works is that it keeps a running record of the state of the filesystem — particularly the last known good state of said filesystem — and when a crash occurs it reverts the filesystem to that last known state of working goodness. So it would seem that what happened in my case is that, even though I was still able to write to disk for a short while before my crash, the last known good instance of the filesystem existed before those last changes. So they were lost when the journal did its stuff.

Like I said: kinda cool; kinda freaky. Makes you feel a little delusional. You remember performing these actions, and yet somehow they've disappeared. On the other hand, I can only imagine the state my filesystem would be in at this point if it weren't for journaling. And this is the first time I've ever had data loss in two years of dealing with this issue. So journaling is definitely good and it really does work, if my experience is any indication, and if I've interpreted it correctly. If anyone out there has a better explanation of journaling, or of what might have really happened here, I'd love to hear it.

Either way, perhaps I should take this as an omen that it's time to head in to the shop.

UPDATE:
dmesg had this to say:
HFS: Removed 12 orphaned unlinked files

Hmm...