When the Cure is Worse than the Disease: Mac Anti-Virus Software

Getting anti-virus software for the Mac is like getting chemotherapy for a cold. It's totally overkill and does way more harm than good. You're better off with the cold.

Via MacFixIt:
In yet another case of AntiVirus software causing serious issues while purporting to identify infected files, it appears that Sophos' AntiVirus software is generating false positives for the "OSX/Inqtana.B worm," prompting users to delete critical application and system files and causing serious issues.

Again, the virus being identified by Sophos AntiVirus is marked Inqtana.B -- apparently a variant of the Inqtana.A malware that likewise spreads by copying itself to other computers via a Bluetooth connection.

As previously reported, OSX/Inqtana.A is a Java-based proof-of-concept Bluetooth worm that affects older versions of Mac OS X 10.4.x (Tiger). The vulnerability does not affect Mac OS X 10.4.5, and the worm has not been found in the wild.

Despite that, Sophos' software is identifying "infected" files -- sometimes numbering in the thousands -- on Mac OS X 10.4.5 systems.

The results of the false positives are, in some cases, disastrous...

...We currently recommend that users disable Sophos AntiVirus until further notice, and disallow the application from automatically deleting any files it deems "infected."

That really says it all. The state of commercial Mac anti-virus software is pathetic. It seems like the developers of this software are desperately trying to drum up business with scare tactics for viruses that don't even exist in the wild while simultaneously writing code that damages people's systems. Fucked up? You betcha.

I'm all for virus protection, even on the Mac. But when anti-virus software is worse than the viruses it claims to protect against, it's no wonder no one's buying it.

Mac OS X Deletes Open Documents

Astoundingly fucked up, yet true. I just got an email from a client who said she'd accidentally deleted a file she had open, and when she closed said document it was gone. I was all set to email her back saying that her file could not have been deleted if it was open, because the Mac OS won't allow that sort of thing, but I thought I'd try it first, just to make sure I wasn't talking out my ass (which is something you learn to do after many years in systems work — try stuff before you speak, that is, not talk out your ass, though that does come with the territory as well I suppose.)

So I tried it. I opened up DVD Studio Pro, created a new document, saved it to the Desktop, and then — with the document still open, mind you — I deleted it. Put it in the trash and deleted it. I received no warning message. And when I quit DVDSP, I was not prompted to re-save the document. It was gone, daddy, gone.

As astoundingly fucked up as this is, I do understand exactly why it happens. You see, active documents — that is, documents which are currently open — are stored in a temporary location, not, as one might suspect, where they were saved. The saved document and the open document are two separate files. Each application determines where the temporary storage location for its open files will be. In the case of Final Cut docs, for example (and I know this from cold, hard experience), the temporary storage location is a folder called .TemporaryItems at the root of whatever drive is the active working drive. So, since the saved file and the active file are actually separate, it's entirely possible to delete a file that's apparently open in an application, because the file you're deleting is not the open file you're currently working on. It's the saved version.
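
You can see this for yourself in the Terminal. The folder is invisible in the Finder because of the leading dot. A quick peek, with /Volumes/Media standing in for whatever your working drive happens to be called (you may need sudo, depending on permissions):

# Final Cut's holding pen for currently open documents
ls -la /Volumes/Media/.TemporaryItems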

I don't know if I mentioned that this is fucked up, but it really is. Astoundingly so.

In the same way that you can't trash a running application, I think Apple really needs to make it equally impossible (or at least equally difficult) to delete active files. The current paradigm just doesn't make any kind of sense from where I sit.

UPDATE:
A reader has pointed out in the comments that one application actually works the way you might expect. That application is Microsoft Word. A saved Word document and an open Word document are (apparently) one and the same, and Word is able to keep track of open files even when they're moved. If the fact that Word behaves more sensibly than most Apple applications isn't an indication that this is something that needs fixin', I'm not sure what is.

Re-Binding to a Mac Server

Last semester we had a lot of problems in the lab. Our main problems were due to two things: the migration to Tiger, and problems with our home account server. Our Tiger problems have largely gone away with the latest releases, and we've replaced our home account server with another machine, and, aside from a minor hiccup here and there, things seem to have quieted down. The Macs are running well, and there hasn't been a server disconnect in some time. It's damn nice.

There has been one fairly minor lingering problem, however. For some reason our workstation Macs occasionally and randomly lose their connection to our authentication server — our Mac Server. When this happens, the most notable and problematic symptom is users' inability to log in. Any attempt at login is greeted with the login screen shuffle. You know, that thing where the login window shakes violently at the failed login attempt. This behavior is an indication that the system does not recognize either the user name or the password supplied, which makes sense, because when the binding to the authentication server is broken, for all intents and purposes, the user no longer exists on that system.

I've looked long and hard to find a reason for, and a solution to, this problem. I have yet to discover what causes the systems to become unbound from the server (though I'm starting to suspect some DNS funkiness, or anomalies in my LDAP database, as the root cause at this point). There is no pattern to it, and there is nothing helpful in the logs: only a message that the machine is unable to bind to the server, and that only if it happens at boot; nothing about why, and nothing at all if it happens while the machine is running, which it sometimes does. It's a mystery. And until recently, the only fix I could come up with was to log on to the unbound machine and reset the server in the Directory Access application. Part of my research involved looking for a command-line way to do this so that I wouldn't have to log in and use the GUI every time this happened, as it happens fairly often, and the GUI method is slow and cumbersome, especially when you want to get a machine back online ASAP.
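
Incidentally, there's a quicker way than the GUI to check whether a machine is still bound: ask dscl to list users from the LDAP node directly. A sketch, with ldap.example.edu standing in for your actual server:

# Lists network users if the binding is healthy; errors out if it's broken
dscl /LDAPv3/ldap.example.edu -list /Users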

It took me a while, but I have found the magic command, at a site called MacHacks. Boy is it simple. You just have to restart DirectoryService:

sudo killall DirectoryService

This kills the DirectoryService daemon, which immediately relaunches, reloading all the services listed in the Directory Access app and rebinding to any servers that have been set up for authentication. I've added the command to root's crontab and set it to run every 30 minutes. That should alleviate the last of our lab problems.
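
For the record, the crontab entry looks something like this. It's a sketch, assuming the job lives in root's crontab (so no sudo is needed):

# Restart DirectoryService every 30 minutes to force a rebind
*/30 * * * * /usr/bin/killall DirectoryService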

Hopefully the rest of this semester will be smooth sailing like the past two weeks. I could use a little less systems-related drama right now.

Tiger Lab Migration Part 11: Panasas Crashes and Caches

The last time we visited this topic, I thought we were done. Well, turns out I was wrong.

Things are, for the most part, working well now. Finally. We're running Tiger and we've managed to iron out the bulk of the problems. There is one issue which has persisted, however: the home account RAID.

To refresh: our network RAID, which is responsible for housing all our users' home account data, is made by a company called Panasas. Near as I can figure, we've got some experimental model, 'cause boy does it crash a lot. Which is not what you want in a home account server, by any means. After upgrading the Panasas OS a while back, the crashing had stopped. But it was only temporary. Lately the crashing is back with a vengeance. Like every couple of days it goes down. And when it goes down, it goes down hard. Like physical-reset hard. Like pull-the-director-blade-and-wait hard. Like sit-and-wait-for-the-RAID-to-rebuild hard.

Again: Not what you want in a home account server.

So we've built a new one. Actually, we've swapped our backup and home account servers. See, a while back we decided it would be prudent to have a backup of the home account server. Just something quick 'n' dirty. Just in case. This was built on a reasonably fast custom box with a RAID controller and a bunch of drives. It's a cheap solution, but it does what we need it to, and it does it well. And now we're using it as the new home account server. So far it's been completely stable. No crashes in a week and a half. Keep in mind, this is a $3,000 machine, not a $10,000 network RAID. It's not that fast, but it's fast enough. And it's stable. By god it's stable.

And that's what you want in a home account server.

Moving to the new server — which, by the way, is a simple Linux box running Fedora Core 4 — has afforded us the opportunity to change — or, actually, revert — the way login happens on the Macs. In the latter half of last semester, we were using a login hook that called mount_nfs because of problems with how Mac OS X 10.4.2 handled our Panasas setup, which creates a separate volume (read: filesystem) for each user account. Since we're now just using a standard Linux box to share our home accounts, which are now just folders, we have the luxury of reverting to the original method of mounting user accounts that we used last year under Mac OS X 10.3. That is, the home account server is mounted at boot time in the appropriate directory using automount, rather than with mount_nfs at login. Switching back was pretty simple: disable the login hook (by deleting root's com.apple.loginwindow.plist file), place a Startup Item that calls the new server in /Library/StartupItems, reboot, and you're done. Right?
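
For reference, here's roughly what such a Startup Item can look like. This is a minimal sketch, not our actual item: the server name (homeserver), export path (/export/home), and mount point (/Volumes/Homes) are all made up, and where we use automount, this version just does a static mount_nfs at boot, which amounts to the same thing for a single shared export. The script lives in its own folder in /Library/StartupItems, alongside the usual StartupParameters.plist:

#!/bin/sh
# /Library/StartupItems/HomeMount/HomeMount
# Mounts the home account server at boot (all names here are hypothetical).

. /etc/rc.common

StartService ()
{
    ConsoleMessage "Mounting home account server"
    mkdir -p /Volumes/Homes
    /sbin/mount_nfs homeserver:/export/home /Volumes/Homes
}

StopService ()
{
    umount /Volumes/Homes
}

RestartService ()
{
    return 0
}

RunService "$1"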

Well, not quite. There's one last thing you need to do before you can proceed. Seems that, even after doing all of the above, the login hook was still running. I could delete the scripts it called, but it would still run. Know why? This will blow your mind. Cache.

Yup. It turns out — and who would have ever suspected this, as it's so incredibly stupid — login hooks get cached somewhere in /Library/Caches and will continue to run until those caches are deleted. I'm sorry, but I just have to take a minute and say: that is fucked up. Why would such a thing need to be cached? Maybe there's a minimal speed boost from doing this. The problem is that you now have a system-level behavior sitting in cache, and these caches are fairly persistent. They don't seem to reset, and they don't seem to update. It's as if your browser only ever used cached pages to show you websites, and never compared the cache to the files on the server. You'd never see anything but stale data without going and clearing the browser cache. At least in a browser — 'cause let's face it, this does happen from time to time (but not very often) — there is always some mechanism for clearing caches: a button, a menu item, a preference. In Mac OS X there is no such beast. In fact, the only way to delete caches in Mac OS X is to go to the various Caches folders and delete their contents by hand. Which is what I did, and which is what finally stopped the login scripts from running.
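
"By hand" can, of course, be done from the Terminal too. This is the blunt-instrument version, a sketch that clears all of the system-level caches rather than surgically removing just the loginwindow one:

# Clear the system-level caches; they get rebuilt as needed
sudo rm -rf /Library/Caches/*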

If this isn't clear evidence that Mac OS X needs some much better cache management, I don't know what is.

In any case, we're now not only happily running Tiger in the lab, but we've effectively switched over to a new home account server as well. So far, so good. Knock wood and all that. Between the home account problems, the Tiger migration, and getting ready for our server migration, this has been one of the busiest semesters ever. Though I keep this site anonymous because I write about work, I just want to give a nod to all the people who've helped me with all of the above. I certainly have not been doing all of this alone (thank god) and they've been doing kick-ass work. And, though I can't mention them by name, I really appreciate them for it. At the dawn of a new semester, we've finally worked out all of our long-standing problems and can get down to more forward-looking projects.

So ends (I hope) the Tiger Lab Migration.

Three Platforms, One Server Part 4: Redundancy

One of the major hurdles in our server unification project, mentioned in Part 1 of this series, is redundancy. In the old paradigm, each platform's users were hosted by a separate server: Mac users authenticated to a Mac Server, Windows users to a Windows Server, and Linux users to an NIS server. While this is exactly what we're trying to avoid by hosting all users on a single server, it does have one advantage over the new approach: built-in redundancy. That is, if one of our authentication servers fails, only the users on the platform hosted by that server are affected. For example, if our Windows Server fails, Windows users cannot log in, but Mac and Linux users can. In our new model, where all authentication for all platforms is hosted by a single server, if that server fails, no user can log in anywhere.

Servers are made to handle lots of different tasks and to keep running and doing their jobs under extreme conditions. To a certain extent, that is the very nature of being a server. To serve. Hence the name. So servers need to be, and tend to be, very robust. Nevertheless, they do go down from time to time. That's just life. But in the world of organizations that absolutely must have constant, 24-hour, 'round-the-clock uptime, this unavoidable fact of life is simply unacceptable. Fortunately for me, I do not inhabit such a world. But, also fortunately for me, this notion of constant uptime has provided solutions to the problem of servers crashing. And while I probably won't lose my job if a server crashes periodically, and no one is going to lose millions of dollars from the downtime, no SysAdmin likes having to tell his users to go home for the night while he rebuilds the server. It just sucks. So we all do our best to keep key systems like servers available as much as possible. It's just part of the deal.

So how are we going to do this? Well, one of the reasons I decided to use a Mac for this project is that it has built-in server replication for load balancing and, yes, failover. We're not too concerned with the load balancing; failover is what we're after. Failover means keeping a backup copy of the primary database on a second server, a replica, which takes over if the primary fails. Mac Server has this built in, and from what I've read, it should be fairly easy to set up. Which is exactly what we're about to do.

The first thing we need is our primary server. This is the main server, the one that gets used 99% of the time (hopefully). We have this (or at least a test version of it) built already, as discussed in Part 1. What we need next is what is called the replica. The replica is another Mac OS X Server machine that is set to be an "Open Directory Replica," rather than an "Open Directory Master."

So I've built a plain old, vanilla Mac Server and set it initially to be a Standalone Server. I've given it an IP address and done the requisite OS and security upgrades. (Oy! What a pain!) In the Server Admin application, I set the new server to be an "Open Directory Replica." I'll be asked for some information here. Mainly, I'll need to tell this replica which master server to replicate. Specifically, I'm asked to provide the following at the outset:

IP address of Open Directory master:

Root password on Open Directory master:

Domain administrator's short name on master:

Domain administrator's password on master:

(The domain administrator, by the way, is the account used to administer the LDAP database on the master.)

Once I fill in these fields I'll get a progress bar, and then, once the replica is established, I'm basically done. There are a few settings I can tweak. For instance, I can set up secure communication between the servers with SSL. But for my purposes, this would be overkill. I'm pretty much going with the out-of-the-box experience at this point. So for setup, that should be it. Setting up a replica is pretty easy stuff.
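
As an aside, if you'd rather script this than click through Server Admin, Mac OS X Server also ships a command-line tool called slapconfig for exactly this job. I've only ever done it through the GUI, so treat the following as an educated guess to check against the man page; the IP address and admin name are stand-ins:

# Run on the replica-to-be; it prompts for the master's passwords
sudo slapconfig -createreplica 192.168.1.10 diradmin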

Establishing the Replica: Could it Be Any Easier?


Now here comes the fun part: testing. What happens if our primary server goes offline? Will the replica take over authentication services? Really? I'd like to be sure. What I'm going to do now is test the behavior of the Master/Replica pair to make sure it acts as intended. The best way I know to do this is to simulate a real-world crash. So I am binding one of my clients to my Master server, with the Replica in place. Then I'm going to pull the plug. In theory, users should still be able to log in to the bound client. Let's try it...

Bang! It works! I'm a bit surprised; last time I tried this, years ago, it (or I) failed. This time, though, it worked. We bound a client to the Master, our mother-ship server. Authentication worked as expected. (We knew we were bound to the new server because the passwords are different.) And then we killed it. We killed the Master and logged out. There was some beachballing at logout. But after a few minutes -- like two or three, not a long wait at all -- we were able to complete logout, and then log right back in as though nothing had happened. I tell you, it was a thing of beauty.
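
One quick way to convince yourself the replica is really answering, and not some credential cached on the client: with the master still unplugged, look up a network-only account from the Terminal on the bound client. The username is a stand-in:

# Returns a uid and gid for the network user only if directory services are alive
id somenetworkuser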

So let's briefly recap where we've been and what's left to do.

Where we've been:

  • We've built our Mama Server. Our authentication server for the entire lab.
  • We've figured out how to migrate our users to Mama, and how to handle the required password change.
  • We've solved the inherent problems with Windows clients and figured out a few solutions for handling them involving quotas and various roaming profile locations.
  • We've built and tested the operation of the Open Directory Replica, and it is good.

What's left to do:

  • Well, honestly, not a whole Hell of a lot.
  • The next step, really, is real-world testing. We have a basic model of how our servers and clients should be configured, and it's basically working. To really test this, we'll need to take some actual clients from the lab and set them up to use the new system.
  • Stress testing (i.e. seeing if we can break the system, how it holds up under load, etc.) would also be good, and is something we might start over Winter break and continue in earnest over the Summer. To do this, we'll need to set up several client systems and get users (guinea pigs) to do some real work on them all at the same time.
  • Once stress testing is done, if all is well, I'm pretty sure we can go ahead and implement the change. I can't foresee any other problems.

So I'm at a stopping point. There's not much else I can do until the break, at which point I'll be sure and post my test results.

Hope to see you then!