Leopard umask

This is one of those I-keep-forgetting-how-to-do-this posts, so I'm writing it down. It's certainly been posted elsewhere, but I'm tired of going looking every time I need it. So here it is. In Tiger a simple defaults command could be used to modify a user's umask (a setting that controls the default permissions for newly created files and folders). Leopard changes the way this is done, using a launchd configuration file instead. To create a custom umask for all users of a system (i.e. all user-level processes):

  1. Create a file called launchd-user.conf.
  2. Place the file in /etc/.
  3. Enter the umask setting in the file on a single line, like so: umask 002
  4. Restart the machine.

The restart may not be necessary, but if I recall correctly it was the only way I could get it to work. If you don't want to reboot, you'll at least need to restart launchd and any application or process you want to use the new setting. Rebooting, though, is a nice catch-all.
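For the Terminal-inclined, the whole procedure is roughly this (a sketch, not gospel — you'll need admin rights, and the reboot is the catch-all I just mentioned):

# Create the per-user launchd config containing the umask setting, then reboot.
sudo sh -c 'echo "umask 002" > /etc/launchd-user.conf'
sudo reboot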

Some additional info: if you want virtually all applications (i.e. system-level processes) to use a custom umask, you can leave the "user" off the file name. Using /etc/launchd.conf will have said effect, but it is not recommended by Apple (or me, for that matter).

Setting a custom umask in general isn't something I recommend either, but it's damn handy in certain file sharing environments in which multiple users need access to the same stuff, but where ACLs — the preferred method for setting up complex file sharing permission sets — aren't an option. Creating a common group for the users and setting up their umask to create files and folders that are group-writable is an acceptable workaround in many scenarios. Which, by the way, is what the above setting will do. A umask of 002 will create folders with permissions of 775 and files with 664 (the mask is subtracted from the defaults of 777 and 666 — it's a mask, silly).
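If you want to see that math in action, a quick test in Terminal makes it obvious (the directory and file names here are just throwaway examples):

umask 002             # mask out "other" write
mkdir testdir         # directories start from 777 -> 775 (rwxrwxr-x)
touch testfile        # files start from 666 -> 664 (rw-rw-r--)
ls -ld testdir testfile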

Okay then. Happy umasking!

Infrastructure

There are a bunch of legacy issues at my new job. Many of them, I believe (I'm not completely sure, I'm still pretty new after all), stem from the once heavy use of IRIX and its peculiarities. We are only just reaching a point at which we can do away, once and for all, with that platform. It's a very complex ecosystem we're dealing with here. Complex, and in some spots delicate. But what surprises me more than a little is that, despite the fact that my new job is at one of the most respected institutions in the country — and one of the more advanced computer labs as well — I find myself facing many of the same issues I tackled in my last position. Those challenges have a great deal to do with creating a simpler, more elegant, more efficient computing environment. And, though the user base has changed dramatically — we're dealing with a much more technically sophisticated group of professionals here, not students — and the technological and financial resources are much more vast, the basic goals remain the same, as do many of the steps to accomplishing those goals. And the one thing those steps all have in common, at least at this stage of the game, is infrastructure.

What makes infrastructure so crucial? And why is it so often overlooked?

Infrastructure is key to creating efficient, and therefore more productive, work environments, whether you're in art, banking, science, you name it. If your tools are easy and efficient to use you can work faster and make fewer mistakes. This allows you to accomplish more in a shorter period of time. Productivity goes up. Simple. Infrastructure is a lot like a kitchen. If it's laid out intelligently and intuitively you can cook marvels; if it's not you get burned.

Infrastructure, for our purposes, is the back-end of the computing environment. Everything you do between computers — that is, every interaction that takes place between your computer and another computer, be it file server, authentication server, web server, what have you — relies on some sort of infrastructure. I'm referring to network infrastructure here, to be sure, but also to the processes for adding and accessing computer resources in any given facility. How, for instance, do we roll out new workstations? Or update to the latest operating system?

Typically, it is Systems Administrators — the very people who know (or should know) the importance of infrastructure — who tend to work between computers the most. We of all people should know how important a solid infrastructure is for even the simple act of basic troubleshooting: if your infrastructure is solid and predictable, the paths you have to troubleshoot are fewer and simpler, making your job easier and making you better at it at the same time. Yet infrastructure, time and again, is left to stagnate for a variety of reasons.

I'd like to enumerate a few of those reasons, at least the ones I suspect factor most strongly:

  1. Infrastructure is difficult. Infrastructure planning, like, say, interface design, is complicated and often requires numerous iterations, coordinated effort and a willingness to change to successfully implement.
  2. Infrastructure requires coordination. Infrastructure changes often require re-educating the user base and creating a collective understanding, as well as clear policies on how things are supposed to work.
  3. Infrastructure is not sexy (to most people). The benefits of infrastructure reorganization are often not immediately apparent, or even immediately beneficial for that matter. You might not see the benefits until long after a reorganization.
  4. Infrastructure can be expensive. If an infrastructure requires a major overhaul, the cost can be high. Couple that with less-than-immediate benefits and you often meet with a great deal of resistance from the money people, who feel they'd be better served buying new workstations than a faster switch.
  5. Change is scary. You know it is.

I've been extraordinarily lucky in that I've been able to learn about infrastructure in a sheltered environment — that of the Education sector — one that allowed me copious downtime (unheard of elsewhere) and a forgiving user base. (Students! Pfft!) I'm still pretty lucky in that A) I'm working somewhere where people, for the most part, get it; and B) I have some time, i.e. I'm not just being brought in as a consultant. This last bit is really fortunate because it affords me both ample opportunity to gain an understanding of the environment I'm trying to change and the time in which to change it. This is not to say that this sort of thing can't be done in consulting. But it's certainly a much harder sell, and one I'm glad I don't really have to make to such a degree.

Still, with all that, I've got my work cut out for me.

When I first arrived on the scene, The New Lab was using (actually, still is to a large extent) NIS for user authentication. Now this is something I know a bit about, and I can tell you (if you even remember what NIS is anymore) that NIS is very, very passé. And for good reason. NIS is like the Web 1.0 of user authentication: it uses flat files rather than databases and is extremely cumbersome and inflexible. Moreover, it is not well suited to cross-platform operation. It is completely end-of-life and obsolete. To continue to invest in NIS is silly.

So one of my first duties was to build an Open Directory server, which stores accounts in real directory databases rather than flat files. The OD server will be both easier to use (creating users is a breeze) and more capable than any NIS server could ever hope to be (by allowing cross-platform integration down the line, if desired). But until now, for some reason, no one had done this. Partly, maybe, it's just inertia: NIS works well enough, and it's not that big a problem. And maybe it's partly happening now because this is something I just happen to know a lot about and can make happen quickly and effectively.

Because of my background, I also see it as a huge problem: by slowing down the user creation process, you're hindering productivity. And not just physical productivity, but mental productivity. If I have to spend twenty minutes creating a user, not only have I wasted that time on something trivial, but I've expended far too much mental energy on a task that should be simple. And that makes it more difficult to get back to Work That Matters.

Again, the beauty of being on staff is that I have time to introduce this gradually. To gradually switch machines over to the new server. To gradually get the user base used to the slightly new way of doing things before we move on to the next item up for bid.

So far, so good.

I've talked to my co-workers as well, and they're all primed to make changes happen. That's really good. We're talking about all kinds of things: re-mapping the network, asset management with DHCP, redoing the scheduling system, and others I can't even think of right now. User authentication took years to sort out at my old job. It was, in many ways, a much more complex network than this new one (we don't manage an external network, thank God). But this place has its own set of complexities and challenges, and though the authentication server is basically done, there are a whole host of things I could see happening in the realm of infrastructure. And they're all right there... See them? Just over the horizon.

Should be fun.

There are a few basic things I like to keep in mind when preparing for and making major infrastructure changes. These are the types of problems I look for that give me purpose:

  1. Repeat offenders. What problems crop up again and again on the user side? What questions get asked over and over? These are indicators that something is functioning sub-optimally, or that a process could be more intuitive.
  2. Personal frustration. What parts of my job are frustrating or infuriating to me? These are usually indicative of a problem, as I tend to get frustrated with things that don't work well. Either that or I need more coffee.
  3. Redundant errors. Is there a process that tends to yield mistakes on a regular basis? If so it could probably use some automation or clarification at some point. Sometimes all you need is a clear policy or workflow.
  4. Long-term problems. Is there something that everyone complains about, but that just "never gets fixed"? Betcha ten bucks it's an infrastructure problem.
  5. The workflow. How do people in the facility currently work? What's the pipeline for getting things done? Are they spending their time working on tech issues when they should be working on production? How could this be easier?

There are probably more, but these are the general things I'm thinking about when considering infrastructure changes. And the better I can understand the people and the technology in a facility the more informed my decisions can be with regards to those changes.

Finally, there are some basic notions I keep in mind when proceeding with infrastructure changes:

  1. Simplify. The simpler solution is almost always best, both for the admin and the user. Building a simple solution to a problem is often exceedingly difficult and, I might point out, not necessarily simple on the back-end. But a simple workflow is an efficient one, and simplicity is usually my prime directive.
  2. Centralize. It's important to know when to centralize. Not everything benefits from centralization, obviously. If it did we'd all be using terminals. Or web apps. For everything. But properly centralizing the right resources can have a dramatic effect on the productivity of a facility.
  3. Distribute. Some resources should be distributed rather than (or in addition to being) centralized. Some things will need redundancy and failover, particularly resources that are crucial to the operation of the facility.
  4. Educate. Change doesn't work if no one knows about it. It's important to explain to users what's changing and why. Though I've been met with resistance even to changes that would make a user's job easier (this is typical), making users aware of what is changing and why it's happening is the first step in getting them to see the light.

It's true that infrastructure changes can be a bit of a drag. They are difficult. They're hard to justify. They piss people off. But in the end they make everything work better. And as SysAdmins — who are probably more intimate with a facility's resources than anyone — we stand to gain as much as (if not more than!) our users. And they stand to gain quite a bit. It's totally win-win.

Default Shell Hell

There's a common occurrence in the world of systems administration. Once I describe it you'll probably all nod your heads knowingly and go, "Yeah, that happens to me all the time." It happened to me recently, in fact.

I was attempting to set up a Linux system to authenticate via a freshly built LDAP server — something I've done many, many times — and it just wasn't working. I could authenticate and log in fine via the shell, but no matter what I tried, whenever I attempted to log in to Gnome, I'd get an error message saying that my session had lasted less than 10 seconds, that maybe my home account was wonky or I was out of disk space, and that I could read some error messages about the problem in a log called .xsession-errors in my home account.

Of course, certain that my home account was fine and that I had plenty of disk space, the first thing I checked was the .xsession-errors log, which yielded little useful information, and what little it did yield led me on a complete and utter wild goose chase. From everything I could glean from this rather sparse log, there seemed to be a problem with Gnome or X11 not recognizing the user. I showed the error to some UNIX-savvy co-workers, one of whom demonstrated that booting into run-level 3, logging in at the console and then starting X worked fine, which seemed to prove my hypothesis. So began several days of research into Linux run-levels, Gnome, X11, PAM, the Name Service Switch (NSS) and LDAP authentication on Linux. All of it exceptionally informative, but all of it, of course, failing to yield a positive result.

The final, desperate measure was to scour every forum I could, and try every possible fix therein. And, lo and behold, there, at the bottom of some obscure post on some unknown Linux forum (okay, maybe not that unknown), was my answer: set the default shell. Could it be so simple?

But wait, wasn't the default shell set on my server already?

I checked my server, and sure enough, because of a typo in my Record Descriptor header, the default shell had not been set for my users. Seems X11/Gnome needs this to be explicitly specified in an LDAP environment, because in said environment it is (for some reason that remains beyond me) unable to read the system default.

Setting the default shell for users on my LDAP server (yes, it is a Mac OS X Server) did the trick, and I can now log in normally to Linux over LDAP.
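If you want to check your own directory for the same omission, something like the following should do it. This is just a sketch: the server address, search base and username are placeholders, and it assumes the standard posixAccount schema, where the shell lives in the loginShell attribute.

# Ask the directory for one user's shell. If loginShell comes back empty or
# missing, graphical (Gnome/GDM) logins are likely to fail just as mine did.
ldapsearch -x -H ldap://ldap.example.com -b "dc=example,dc=com" "(uid=jdoe)" loginShell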

So, after days of researching the problem, the solution boiled down to one dumb, overlooked setting on my server, a fact I found referenced only at the bottom of some strange and obscure internet forum. Sound familiar? What, pray tell then, should we call this phenomenon? We really need a term for it. Or perhaps an axiom? Maybe a law or a razor or a constant. Something like:

"For every seemingly complex OS problem there is almost always an astoundingly simple solution which can usually be found at the bottom of one of the more obscure internet forums."

A corollary of which might go something like:

"Always check the bottoms of forums first."

We'll call it Systems Boy's Razor. Yeah, that should do nicely.

If anyone has any better suggestions here, I'm always open. Feel free to let 'em rip in the comments. Otherwise, check your default shells, people. Or at least make sure you have them set.

Time Machine After Logout

I've been using Time Machine for a while now. And I've noticed some interesting things about its behavior. Of particular note, I've noticed that Time Machine does not back up your data when you are logged out. I found this strange until I figured out why this is the case.

I first noticed Time Machine not backing up logged-out users after setting up the staff computers here at work. Oddly, my work computer did back up when I was logged out, which I realized when I noticed a backup failure due to a lack of drive space. According to the Console logs this backup attempt had occurred in the middle of the night. Clearly Time Machine was able to back up while users were logged out, but it would only do so on my machine. So what was the difference?

By default, Mac OS X wisely unmounts external volumes when a user logs out. This makes sense for a number of reasons, not the least of which is the fact that it's what users expect, and it's the least likely to break something if a user logs out and pulls their firewire plug without ejecting their disk. It's a very sane default that errs on the side of data protection. But it's not always what you want. For instance, say you have network shares on external RAID drives that are connected via firewire (which, in fact, we do). Or say your network backups that run in the middle of the night get stored on a firewire volume (which ours do). If you want these drives constantly available, you need to be able to keep them mounted even when no user is logged in. Fortunately, Apple provides a method for doing this, though it's by no means obvious.

The trick to keeping external drives mounted after a logout lies in a little .plist file. The name of this file is autodiskmount.plist, and it does not exist by default; you have to make it. In the file should be the following text:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>AutomountDisksWithoutUserLogin</key>
    <true/>
</dict>
</plist>

Put this file in:

/Library/Preferences/SystemConfiguration/

And reboot. (Yes, reboot.)
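If you'd rather not hand-edit XML, writing the key with the defaults command should produce an equivalent file (a sketch — it writes the same autodiskmount.plist shown above):

# Create/update the plist in one shot, then reboot as described.
sudo defaults write /Library/Preferences/SystemConfiguration/autodiskmount AutomountDisksWithoutUserLogin -bool true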

Now all external drives (firewire, USB, eSATA, etc.) will stay mounted after a logout. If they're shared, they'll always be available. And they'll always be available to Time Machine.

So the difference between the staff machines and mine? My computer is set to never unmount drives at logout. Apparently, Time Machine is perfectly capable of running even when no one is logged in. But it obviously needs the Time Machine drive available to do so. Keeping firewire drives mounted post-logout will allow Time Machine to work all night long. Sweet!

And since I'm so crazy with the Installer Packages these days, I'm including one here that will install the necessary preference file to make all this happen. You know, just to make your lives a little easier.

Download KeepExternalDisksMounted

You're welcome!

UPDATE 1: A reader asked in the comments how I came to have the preference file installed on my system. I'd put it there long ago because I needed firewire drives mounted for rsync backups of staff machines. But I certainly didn't figure out how to create that file myself. Credit for that goes to this Mac OS X Hints hint. It's got all the details if you're interested.

UPDATE 2: One other thing I forgot to mention: why is this useful? I mean, if you're logged out you're not really capable of creating any new data, so there's nothing really new to back up anyway, right? This is mostly true, indeed. But imagine your boss uses Time Machine for his hourly backups. Now imagine he creates a whole buttload of data — I don't know, emails to the CEO, photos of his kids, whatever — and he creates this data right before he leaves for the day. Then, safe in the knowledge that Time Machine's got his back, he logs out and goes home for the weekend. That weekend there's a power surge or something, and his machine is fried. "No problem," he thinks, "I have my backup." But his most recent data is gone. His photos, his draft to the CEO, gone. And guess who's to blame? Yup. The Systems Admin. Your ass is grass, and Time Machine is the lawn mower. (Uh, this is why I don't write in the mornings.)

Personally, I think it would be smart if Time Machine asked you at logout if you'd like to make a backup, or at least warned you that backups would not be performed after logout. This seems like a bit of an oversight on Apple's part.

The other time this can be useful is when you're creating your first backup. This is typically a lot of data. Here in the office we told folks to let it run overnight. But they couldn't log out. So we dropped them to the Login Window with Fast User Switching. Still, it would have been that much more intuitive if we'd just told them to log out like they always do, and that their backups would be ready in the morning.

So yeah, not earth-shattering, but still potentially useful. And interesting on an academic level to know that Time Machine will run sans login.

I have to go install that preference file on my staff machines now. Bye!

NetBoot Part 5

So far this NetBoot/NetInstall thing is working out a thousand times better than I ever thought it would. I wish I'd done this years ago. Not only does it save time, it also reduces errors. This is often one of the most overlooked features of automating a process: the less human interaction in the process, the fewer mistakes can be made. I have only to compare the set of instructions I gave to last year's crew for building a new system to the instructions for using the new NetInstall system to see evidence of this truism. The list of human actions to take — and, thus, potentially screw up — is significantly shorter using the new process. And that's a beautiful thing.

At this point I've converted about half the staff to Leopard with the NetInstall system, and for the most part it's been quick and painless for both me and them. Contrast that with years past, when upgrading staff computers — which are both the most customized machines and the ones whose data most needs preserving — was fraught with tension and minor hiccups. This year I almost feel like I've forgotten something, it's been so easy. But staff would surely let me know if there were problems. (I'm so knocking wood right now.)

I've also had an opportunity to test building multiple machines simultaneously. Yesterday I built five Macs at the same time, and, amazingly, all five built in about the same time it takes to build one — about a half an hour. I'm astounded. We should be able to build our new lab workstations this summer in a day. And still have time for a long lunch. And for the most part I'll be able to offload that job to my assistants.

As I finish up the system, I've realized some things. First of all, it sort of reminds me of software development — or at least what I imagine software development to be like — because I'm building little tiny components that all add up to a big giant working whole. Also, as I write components, I find myself able to reuse them, or repurpose them for certain, specific scenarios. So, in a sense, the more I build, the easier the building becomes, as I imagine is true in software development. Organization is also key. I find myself with two repositories: one contains the "build versions" — all the resources needed to build the packages — and one contains the finished products — the packages themselves — organized into something resembling the physical organization (packages for staff computers in one area, packages for workstations in another, for instance). It's shockingly fascinating to work on something like this, something that's built from tiny building blocks and that relies very heavily on good organization. I'm really enjoying it so far, and I'm a little sad that the groundwork is built and it's nearly done. There's just something fundamentally satisfying about building a solid infrastructure. I guess that's just something I innately like about my job.

The next step in this process, as I've alluded to, will be a major build: our new batch of workstations when they come in the summer, plus an update of all our existing computers — all in all, about 40 machines. Between now and then there are sure to be some updates, so I'll probably update my base config before we do the rest of the lab. And then will come the fun. I will report back with all the juicy details when that happens, in what will probably be the final installment of this series.

See you in summertime!