Tiger Lab Migration Part 6: Base Config

So this is the part of this epic in which I build what I call the "Base Config" or "BC." The idea behind the BC is simple, really: Build a machine that's got it all (well, almost all), from which all subsequent machines in the lab can be cloned. Building the BC is always a little scary, because any mistake I make on the BC will be propagated to about 25 machines, and consequently will have to be corrected on said 25 machines. So I've got to be careful and thorough in my planning.

Essentially, all my machines are the same, or at least share the same core: the latest and/or greatest version of Mac OS X, major applications from Adobe, Macromedia, Microsoft, and of course Apple, and some smaller applications here and there, mostly utilities and drivers or things like Suitcase. These things go on every Mac in the lab. So they go into the BC Mac as well.

In addition to the OS and the applications, there are some admin things that need to get done: We have some custom scripts and dock items we like to put on the Macs, as well as a Startup Item to mount our home account server via NFS. And, of course, Directory Access must be configured to get authentication, and whatever other services we set up, from our Macserver. Then each preference pane in System Preferences should get configured the way we want. Finally, we add a few things to /etc/hosts and there are a couple cron jobs that need to get set up. And I believe that's it.

And, like I said, I hope that's it, because if it's not -- if I've missed anything -- I'll be paying for it later.

Here's where lists start to come in real handy:

Mac OS X 10.4.2

  1. Install OS
  2. Install all Software Updates

Local User Accounts

  1. Me (admin)
  2. Lab Assistant (admin)
  3. Student (generic non-admin)

System Preferences

  1. Configure All

Adobe

  1. Photoshop
  2. Illustrator
  3. InDesign
  4. Acrobat
  5. After Effects

Apple

  1. Xcode
  2. Final Cut Pro Suite (FCP, DVDSP, Motion)

Macromedia

  1. Director
  2. Studio

Microsoft

  1. Office 2004

Other Software

  1. Stuffit
  2. Suitcase
  3. USB Serial Drivers
  4. WACOM Drivers
  5. KeyServer Software

Admin Junk

  1. Configure Directory Access to authenticate against MacServer
  2. Mount Home Account Startup Item
  3. Admin Scripts (local delete, quota alert)
  4. Add servers to /etc/hosts
  5. Add cron jobs (local delete)
  6. Spotlight Disable Script (so that home accounts do not get indexed)
  7. Application Menu
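Item 4 under Admin Junk (adding servers to /etc/hosts) is the sort of step that is easy to apply twice across rebuilds. Here is a rough sketch in Python of merging entries into hosts-file text without duplicating them. The host names and addresses are hypothetical placeholders, not the lab's real servers, and this is not the script actually used on the BC.

```python
def add_host_entries(hosts_text, entries):
    """Return hosts_text with any missing (ip, hostname) entries appended.

    Existing lines are left untouched, so running this twice is harmless.
    """
    known = set()
    for line in hosts_text.splitlines():
        # Drop comments, then split into "ip name [alias ...]".
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            known.update(fields[1:])

    lines = hosts_text.splitlines()
    for ip, name in entries:
        if name not in known:
            lines.append(f"{ip}\t{name}")
            known.add(name)
    return "\n".join(lines) + "\n"


# Hypothetical entries; substitute the real server names and addresses.
LAB_SERVERS = [("192.0.2.10", "macserver"), ("192.0.2.11", "raid")]
```

Because the function only works on text, it can be tried against a copy of the hosts file before anything gets written back to /etc/hosts on the BC.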

So, that should do it. I'll build this, start testing it, and add anything to the list I forgot. But that's pretty much it. Once this is built and working well, it will be time for the trial by fire. We'll start cloning this machine to the other workstations. This year will be extra special fun, because not only will we be cloning these, we'll also be wiping and repartitioning the internal drives of all our machines. Fortunately I have lots of firewire cables, and very capable and energetic Lab Assistants who are ready, willing, and able (and paid, for that matter) to help me out with all this.

And one last side note: As I build this machine, just for fun, I may create disk images along the way of slightly leaner builds than the final. Like a build with just the OS, then one with just the commercial apps, then one with the drivers, and finally one with all the fixin's. This way I have the various stages available to me in case I need to build, say, staff machines, from a leaner base system, or in case I screw something up and need to go back a step or two, I won't have to start completely from scratch.

So that's the plan. I'll let you know how it goes.

UPDATE 1:
I've just finished the first stage: installing and updating the system software. The OS is at 10.4.2 and all Software Updates have been applied. I have also configured my account, and the other local accounts, and configured all the System Preferences. I have created a disk image of this install, called SysAppsBC-BaseOS.dmg, and scanned it for ASR.

Tiger Lab Migration Part 5: File Sharing Broken

I've been building my Master Tiger system. Actually, it's built, and I've been testing it. And I was going to make it my Radmind Master Client, but I've since scrapped my Radmind plans altogether. So fine. It's now just a matter of uninstalling Radmind from the Master and testing it and making sure all is well. Well, all is not well. Somewhere along the line, Apple File Sharing broke. I don't know exactly how or why. All I know is that, yesterday, after uninstalling Radmind, I tried connecting to the Master from my admin machine via AFP. The share mounted for a few minutes, then my Finder beachballed and I got a new error message alert. I'd never seen this one before:

Pressing the "Disconnect" button does just what you'd think: It disconnects you from the now defunct share.

So, looking into this a bit more, I've discovered that the AppleFileServer process on the Master is crashing whenever I try to connect to it via AFP. After the crash there is copious and completely useless output in the AppleFileService.crash.log, and Personal File Sharing is off in the Sharing System Preference pane. I can connect in the opposite direction -- from the Master to my admin box. File Sharing on the Master, however, seems hopelessly broken.

I'm not sure what to do.

I figure I have three options: 1) I can spend forever and a day trying to figure out what went wrong and maybe fix it, 2) I can wait for the much-anticipated 10.4.2 update and hope that whatever got broken on this machine gets overwritten with fresh new copies from the update, or 3) I can wipe and reinstall. Since it's likely I'll have no luck figuring out the problem, and since I'm not really the wait-and-hope type, I think I'll go with option 3.

Time to build another system.

I hate Tiger.

Tiger Lab Migration Part 4: Spotlight Worries

Here's an interesting little gotcha. Not sure if it's a good thing or a bad thing yet.

Probably bad.

In our lab, home accounts are on a server. Or, more accurately, they're on an NFS mount that is shared from a network RAID. The way this works is fairly simple, but kind of tricky. The Macserver handles the authentication for network users who log into client workstations. The clients always have the NFS RAID mounted at /home. When users log in, Macserver specifies their home accounts as /home/username. To make sure the RAID is always mounted, we use a little startup item that's just a very simple automount script to call the RAID and mount it in /home.

There's one other little thing that's weird about our setup, and that's the way the RAID is configured. Our RAID is a very nice, but proprietary system made by a company called Panasas. The way user accounts are created on the RAID is unique, and I don't fully understand it, as I did not set the RAID up, nor do I maintain it. But essentially, from my understanding, each user's home account on the RAID is a separate partition.

Does anyone see the problem here?

Well, I won't keep you in suspense. If you haven't figured it out, here's the problem -- and the more I think about it, the more I realize that it is a problem and not a boon: Whenever a user logs in, a new partition is mounted via NFS. And guess what happens then. You guessed it (or maybe you didn't): Spotlight starts indexing.

Holy fucking shit.

I have 207 users currently on the RAID. They have quotas between 2 and 7 gigs. And they're mostly completely and totally unaware of Spotlight and its idiosyncrasies. If one of them logs in and then, say, shuts down the machine (it will happen, trust me), the Spotlight index will get hosed and the machine will most assuredly begin acting flaky. Or how about this: What if a user logs in, checks his email, then logs out? Then what if another user logs in, does the same, and logs out? What if five users come along and do this? Now we've got Spotlight indexing five different network mounts on the same machine. What if that machine then gets rebooted before indexing is complete? I shudder to think. Or, what if a user logs in to a machine, indexing begins, she logs out and then logs into another machine? Now Spotlight is indexing the same mount point -- accessing the same database -- from two different machines. I'll say it again: Holy fucking shit.

This is a recipe for disaster.

Fortunately, I am an expert in the various methods for turning off Spotlight. And that's what we'll have to do: turn off Spotlight for all 207 mount points. This isn't really that big a deal. One simple command should do it (I hope). But it gets me thinking about all the various other Spotlight related problems we're bound to encounter. For instance, our users do a lot of video, and they're encouraged to use firewire drives for this. Well, firewire drives are indexed as soon as they're mounted. I have no way to change this. What happens if indexing on a firewire drive is stopped (i.e. the user unmounts his/her drive) before it's complete? Now we have a hosed index on the user's firewire drive. Next machine he/she goes to will try to index the drive again, possibly completing the index, possibly not. And, during the indexing period, will performance drop to levels that do not permit video editing? I just don't know. But if it does, it's going to be a real problem. And just how do you educate 200+ users about this? It's way over most people's heads. This is a technology that is supposed to "just work." Unfortunately, in a multi-user, networked environment like ours, my worry is that it will "just break."

I'm feeling very hesitant about this migration. It wouldn't be the first time Apple's plans for the home user have come at the expense of the networked lab user. They often seem to forget about us, even though, in many ways, it's this sort of environment for which OS X is so great. Ironic. But if you think about it, one of the greatest features of Tiger -- Spotlight -- is completely useless in a networked environment. In fact, Spotlight is not even supposed to run on networked volumes. (Why it does on ours, I do not know, though I suspect it's because we're using NFS.) But the firewire thing is really disturbing, and I think really underscores the need for significantly more control over the behavior of Spotlight. If, in the Spotlight Preferences, there were a checkbox for "Disable Spotlight Indexing on External Drives," I'd be the happiest man alive right now.

As it is, I'm just plain worried.

UPDATE:
Another thought occurs to me: Okay, so I disable Spotlight on all 207 accounts. Well, what happens when we create a new account? Spotlight needs to be turned off for that account too. So basically, what this amounts to now, is a script that gets run at least every time a new user account is created -- possibly at every login, just to be safe -- that disables Spotlight on all mounted home accounts.
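A rough sketch of what such a login-time script might look like, in Python. It assumes that each home account shows up as its own mount point directly under /home (as described above) and that `mdutil -i off <volume>` is the command used to switch indexing off. The function name and the injectable `is_mount` check are made up for illustration, and the sketch only builds the commands rather than running them.

```python
import os


def spotlight_off_commands(home_root, is_mount=os.path.ismount):
    """Build one `mdutil -i off` command per mounted home directory.

    home_root is the directory the home accounts are mounted under
    (assumed here to be /home). Nothing is executed; the caller decides
    whether and how to run the returned commands.
    """
    commands = []
    for name in sorted(os.listdir(home_root)):
        path = os.path.join(home_root, name)
        # Only real mount points matter; plain directories are skipped.
        if is_mount(path):
            commands.append(["mdutil", "-i", "off", path])
    return commands
```

Each returned command could then be handed to `subprocess.run(cmd, check=True)` from a script running as root. Since the list is rebuilt on every run, newly created accounts get picked up without any extra bookkeeping.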

Oh joy.

Tiger Lab Migration Part 3: Radmind

Okay. So, Tiger client is working. Moreover, Tiger client seems to work with my Panther Mac Server. And I have a backup disk image of my working Tiger install, should anything go amiss. Time to start setting up Radmind.

The Logic: I have about thirty Mac systems to maintain in a lab set up for art students doing all manner of computer-based art, including: web design, graphics, video, audio, interactive authoring (from screen-based to installation art), and some 3D. Certain software -- like the operating system, for instance -- is installed on all the machines. Certain software -- like Max/MSP, for which we have only so many licenses -- is installed only on select machines. Also, some systems are workstations used by students for the creation of their work, but there are also staff machines which serve vastly different purposes and are set up very differently. This means we have multiple hardware/software configurations for the various systems in our department. Keeping these machines up to date can be very challenging. Not only do we need to keep tabs on which systems have which software, we also need to keep tabs on which systems have been recently updated and which ones are in need of being updated. In the past, this has meant keeping a database of system configurations, logging into or polling (via ARD or some such utility) systems to see which ones need updates, and, when updates are required, personally sitting at machines and running software updates by hand. This process is tedious, inefficient, and more importantly, quite error-prone: Do one thing differently on one machine, and you've suddenly introduced inconsistencies throughout the lab. And with no way to track them, or even revert them should the need arise.

Clearly what is needed is a centralized system for software and OS update management and reversion, whereby changes to be made to workstations on the lab floor can be applied to a single system, tested, and then propagated to the appropriate systems during off-hours or scheduled maintenance times. Radmind is such a solution. And, miraculously, it's completely free.

Wow.

The Goal: We have several different configurations of Mac on the lab floor, and in the various staff offices. Before we start, let's outline them: We have Basic Workstations (BWs) with a basic (though still quite large) set of software; we have what I'll call Max Workstations (MWs), which are essentially the same as the BWs, but with Max/MSP added; we have Physical Computing Workstations (PWs), which have the Max/MSP config plus certain drivers required for programming Basic Stamp and the like; we then have Staff Workstations (SWs), which have a leaner software set overall, but which also have software not generally found on the public workstations; next, we have Audio Workstations (AWs), which have the basic set plus -- you guessed it -- audio software and drivers; and finally we have Video Workstations (VWs) that are set up like the Basics, but with a few video do-dads to boot.

A quick note about the Audio and Video workstations: I share maintenance of these machines with our A/V SysAdmin. Essentially, he manages them, but I provide him with a baseline OS install (by way of some sort of cloned image) and advise him with regards to OS updates and the like; he installs, configures, manages and updates any audio- or video-specific software not found in my base config. This area of the lab will be tricky to control with Radmind, particularly the Audio Stations as they often require certain hardware to be available for the software to be installed. Also, since our A/V guy handles updates to those systems, and since I do not (and this is a good thing), using my Radmind system for A/V updates might prove tricky. I will save the A/V systems for last, and figure out how best to handle them later. Fortunately, Radmind allows for this sort of gradual implementation. Ultimately, though, I may leave them out of my Radmind setup.

The ultimate goal will be to set up one alpha workstation that has everything required for every configuration. Then Radmind can be used to create subsets of this uber-station (henceforth referred to as the Master Client or "MC"), for propagation to workstations with less than the maximum of software installed. The MC will be built around the configuration of the Physical Computing Workstations, as those systems have all the software needed on any other system.

The Process: The first step is to set up the Radmind server, which will be my admin box. That's really easy, and is done. It's simply a matter of downloading the Radmind software packages, and then running the Radmind Assistant application. These can be found here. In the Radmind Assistant I just set my system up as a server. That's all. Oh, and I made sure to set it to use Bonjour for discovery. This makes server discovery from the client a breeze. The client simply looks on the network, via Bonjour, for any Radmind servers, and when it finds them it gives you a list of available server IPs to choose from. Nice.

The next step is a bit more complex. It's time to build the MC. To begin, I am doing a fresh install of Tiger, running all current system updates (we're on 10.4.1 as of this writing), and then setting this base install up the way I want it. A surprising amount of stuff gets set at this stage: Network, Energy Saver, Sharing, QuickTime Pro (license and settings), Accounts, and Security preferences all get set here. I am setting up my two local admin accounts at this stage as well. Also, I'm setting up binding to my Macserver for authentication, as well as installing my custom NFS mount Startup Item for mounting our home account RAID. Finally, I will install Radmind, of course. What I want to end up with is a very basic, clean system that represents the bare minimum installation for running in the lab, with no third party apps yet installed, and no customization of cron or any login scripts or anything, except Radmind, which should be set up as well, and should be part of the Base Install on the server so that it can create Radmind-controllable clones of itself, which can then be easily updated. This is my Base Install.

Once the Base Install is done, I will configure the machine to be a Radmind client that is controlled by the Radmind server. The MC doesn't know it's the MC, and it doesn't really have to. In fact Radmind doesn't even need to be aware of this. The concept of the MC is really for us humans. So the MC will be set up as a client, just like any of the other clients. Essentially, Radmind will then begin using this machine to set up lists of files. These lists are what's really important. The lists will be used to compare files on various clients. Additional clients will be updated based on these lists.

At least that's how it's supposed to work.

Failure: I spent an entire day attempting to set up my MC with Radmind. You need to do two things on the MC before you can really get down to business with Radmind: 1) create a negative transcript for the server to use, and 2) create a positive transcript for the server to use. These are the most basic, fundamental lists that the server uses to compare against clients. The negative transcript is a list of files that should not be propagated to clients, and the positive transcript is a list of files that should get propagated. For some reason, I had endless problems creating these transcripts. The first problem was a discrepancy between how the GUI application creates transcripts, and how it reads them, by default. The GUI app is set to "Begin transcript comparison from this path: / (slash)" whereas, the default transcripts created by the application use ./ (dot slash) at the beginning of their file paths. So, right out of the box, the Radmind GUI fails horribly, and my first, vanilla, Radmind-built negative transcript generated all manner of error messages. Changing the defaults fixed it, and I was able to generate a useable negative transcript.
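The slash versus dot-slash mismatch is easy to check for mechanically. Below is a minimal sketch in Python, assuming a simplified transcript line of the form "type path ..." with the path in the second whitespace-separated field. That layout is an assumption for illustration, not a full description of Radmind's transcript format, and both function names are invented here.

```python
def path_style(path):
    """Classify a transcript path as 'relative' (./...) or 'absolute' (/...)."""
    if path == "." or path.startswith("./"):
        return "relative"
    if path.startswith("/"):
        return "absolute"
    return "other"


def mismatched_lines(transcript_text, expected):
    """Return (line_number, path) pairs whose path style differs from expected.

    expected is 'relative' or 'absolute', matching whichever starting path
    the comparison is configured to use.
    """
    problems = []
    for number, line in enumerate(transcript_text.splitlines(), start=1):
        fields = line.split()
        if len(fields) < 2 or line.startswith("#"):
            continue  # skip blanks, comments, and lines too short to hold a path
        if path_style(fields[1]) != expected:
            problems.append((number, fields[1]))
    return problems
```

Running a check like this on a freshly generated transcript, before the half-hour upload, would at least show whether the paths agree with the comparison path set in the GUI.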

The second problem... Well, I don't know what caused the second problem. Basically, I can't seem to generate a positive transcript that will verify without errors. All I'm trying to do is create the base-loadset.T transcript, using all the defaults in Radmind, and each time I do, on my server I get a list of positive transcripts with numbers like "994" appended to the file names, and my base-loadset.T file generates an unspecified error when I try to verify it on my server. I've tried this numerous times, and the same thing happens each time. Frankly, I'm sick of it. Each base-loadset.T creation takes upwards of half an hour, as the client must compare and then copy the entire loadset (essentially, all the files on the hard drive) to the server. Multiple failures at this stage are infuriating. But what's worse is that there seems to be no way to modify the configuration once it's been uploaded. Making a new loadset with the same name gives me the error "Loadset exists." So the only way to re-attempt loadset creation, or modify a loadset, is to erase it and start over. For a system that's all about monitoring and tracking changes to systems, this seems like a backwards approach. In any case, after a day of trying, I still have not successfully created a working positive transcript.

There are lots of problems with the Radmind implementation. One irksome problem is the inconsistency of just about everything in the application. (I'm talking about the GUI here.) For instance, running through the setup steps frequently yielded different results, both on the server and on the client -- sometimes I'd get errors, go back, repeat and get no errors; sometimes, after setting up my negative transcript, I'd be asked to set up my positive, sometimes I wouldn't. Also, the interface is ridiculously inconsistent: while running the setup steps, pressing the "Go Back" button does not take you back to the beginning of the setup steps. WTF? Maybe they should rename that button "Go Somewhere You've Never Been Before," because that's where it takes you. Another issue I faced was in altering a transcript: the latest version of the Transcript Editor completely garbles your transcript if you add an item. I had to use a previous version to add items to the list. And there's more: Adding a server to your server list in the Radmind preferences does nothing apparently. Even after doing this I was always queried for my server IP. Also, once added, a server cannot be removed from the list. There is a "remove" button, but it does nothing.

This is why Systems Admins should never design software.

It's been suggested that I try using Radmind from the command-line. I am tempted. But the problem is that the Radmind CLI environment and implementation are so complex that I'm liable to spend a week learning them, only to find that it still doesn't work. I've already been online reading various Radmind mailing lists, and people are having all manner of difficulty there as well, particularly going to Tiger. I just don't think, at this stage, it would be wise to continue with this plan when the product is clearly so problematic on so many levels.

Resignation: So, after all this planning and testing, I've decided not to use Radmind after all. My reasoning is basically twofold: 1) Radmind is supposed to make my life easier, not harder. Thus far it's only introduced complications to my life and to the process of administrating my systems. And this is just in setting up my base system. What problems will I encounter when I start adding the many gigabytes of application files? Seems to me a product that is designed to simplify lab management should be fairly straightforward and easy to master. If it's not, what's the point? Using Radmind only adds an extra layer of complexity that I'm not even sure I really need. 2) Radmind is supposed to make updates and installs less error-prone and more consistent throughout the lab. But, again, the Radmind process itself is inherently error-prone, at least in my (and many others' online) experience. How can I rely on such a system for lab maintenance with any confidence at all? I simply don't trust it. And if Radmind breaks with each upgrade of Mac OS X (which it might or might not, I just don't know, but indications are that it does), then again, what's the point? For all my work, what do I get?

There's got to be a better way.

Seems to me like the ultimate Radmind solution, at least in GUI-land (or maybe even as CLI solution -- why not?) would be something quite seamless to the admin. (And yes, I will now try to outline what I would like to see in a Radmind-like solution despite having said, not three paragraphs earlier, that Sys Admins should never design software.) I envision something like this: There is an interface called "Base Builder." Here you configure your base system, which would be your Master Client. On the MC you open Base Builder and tell it to use the "Current System" as your base install and it uploads everything to your server. I don't need to see a list of files at this point. Just build the damn system. Keep your lists to yourself, thank you. After the system is built, you can create your "Exclusions" from within the Base Builder app. This is real simple too. You just drop the folders you want to exclude into a GUI window, and then set the properties of each exclusion. Now you've altered your base install, so you get a window that says something like, "Base install has changed. Would you like to rebuild the base install on the server?" and a big, fat "Yes" button. Hit "Yes" and your changes are propagated to the server. Simple. And when it's time to add applications or other layers to your base, you go to the "Layer Builder" interface. This is similarly easy to use. Here, you tell the app where to find your base install, or you can say "Use Current System." Layer Builder will take a snapshot of the base install and keep that snapshot. You'll install your apps, then tell Layer Builder to create the new layer from a comparison between the base install snapshot and the new system. Layer Builder will allow you to name your new layer, and then save a snapshot of the layer. Finally, you'll have a "Sets Builder" interface. Here you can combine various sets of layers to create different configurations for different machines. This would have three panes: A "Layers" pane, a "Sets" pane and a "Computers" pane. You'd drag layers from the Layers pane into sets in the Sets pane to create your various configs. Then you'd drag sets to computer lists in the Computers pane. In the Computers pane you could run "Compare" to see differences between the actual computer's files and the files in the set. And if there were differences, you could propagate them to the client using something like, oh, I don't know, an "Update" button, let's say. And that's it, basically. For advanced users, you could look at the file lists and make changes between the server and the MC and the various clients. This seems like a key missing feature in Radmind. The ability to change the base config, or any of its transcripts in any meaningful way seems to be absent. These sorts of changes require rebuilding everything, which takes a great deal of time and effort, and is error prone. And isn't the whole point of this to make the process of lab maintenance easier and less error prone?
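The "Layer Builder" and "Sets Builder" ideas boil down to diffing two snapshots and then stacking the diffs. A toy sketch in Python follows, assuming a snapshot is nothing more than a dict mapping file paths to checksums. The function names and data shapes are invented for illustration, and this is not how Radmind itself stores anything.

```python
def build_layer(base_snapshot, new_snapshot):
    """Diff two snapshots (dicts mapping path -> checksum) into a layer."""
    added = {p: c for p, c in new_snapshot.items() if p not in base_snapshot}
    changed = {p: c for p, c in new_snapshot.items()
               if p in base_snapshot and base_snapshot[p] != c}
    removed = sorted(p for p in base_snapshot if p not in new_snapshot)
    return {"added": added, "changed": changed, "removed": removed}


def apply_layers(base_snapshot, layers):
    """Combine a base snapshot with an ordered list of layers into a set."""
    result = dict(base_snapshot)  # copy, so the base itself stays untouched
    for layer in layers:
        result.update(layer["added"])
        result.update(layer["changed"])
        for path in layer["removed"]:
            result.pop(path, None)
    return result
```

With something like this, a "set" for a given class of workstation is just the base plus whichever layers were dragged into it, and a "Compare" is one more call to `build_layer` between the set and what is actually on the client.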

I realize that what I've described is essentially what Radmind is and does. But Radmind does it in such an abstract and confusing way that the process becomes needlessly complex and defeats its own purpose. Something that companies like Apple understand -- and this is a large part of why I prefer the Mac platform -- is that good visual design and clear language can make an interface, or even a CLI app, a breeze to use, and that that is actually the point of GUI applications: To make a difficult and confusing process clear and intelligible. Radmind's visual interface is a mess, and its language is dizzyingly obscure. Here's a list of terminology, for example: negative transcript, positive transcript, command file, loadset, base loadset, overload, configuration. Here is a list of some of the files involved: negative.T, base-loadset.T, base.K. The files that end in .T are transcript files, which are lists of files belonging to a loadset. Get it? Of course not. Who would?

It's totally ridiculous.

Now that I've gotten that off my chest, I need to come up with a good way to proceed with my lab update. And although I am scrapping Radmind for now, I would still like to think of a way to ease future updates and remove the inconsistencies from the way I've done things in the past. There are a few options here. These involve disk images and databases for the most part. In any case, I will be giving this some serious thought as I move forward. But these issues will be the topic of a future article.

Tiger Lab Migration Part 2: Client/Server Interaction

As foretold, I've rebuilt my admin machine with fresh-from-the-factory Tiger. Nice. Things are going well.

I'm testing right now. Partly, I'm testing how Tiger does from a clean install. I'm also testing its interaction with my Panther Server. So far I have hit one, minor snag here, and it appears to be a bug in Tiger's Directory Access application.

In Panther Client's version of DA, the LDAPv3 configuration panel was set up by double-clicking the "LDAPv3" entry under "Services," then clicking "New..." at the bottom of the drop-down panel, and entering your server info in the available fields of the new entry. The Tiger version is slightly different: In Tiger, pressing "New..." brings up a dialogue box called "New LDAP Connection." Here you can enter information about how you want your client to use LDAP (i.e. for contact, or authentication, and whether or not to use SSL encryption). At the bottom of this dialogue is a "Manual" button which allows you to set up the panel the old, Panther-style way. Being old-school, I chose "Manual."

Silly me.

Turns out, setting up the server binding with the "Manual" button works, but the settings don't survive a restart. I tried this numerous times, and it would initially work, immediately binding to the server (which, by the way, is still Panther, and this may be part of the problem, but I seriously doubt it). But after restarting, though the entry would still exist in DA, the binding would be broken. No authentication, no computer management, nothing. The way to get it to work is to use the new, and supposedly improved, dialogue that pops up when you press "New..." in the LDAPv3 configuration panel -- the aforementioned "New LDAP Connection" window. Using this method to set up binding to my Mac Server worked. The new entry, once created in this manner, can be edited after the fact if need be. And best of all, the binding survives a reboot of the client.

Actually, the new dialogue does have one really nice thing going for it: It sets up the server paths in "Authentication" and, if you tell it to, in "Contacts." You used to have to set these up manually, in separate steps, under the "Authentication" and "Contacts" panes, but now, if you check the appropriate checkboxes, DA does it all for you from the "New LDAP Connection" panel. Nice.

Well, it would be nice, if the "Manual" method worked. Still, it's better than a kick in the teeth.

So, with authentication working between Tiger client and Panther Server, I'm over halfway there. In addition to authentication, preference management (things like Login Items and such) appears to be working. The last thing on the client/server relationship checklist is services, mainly printing services. I'll be kicking this around for a bit, and then I'll be starting my Radmind work.

Just a preview of that: To start, I will install Radmind on my admin box and set it up as a Radmind server. Once that's all up and running (and I should really do some checking up to make sure Radmind is Tiger compatible), the odious task of setting up the master client from scratch will begin. To reiterate, this will be a machine that has everything that I plan to install on the various workstations on the floor. From this, sets will be made for each hardware/software configuration. Each change to the master client will have to be tracked by the server, starting with the base install, and building loadsets (overloads?) as software is added. This will take some time and patience, but it should be worth it in the end.

Let's hope.

So, when next we meet, I will be setting up Radmind. I will do my best to be as detailed and comprehensive in the documentation of this process as I possibly can be.

Oh yeah, and one other thing before I install Radmind: I'm cloning my working Tiger install. 'Cause you never know.

UPDATE 1:
I have begun building my Master Client machine. The problem of server/client binding failing after reboot is now occurring on this machine, and using the "New LDAP Connection" window to set it up does not seem to work in this case. Thus far, I have been unable to bind my Master Client to the Macserver in a way that the binding survives a restart. This would seem to be the salient error in the system log:
Jun 27 19:43:45 systemsBoyMac /System/Library/CoreServices/mcxd.app/Contents/MacOS/mcxd: DSOpenNode(): dsOpenDirNode("/LDAPv3/192.168.1.10") == -14002

Simply opening the Directory Access application, authenticating, and opening the LDAPv3 configuration panel will bind the client to the server. The message that gets written to the log in this case is:
Jun 27 19:50:02 systemsBoyMac /System/Library/CoreServices/mcxd.app/Contents/Resources/MCXCacher: CacheUser(0, systemsboy) == -14136

I don't get it. I'll post when I find the solution.
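For anyone chasing the same thing, here is a rough Python sketch of how one might pull those trailing result codes out of a pile of log lines and count them. It only assumes that the interesting lines end in "== <number>", as the two quoted above do. The function names are made up for illustration, and it makes no attempt to say what any given code means.

```python
import re

# Matches a trailing "== -14002" style result code, as seen at the end of
# the mcxd and MCXCacher log lines quoted above.
RESULT_CODE = re.compile(r"==\s*(-?\d+)\s*$")


def extract_result_codes(lines):
    """Return (line, code) pairs for every line that ends in '== <number>'."""
    results = []
    for line in lines:
        match = RESULT_CODE.search(line)
        if match:
            results.append((line.strip(), int(match.group(1))))
    return results


def count_codes(lines):
    """Count how many times each result code shows up."""
    counts = {}
    for _, code in extract_result_codes(lines):
        counts[code] = counts.get(code, 0) + 1
    return counts
```

Feeding it the lines of the system log (for instance from an open file object) gives a quick tally of which codes turn up, and how often, across reboots.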

UPDATE 2:
Deleted the /Library/Preferences/Directory Access directory -- which contains the preferences for, obviously, Directory Access -- rebooted (twice) and was able to log in as a network user both times. I'd done this before. The difference this time was that I logged in as a network user first, before logging in as a local user. I'm making a third attempt this very moment. The computer is rebooting... Now logging in as a local user... Good... Now as a network user... No luck! The login screen shakes its head at me. It would appear that setting up binding, rebooting, then logging in as a local user will break the binding. Man. That's fucked up.

More to come...

UPDATE 3:
So now I've trashed the DA prefs again. But when I launch the DA application, it's still set up with the old binding config. So apparently, there's some pretty nasty caching going on. I'm trashing all prefs in /Library now, and clearing the mcx_cache from NetInfo. Rebooting. Setting up the Network. Setting up DA. Rebooting. Now I'm bound, and network logins work. What a blast. I hate this shit. Trying again -- rebooting, logging in as a network user, success. One more time, this time local user logs in first -- reboot, wait... Binding is broken... WTF!? I must admit, I'm stumped.

Will keep you posted...

UPDATE 4:
Okay, now this is too bizarre. There are telltale signs that the client is not bound. One of them is that the "Other..." button does not appear in the list of users at the login window; there are only local users in the list. The other is that the server does not appear in /Network/Servers. When the client is bound, you will see a network mount (or actually, a symlink to the network mount) in this directory. So I'm poking around in the Terminal on my client, with the /Network/Servers window open, and I'm looking at logs and whatnot, and all of a sudden I see the server mount point show up. Out of nowhere. And it occurs to me, maybe the binding is just slow. So I reboot the client and just leave it at the login screen. After about three minutes, the "Other..." button shows up. Just pops right up. Client is bound.

I can reproduce this on two machines now, but I can't explain why it takes so long, and why this behavior is so inconsistent. My server is at 10.3.8 and has some strangeness about it. I will be looking at it to see if these problems are perhaps now manifesting themselves more obviously with Tiger clients. I will also consider upgrading the server to 10.3.9, as there are apparently changes to the LDAP schema that were made with that update that were in preparation for Tiger.
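To put an actual number on "about three minutes," something like the following Python sketch could be run right after a reboot. It simply polls until the server's entry shows up in /Network/Servers and reports how long that took. This is a sketch, not something I ran as part of this migration: the function name, timeout and polling interval are arbitrary choices, and the path you pass in has to match whatever your server is called.

```python
import os
import time


def wait_for_path(path, timeout=300.0, interval=5.0):
    """Poll until `path` exists; return seconds waited, or None on timeout.

    os.path.lexists is used because the entry in /Network/Servers is a
    symlink, and we want to notice it even if its target is not mounted yet.
    """
    start = time.monotonic()
    while True:
        if os.path.lexists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

For example, `wait_for_path("/Network/Servers/pantherServer")` would return the number of seconds until the mount point appeared, or None if it never did within five minutes (the server name here is the one that appears in my logs; substitute your own).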

I'll let you know what I find...

UPDATE 5:
Another anomaly: Logging in to a bound client, navigating to /Network/Servers and selecting the server automount (we have a home account automount) causes the Finder to beachball indefinitely. Relaunching the Finder kills the beachball, but the network mount is broken (i.e. there is an empty mount point), and there are lots of automount errors in the system log. Oy! Time to fix/update the server.

I am now cloning my Macserver in preparation for moving to 10.3.9. Hopefully this will at least resolve the automount issues, and maybe even the slow binding issues. Of course, the ultimate solution will be to migrate to Tiger server. Still no idea when I'll be getting my disks, but when I do, there's a great article on the migration process at AFP548.

UPDATE 6:
I have updated my Panther Server to 10.3.9. Though the upgrade went smoothly, and I have experienced no problems with it, it has not solved my Tiger client problems, which are, to reiterate:
1. Binding to the server from a Tiger client takes approximately three minutes after reboot to occur. So there is a three minute period after a reboot during which a network user on a Tiger client cannot login. Suddenly, after three minutes or so, he/she can.
2. Network home account mounts from the Panther server do not automatically mount on the Tiger client. They should, and they do in Panther client. Here's the error message from the Tiger client system log:
Jun 30 16:42:18 systemsBoyMac automount[241]: Can't mount pantherServer:/Volumes/FlashDeveloper on /private/Network/Servers/pantherServer/Volumes/FlashDeveloper: Authentication error (80)
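If there are a lot of these in the log, a small Python sketch along these lines can break each one into its parts (server, export, mount point, message, error number), which makes it easier to see whether every failure is the same authentication error. The pattern is written against the single line quoted above, so treat the format as an assumption; it will not cope with paths containing spaces, and the function name is just for illustration.

```python
import re

# Pattern modeled on the one automount failure quoted above. Other automount
# messages may well be worded differently.
AUTOMOUNT_ERROR = re.compile(
    r"automount\[\d+\]: Can't mount (?P<server>[^:]+):(?P<export>\S+) "
    r"on (?P<mount_point>\S+): (?P<message>.+) \((?P<code>\d+)\)\s*$"
)


def parse_automount_error(line):
    """Return a dict describing a failed automount, or None if no match."""
    match = AUTOMOUNT_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields
```

Run over the whole log, this would show at a glance whether the failures are all "Authentication error" against the same export or a mix of different problems.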

It's looking more and more like I'm going to have to install Tiger server before I can proceed much further.

Shit...

UPDATE 7:
Well, this is turning out to be more fun than a barrel of monkeys.

Finally got the network home account mounting and the user authenticating. It is not an automount problem, but rather an authentication problem. Apparently, Tiger client cannot log in to Panther server if the user's password is of type "Open Directory." Crypt passwords work fine. (I knew there was a reason I wanted to stay with crypt, but nooo, Apple said "Use Open Directory passwords. They're better." Yeah right.)

I'm on my way to the Apple Discussions to see what I can dig up.

I'll let you know...

UPDATE 8:
(Hey, that rhymes.)

So I changed the password type of my network user to "Crypt" and could suddenly log in, right? Now here's something weird: I changed that same user's password back to "Open Directory," and guess what? It worked.

This would seem to indicate something screwy with the password server, but I'm hard pressed to say what it is. In any case, this presents a real problem for me, as I have 50+ users with Open Directory passwords, and the way things are right now, they're not going to work in Tiger. The only way for me to change them is to get all 50 users to come in and change their passwords in Workgroup Manager, and that's just unacceptable to me.

This is why I long for a utility that lets OD admins change user password types without having to reset the password. This doesn't seem like a stretch to me, nor does it seem unreasonable, particularly if you have a lot of users, which I do. Because now I'm stuck with 50 or so users with Open Directory passwords that just won't work, and the only way to fix this is to reset their passwords, when really, all I want to do is change the password type to "Crypt," and all the data I would need to do that (i.e. the passwords) is there on the server. It's just inaccessible by the admin. So fine, give me a utility that lets me change the password type without resetting the password. I don't need to see the password to do this; the utility can do it. I just want to make the change, and I can't, and that's Bad. (And, BTW, the reverse could be true as well: What if you have 300 crypt-style users and you want to change them to Open Directory passwords? As it stands, I guess you're going to have a mighty long line outside your door.)

Anyway, from what I've read at the Apple Discussions, this sounds like it might be a problem not just with Panther servers, but with Tiger servers as well. I'm betting these are all upgraded or migrated servers, and that fresh Tiger (and maybe even Panther) server installs work just dandy, which is why only some people are experiencing problems.

Anyway, this is endlessly annoying. I'm done for the night...